I recently did a presentation with the same title as this post and figured it would be a good idea to also type out some of the points made there, as I think it’s valuable. I’ve split it up into three sections (don’t worry, it won’t be three parts), one for each category of mistakes.
Broadly speaking, I’ve grouped the presentation into:
Now, my post is on Microsoft Sentinel specifically, but a lot of the generic advice here will transfer over to any deployment of security tools.
This is the part where we talk about why - why should you get a SIEM? Does it make sense for you and your organization? I think a lot of the inherent mistakes we make come from not understanding how much work goes into a SIEM. It’s not just the tool itself, but everything around it:
SIEM/Secure operations iceberg.
Talking to non-security people, some have mentioned that Sentinel is “AI-powered” and “does a lot of things for you”. Weeeeeeeeeeeell, about that. It’s a step up in user-friendliness and usability from the likes of Splunk, for sure, but it doesn’t give you too much for free. Anyone who sells you on their giant library of pre-built automation has never ever deployed a SOAR playbook in their life, trust me.
All roads lead to Rome, and all mistakes lead back to deciding to invest in a SIEM. That might be a tad dramatic, but it’s true. Consider factors such as the ability to effectively use a SIEM (which is something akin to IaaS in terms of management) vs an XDR, which is mostly SaaS. For most small and medium-sized companies, the choice between the two always favors XDR.
I like to use this image to illustrate what I mean - while Defender XDR falls into partially identify, mostly protect + detect and some respond, Microsoft Sentinel falls firmly into detect + respond. Sentinel gives you more options to ingest data to do more detection, but with the move from the Ibiza UX (Azure Portal) to the Defender portal it becomes an integrated part of Defender. It means you can onboard Microsoft Sentinel as a log ingestion engine (as Defender’s correlation engine will be the one doing the work once you have connected a workspace) and SOAR platform, while using Defender XDR as your primary tool.
All in all, start with EDR/XDR and expand with a SIEM.
Small rant here - there’s a lot of content telling you that Microsoft Sentinel (or other tools) are awesome, powered by AI and will basically just make you secure. Not true. It’s important to know that social media caters to quick wins - a generic good idea might not be good for you, so consider your own context.
Before making a decision, consider doing some threat modelling! I have written about getting into this before here, but a quick primer goes something like this:
A good example scenario can be “we want to detect people logging in from outside of Norway, that should be flagged”! Well, why don’t we just use conditional access to block access from outside Norway, then create a system for exemptions for travel (maybe access package self service with manager approval) and instead monitor for more specific use cases? Or we could just write detections for this, even if they would probably be quite noisy and have little actionable output.
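For illustration, here is a minimal, hedged sketch of what that noisy detection variant could look like in KQL - the country code and projected columns are just examples, not a recommended rule:

```kql
// Hedged sketch: flag successful sign-ins from outside Norway.
// Noisy by design - conditional access with travel exemptions is usually the better control.
SigninLogs
| where ResultType == "0"          // successful sign-ins only
| where Location != "NO"           // Location holds the two-letter country code
| project TimeGenerated, UserPrincipalName, Location, IPAddress, AppDisplayName
```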
So we’ve decided to implement a SIEM (if not, this blog post would end right here). It’s hard to address this specifically for Microsoft Sentinel because some companies have large, world-wide operations in different geos, while some are located in a single region. So, this is going to be from the context of a single region. Usually there are two modes of deploying Microsoft Sentinel:
Recently, option 2 was discussed for the new Azure landing zones and seems to be what we are heading towards. It makes access control a lot easier and generally reduces cost (paying double for any K8s logs costs the same as a small country), with the small drawback of having to dual-home some logs. It’s also important to note that with the introduction of the unified platform/experience, Microsoft Sentinel will primarily live in the Defender XDR UI from summer 2026.
This means we have to consider the fact that our Defender XDR correlation engine will only work for the first workspace we connect - any workspace connected after that will only have a one-way incident sync.
Also, connecting a workspace effectively disables the Fusion correlation engine, but since everyone is forced over to the new experience next year this doesn’t really matter. A point here is that the Fusion engine allowed you to make some configurations, while the Defender correlation engine does not.
GDAP is dead for security, unfortunately. B2B via identity governance access packages is the primary way into Defender XDR. Azure Lighthouse is still required for some access and makes it easier to deploy access management, but in the future MSPs/MSSPs will struggle a lot more with access management. It boils down to the fact that Azure Lighthouse was a static template that could be deployed by the customer to grant a set of permissions to a provider in the tenant, and same-ish with GDAP for roles in Entra. The process was quite easy for both parties.
For anyone who uses alert rules (on playbooks, to notice when something stops working) or Azure function apps (when logic apps aren’t quite it, or can’t do assertion for federated credentials :() - you will still need Azure Lighthouse. Anyway, access packages will become the main way to access customer/child tenants for MSPs/MSSPs/internal IT, while the Graph API will become the main API for interacting with Sentinel/Defender. There are still some API gaps here, but hopefully this will be covered when we reach 2026. Configuration, setup and maintenance of this type of access will require different licenses and add complexity for both the customer and the provider (or just more trust in the provider, granting it more privileged access to deploy resources quickly).
For anyone with multiple tenants, I would recommend running an IT/out-of-band management tenant and setting up Azure Lighthouse/access packages to interact with managed tenants. Treat the accounts and the management tenant as admin accounts for all intents and purposes:
Similarly to threat modelling for our security, we can also model what we want to detect and thus what we need in terms of logs to be able to detect it. I recently read a very good article about detection surfaces:
So, to put that into words my monkey-brain understands - if we are able to threat model what use cases we need to cover, that should tell us something about the logs we need. The logs we collect are our detection surface. So when someone asks the question “can we detect this?”, the answer is one of three:
I’ve tried to summarize it in a sort-of logical diagram below.
Please also note that a use case doesn’t need to be a detection; we can also have use cases for the purpose of retaining logs for incident response, audits or threat hunting.
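As a hedged example of that kind of retention-only use case, here is a sketch of a hunting query you would only run during incident response rather than as an alert - the IP indicator and the 180-day window are placeholders:

```kql
// Hedged sketch: hunt a single indicator across retained sign-in data instead of alerting on it.
SigninLogs
| where TimeGenerated > ago(180d)              // assumes you actually retain data this long
| where IPAddress == "203.0.113.42"            // hypothetical indicator from an IR engagement
| summarize SignIns = count() by UserPrincipalName, bin(TimeGenerated, 1d)
```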
I can only write from the context of a service provider, but design for scale. The Microsoft Sentinel repositories feature only allows one-way sync (you need to create content as files in the repo) and it does not template/sync things from Sentinel back to the repo. It can’t delete content and only supports content (resources) native to Microsoft Sentinel.
On the flip side, running your own CI/CD from GitHub/Azure DevOps allows you to have CI/CD push/pull pipelines, delete content and support more resource types like alert rules, function apps and key vaults. If you’re thinking “why would I need that?” then you might not be at the scale where it’s needed, but to keep it brief:
Workspace manager is… it’s alright. I wouldn’t use it; it has some of the same limitations as repositories - it doesn’t cover playbooks or function apps, there’s no delete function, and it can only manage content created in an upstream repository.
There are a lot of good blog posts on technical configuration of Microsoft Sentinel, so I thought I would just list some things to check that I’ve found over the last 5 years.
LAQueryLogs-table
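For context, LAQueryLogs records the queries run against the workspace (it has to be enabled on the workspace first). A minimal, hedged sketch of how you might review it:

```kql
// Who is querying the workspace, and are their queries failing?
// Assumes the LAQueryLogs audit setting is enabled on the workspace.
LAQueryLogs
| where TimeGenerated > ago(7d)
| summarize Queries = count() by AADEmail, ResponseCode
| order by Queries desc
```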
I won’t spend too much time on this, but if you are deploying content into a SIEM from templates without making adjustments, shame on you.
Two different companies, as illustrated above, might have completely different usage patterns, which means a single rule might be totally silent until an attacker comes along, or generate so many false positives you could go for a swim in them. I suggest detect.fyi as a good starting point for learning more about detection engineering.
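To make that concrete, here is a hedged illustration of why the same template logic needs per-environment tuning - the threshold and the excluded account below are placeholders you would set from your own baseline:

```kql
// The same "failed logon" template behaves completely differently in two environments.
let threshold = 20;                                // sensible in one org, pure noise in another
let excluded_accounts = dynamic(["svc-scanner"]);  // hypothetical noisy service account
SecurityEvent
| where EventID == 4625                            // failed logon
| where TargetUserName !in (excluded_accounts)
| summarize FailedLogons = count() by TargetUserName, IpAddress, bin(TimeGenerated, 1h)
| where FailedLogons > threshold
```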
This is also partly architecture, but consider what you are logging and what you are using the data for. Some verbose data might not serve a big function as primary security data, so consider using auxiliary tables with summary rules to lessen the cost and maximize value.
I’ve tried to show how this would work, but basically imagine you have a “Next-Gen Firewall” sitting around. It probably has a few different log outputs, but let’s focus on threat detection and netflow/firewall accept/drop for this one. The first is a good candidate for normal analytics log ingestion, as the log entries would be context-based alerts and give a lot of value on their own while the volume would be quite low. The second would probably be a quite verbose log source, which is a good candidate for auxiliary logging and usage with summary rules. This allows us to take some data and push it to a custom analytics table, where we can work with it in detection queries.
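As a rough sketch of what the summary rule query could look like - the table and column names below are made up, so substitute your own custom table schema:

```kql
// Hedged sketch: aggregate a verbose auxiliary firewall table into hourly flow summaries.
// The summary rule writes this result to a custom analytics table for use in detections.
Firewall_Flow_CL
| summarize
    Flows = count(),
    BytesSent = sum(SentBytes),
    DistinctDestinations = dcount(DestinationIP)
    by SourceIP, DestinationPort, Action, bin(TimeGenerated, 1h)
```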
On this topic - when using data collection rules (DCRs), know that you can filter away parts of events in two different ways. If you are using a DCR to gather information directly from servers or via a log server, you can add a filter query in the DCR configuration. An example is to filter away EventData from Windows security event logs, as it contains the entire event in XML format. You can also filter some logs that are not ingested via a DCR using something called a workspace transformation DCR. Tables like AADNonInteractiveUserSignInLogs can be filtered using this method.
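For what it’s worth, the transformation in both cases is just a KQL statement over `source`; something along these lines (hedged, adjust to your own schema and needs):

```kql
// Collection-time DCR transformation: drop the verbose XML blob from Windows security events.
source
| project-away EventData

// A workspace transformation DCR for AADNonInteractiveUserSignInLogs could instead
// keep only the columns you actually use, e.g.:
// source
// | project TimeGenerated, UserPrincipalName, AppDisplayName, IPAddress, ResultType, Location
```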
Also, have a reason for doing stuff. Just enabling Microsoft Sentinel because you can, or ingesting data because it seems relevant, is hardly useful. Not everything is a quick win, even if social media makes it seem that way. A lot of things will take time to do well and get good results from. People and process always win out over fancy tools, even if the world today seems like it’s going in a different direction.