How to not mess up your Microsoft Sentinel deployment

Looking beyond just the technical details

I recently did a presentation with the same title as this post and figured it would be a good idea to also write out some of the points made there, as I think they're valuable. I've split it into three sections (don't worry, it won't be three parts), each covering a different category of mistakes.

Broadly speaking, I’ve grouped the presentation into:

  1. Decision-making - why did we choose to go for a SIEM, which SIEM works for us
  2. Architecture - design for the future, not for yesterday
  3. Technical - everything has probably been said before, but we’ll touch on it briefly

Now, my post is on Microsoft Sentinel specifically, but a lot of the generic advice here will transfer over to any deployment of security tools.

1. Decision-making

This is the part where we talk about why - why should you get a SIEM? Does it make sense for you and your organization? I think a lot of the mistakes we make come from not understanding how much work goes into a SIEM. It's not just the tool itself, it's everything around it:

  • SOAR
  • Detection engineering
  • Cyber Threat Intelligence
  • Threat hunting
  • Analysis and incident response
  • And more …

SIEM/Secure operations iceberg.

Talking to non-security people, some have mentioned that Sentinel is “AI-powered” and “does a lot of things for you”. Weeeeeeeeeeeell, about that. It’s a step up in user-friendliness and usability from the likes of Splunk, for sure, but it doesn’t give you much for free. Anyone who sells you on their giant library of pre-built automation has never deployed a SOAR playbook in their life, trust me.

The original sin - the first decision

All roads lead to Rome, and all mistakes lead back to the decision to invest in a SIEM. That might be a tad dramatic, but it’s true. Consider factors such as the ability to effectively use a SIEM (which is something akin to IaaS in terms of management) versus an XDR, which is mostly SaaS. For most small and medium-sized companies, the choice between the two always favors XDR.

I like to use this image to illustrate what I mean - while Defender XDR falls partially into identify, mostly into protect + detect and somewhat into respond, Microsoft Sentinel falls firmly into detect + respond. Sentinel gives you more options to ingest data and do more detection, but with the move from the Ibiza UX (Azure Portal) to the Defender portal it becomes an integrated part of Defender. That means you can onboard Microsoft Sentinel as a log ingestion engine (Defender’s correlation engine does the work once you have connected a workspace) and SOAR platform, while using Defender XDR as your primary tool.

All in all, start with EDR/XDR and expand with a SIEM.

Second stage - ignore social media

Small rant here - there’s a lot of content telling you that Microsoft Sentinel (or other tools) is awesome, powered by AI and will basically just make you secure. Not true. It’s important to know that social media caters to quick wins - a generically good idea might not be good for you, so consider your own context.

Third stage ignition - threat modelling

Before making a decision, consider doing some threat modelling! I have written about getting into this before here, but a quick primer goes something like this:

  1. Define objective - what are we trying to accomplish? In this case, maybe securing our organisation?
  2. Identify assets - what are we trying to protect? Our domain controllers, our privileged accounts, our business critical servers
  3. Find threats - what is the most likely attack vector? What is the most devastating attack vector? Someone gaining administrator access to AD, someone breaching an admin account
  4. Mitigate - what is our best option for securing these assets against these threats? MFA, Privileged access workstations, network segmentation, EDR, Defender for Identity
  5. Document, review, repeat - write it down, see that it makes sense and go back and do this all over again.

A good example scenario: “we want to detect people logging in from outside of Norway, that should be flagged”! Well, why don’t we just use conditional access to block access from outside Norway, then create a system for travel exemptions (maybe access package self-service with manager approval) and instead monitor for more specific use cases? Or we could just write detections for this, even if they would probably be quite noisy and have little actionable output. A sketch of what such a detection could look like is shown below.
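To make that concrete, here is a minimal sketch of the noisy version, assuming the Entra ID sign-in logs (the SigninLogs table) are connected to the workspace and that the Location column holds the two-letter country code:

    // Sketch: successful sign-ins from outside Norway in the last hour.
    // Assumes SigninLogs is connected; Location is the country code
    // resolved from the sign-in IP.
    SigninLogs
    | where TimeGenerated > ago(1h)
    | where ResultType == "0"                        // successful sign-ins only
    | where isnotempty(Location) and Location != "NO"
    | summarize SignIns = count(), Countries = make_set(Location), IPs = make_set(IPAddress)
        by UserPrincipalName, bin(TimeGenerated, 1h)

Even a simple query like this needs travel exemptions and service account exclusions before it produces anything actionable, which is exactly the point about noise.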

2. Architecture

So we’ve decided to implement a SIEM (if not, this blog would end right here). It’s hard to be specific for Microsoft Sentinel because some companies have large, worldwide operations across different geos, while others are located in a single region. So, this is going to be from the context of a single region. Usually there are two modes of deploying Microsoft Sentinel:

  1. Shared log analytics workspace - shared between Ops and SecOps
  2. Dedicated security LAW - security only

Recently, option 2 was discussed for the new Azure landing zones and seems to be what we are heading towards. It makes access control a lot easier and generally reduces cost (paying double for K8s logs can cost as much as a small country), with the small drawback of having to dual-home some logs. It’s also important to note that with the introduction of the unified platform/experience, Microsoft Sentinel will primarily live in the Defender XDR UI from summer 2026.

This means we have to consider the fact that the Defender XDR correlation engine will only work for the first workspace we connect - any workspace connected after that will only have a one-way incident sync.

“A primary workspace’s alerts are correlated with Microsoft Defender XDR data. So, incidents include alerts from Microsoft Sentinel’s primary workspace and Defender XDR in a unified queue.”

Also, connecting a workspace effectively disables the Fusion correlation engine, but since everyone is forced over to the new experience next year this doesn’t really matter. A point here is that the Fusion engine allowed you to make some configurations, while the Defender correlation engine does not.


For MSP/MSSPs or large companies with multiple tenants

GDAP is dead for security, unfortunately. B2B via identity governance access packages is the primary way into Defender XDR. Azure Lighthouse is still required for some access and makes it easier to deploy access management, but in the future MSPs/MSSPs will struggle a lot more with access management. It comes down to the fact that Azure Lighthouse was a static template that could be deployed by the customer to grant a set of permissions to a provider in the tenant, and much the same with GDAP for roles in Entra. The process was quite easy for both parties.

For anyone who uses alert rules (on playbooks, to notice when something stops working) or Azure function apps (when logic apps aren’t quite enough, or can’t do assertions for federated credentials :() - you will still need Azure Lighthouse. Anyway, access packages will become the main way to access customer/child tenants for MSPs/MSSPs/internal IT, while the Graph API will become the main API for interacting with Sentinel/Defender. There are still some API gaps here, but hopefully these will be covered by the time we reach 2026. Configuration, setup and maintenance of this type of access will require different licenses and more complexity for both the customer (or just more trust in the provider to get more privileged access to deploy resources quickly) and the provider.

For anyone with multiple tenants, I would recommend running an IT/out-of-band management tenant and setting up Azure Lighthouse/access packages to interact with managed tenants. Treat the accounts and the management tenant as admin accounts for all intents and purposes.


Logging decisions

Similarly to threat modelling for our security, we can also model what we want to detect and thus what we need in terms of logs to be able to detect it. I recently read a very good article about detection surfaces:

“Detection surface as a concept is fundamentally intertwined with the attack surface of an organisation. These terms are linked in the sense that they both refer to the possibility that an attack could occur, or the possibility for an organisation to detect something has happened, not that it has or will.”

So, to put that into words my monkey-brain understands - if we are able to threat model what use cases we need to cover, that should tell us something about the logs we need. The logs we collect are our detection surface. So when someone asks the question “can we detect this?”, the answer is one of three:

  1. “Yes, we have detection for this”
  2. “Yes, we can - we need to write detection, but we have the data”
  3. “No, we don’t have the data”

I’ve tried to summarize it in a sort-of logical diagram below.

Please also note that a use case doesn’t need to be a detection - we can also have use cases for retaining logs for incident response, audits or threat hunting.
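A quick way to answer the “do we have the data?” part is to check what the workspace is actually receiving. A minimal sketch using the built-in Usage table (so it’s cheap to run):

    // Which tables are receiving data, how much, and when the last
    // record arrived - a rough view of the current detection surface.
    Usage
    | where TimeGenerated > ago(30d)
    | where IsBillable == true
    | summarize VolumeGB = round(sum(Quantity) / 1024, 2), LastRecord = max(TimeGenerated) by DataType
    | order by VolumeGB desc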

Content-management decisions

I can only write from the context of a service provider, but design for scale. Microsoft Sentinel repositories only allow one-way sync (you need to create content as files in the repo) and do not template/sync things from Sentinel back to the repo. They can’t delete content and only support content (resources) native to Microsoft Sentinel.

On the flip side, running your own CI/CD from GitHub/Azure DevOps allows you to have push/pull pipelines, delete content and support more resource types like alert rules, function apps and key vaults. If you’re thinking “why would I need that?”, then you might not be at the scale where it’s needed, but to keep it brief:

  1. Alert rules can alert you when an Azure resource, like a logic app, has a critical failure (a sketch of such a query is shown after this list)
  2. Function apps can handle a lot of things (federated credential assertions, log forwarding at higher volume, and they are required for a bunch of data connectors)
  3. Key vaults are useful for secret exchanges (I don’t want to, but access management for managed identities at scale is really painful) and for storing sensitive information
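As mentioned in the first point above, a log query alert can catch failing playbooks. A minimal sketch, assuming diagnostic settings on the logic apps send their WorkflowRuntime logs to the workspace (where they land in AzureDiagnostics):

    // Failed playbook (logic app) runs in the last hour. Assumes the
    // logic apps have diagnostic settings pointing at this workspace.
    AzureDiagnostics
    | where TimeGenerated > ago(1h)
    | where ResourceProvider == "MICROSOFT.LOGIC" and Category == "WorkflowRuntime"
    | where status_s == "Failed"
    | summarize FailedEvents = count() by Resource, OperationName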

Workspace manager is… alright. I wouldn’t use it; it has some of the same limitations as repositories - it doesn’t cover playbooks or function apps, has no delete function, and can only manage content created upstream.

3. Configuration

There are a lot of good blog posts on the technical configuration of Microsoft Sentinel, so I thought I would just list some things to check that I’ve found over the last 5 years.

Make sure to audit your configuration from time to time

  1. Check your retention - make sure it’s what you expect it to be
  2. It’s possible to set a daily data ingestion cap on the underlying log analytics workspace - make sure this is not set
  3. Check that UEBA and anomalies are configured correctly
  4. If a log source stops sending data, will you catch it? (see the query sketch after this list)
  5. If a SOAR playbook is disabled or fails, will you catch it?
  6. Enable health logging
  7. Enable diagnostic audit logging to get the LAQueryLogs table
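For point 4, a simple way to catch sources going quiet is to compare the last record per table against a threshold. A minimal sketch using the Usage table (for agent-based sources the Heartbeat table works the same way):

    // Tables that have not received data in the last 24 hours.
    // Tune the threshold per table - some sources are naturally bursty.
    Usage
    | where TimeGenerated > ago(14d)
    | summarize LastRecord = max(TimeGenerated) by DataType
    | extend HoursSinceLastRecord = datetime_diff('hour', now(), LastRecord)
    | where HoursSinceLastRecord > 24
    | order by HoursSinceLastRecord desc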

Detection engineering

I won’t spend too much time on this, but if you are deploying content into a SIEM from templates without making adjustments, shame on you.

Two different companies, as illustrated above, might have completely different usage patterns, which means a single rule might be totally silent until an attacker comes, or generate so many false positives you could go for a swim in them. I suggest detect.fyi as a good starting point for learning more about detection engineering.
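To give a trivial example of the kind of adjustment I mean: instead of deploying a template as-is, bake your own environment into it, for instance by excluding known accounts kept in a watchlist. A minimal sketch (the watchlist name and threshold are made up - adjust to your own baseline):

    // Failed logons, excluding accounts from a (hypothetical) watchlist
    // of known service accounts, with a threshold tuned to the environment.
    let KnownServiceAccounts = _GetWatchlist('ServiceAccounts') | project SearchKey;
    SecurityEvent
    | where EventID == 4625                          // failed logon
    | where Account !in (KnownServiceAccounts)
    | summarize FailedLogons = count() by Account, Computer, bin(TimeGenerated, 15m)
    | where FailedLogons > 20                        // tune to your own baseline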

Filtering logs and log tiers

This is also partly architecture, but consider what you are logging and what you are using the data for. Some verbose data might not serve a big function as primary security data, so consider using auxiliary tables with summary rules to reduce cost and maximize value.

I’ve tried to show how this would work, but basically imagine you have a “Next-Gen Firewall” sitting around. It probably has a few different log outputs, but let’s focus on threat detection and netflow/firewall accept/drop for this one. The first is a good candidate for normal analytic log ingestion, as the log entries would be context-based alerts and give a lot of value on their own, while the volume would be quite low. The second would probably be a quite verbose log source, which is a good candidate for auxiliary logging and usage with summary rules. This allows us to take some of the data and push it to a custom analytic table, where we can work with it in detection queries.
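As a rough sketch of what such a summary rule query could look like, assuming the verbose accept/drop logs land in CommonSecurityLog (adjust the table and column names to your connector):

    // Hourly aggregation of verbose firewall flows into a compact result
    // set that a summary rule writes to a custom analytic table.
    CommonSecurityLog
    | summarize Flows = count(),
                BytesSent = sum(SentBytes),
                DeniedFlows = countif(DeviceAction =~ "deny")
        by SourceIP, DestinationIP, DestinationPort, bin(TimeGenerated, 1h)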

On this topic - when using data collection rules, know that you can filter away parts of events in two different ways. If you are using a DCR to gather information directly from servers or via a log server, you can add a filter query in the DCR configuration. An example is to filter away EventData from Windows security event logs, as it contains the entire event in XML format. You can also filter some logs that are not ingested via a DCR using something called a workspace transformation DCR. Tables like AADNonInteractiveUserSignInLogs can be filtered using this method. Both are sketched below.
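In both cases the transformation itself is just a KQL statement over the incoming rows, exposed as a virtual table called source. Two separate sketches (each would go in its own DCR, and the filters are examples, not recommendations):

    // 1. DCR for Windows security events: drop the bulky XML column.
    source
    | project-away EventData

    // 2. Workspace transformation DCR for AADNonInteractiveUserSignInLogs:
    //    keep only failed sign-ins (ResultType "0" means success) to cut volume.
    source
    | where ResultType != "0"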

Summary (rules)

  1. Threat modelling is important
  2. Prioritize having enough people to be able to accomplish goals - fancy tools and AI with no one to act on the output (and complete the feedback loop) is just wasted money
  3. Detection should be based on your threat modelling and threat landscape
  4. Data ingestion should be connected to your use cases (and thus your threat modelling)
  5. Repeat often (4. and 5.)

Also, have a reason for doing stuff. Just enabling Microsoft Sentinel because you can, or ingesting data because it seems relevant, is hardly useful. Not everything is a quick win, even if social media makes it seem that way. A lot of things take time to do well and get good results from. People and process always win out over fancy tools, even if the world today seems to be going in a different direction.

Tags: Microsoft Sentinel, Defender XDR, Graph API, Azure Lighthouse, Custom Detection Rules