The data lake is here, rejoice. It also brings up a bunch of questions, like: do I still need Microsoft Sentinel? Yes. Is this just auxiliary logging done well, without complications like not being able to use the “new” Azure Monitor Agent and having to lean on Logstash instead? Sort of.
There have been a bunch of these FAQs posted by companies already, but I’m going to take a whirl at the questions I had myself and add some new ones as well.
On July 22nd, Microsoft announced their Sentinel Data Lake.
It’s a big deal!
Why? Well, to keep it brief:
All in all, it makes everything a bit cheaper and easier with a lot less management overhead (in theory).
EDIT: As someone on LinkedIn kindly pointed out, KQL jobs currently only run daily at the fastest, while summary rules run every 20 minutes. Thus, their use cases are different as of now.
Looking into my 🔮 I have some predictions though! I suspect summary rules and KQL jobs will end up being the same capability, able to run more frequently, but perhaps with a cost attached based on the amount of data queried (as it currently is with KQL jobs) and/or on frequency. Speculation on my side, but it makes sense that components from Sentinel/Defender with similar capabilities merge into one to avoid future confusion.
In essence it’s a data lake with the security label slapped on it.
As I understand it (I’m often wrong), the idea here is that the data is stored in a single open format in what I’ll refer to as a “cheap tier” - the data lake storage tier.
Combined with deduplication, ensuring you only store a single copy of any piece of data, this makes for cost-effective storage - which, again, makes a lot of sense.
Now, for our security purposes we need to be able to query the data effectively. The data lake has full KQL support via KQL interactive queries (these cost money to run, unlike normal KQL queries, which are free), and you can bring your own analytic engine, use notebooks, etc. I’m not sure yet how this works if you ingest unstructured data through something that isn’t a native connector (maybe it isn’t supported).
Add the ability to promote data on demand to the analytic tier (similar to summary rules) using KQL jobs, and that’s the basis of what it does.
That’s a long question, how did you come up with that?
The short answer is ease of use. The data lake handles a lot of the complexity for you at a reasonable price, and it keeps everything in the “security portal” context so it’s easy for the SecOps team to use.
The long answer: in essence, this is a monetary answer to the need for cheaper storage for data we don’t want in the analytic tier for multiple reasons, while also serving as a financially viable alternative to storage accounts. The difference between the Sentinel Data Lake and using auxiliary logs with Azure Data Explorer/storage accounts is ease of use (and the fact that the data lake is a purpose-built, cloud-native security data lake).
Maintaining a complex log-ingestion infrastructure often means relying on third-party options such as Cribl or Tenzir to filter logs and push them to the correct destination. Working with logs in any capacity from ADX is fine, but the management overhead is a big cost/investment. Working with logs from storage accounts can be trying at times if you have low patience - the logs are there, but it takes a while to work with them. All in all, it works, but it requires a lot of configuration, maintenance and development on your part.
This is also Microsoft’s answer to all of the third-party solutions that have popped up in the gap beneath Microsoft Sentinel, a gap some filled with Azure Data Explorer and storage accounts.
Let’s map this out in a table.
| Tier | Examples of use | Ingestion cost | Retention included in base cost | Other costs |
|---|---|---|---|---|
| Analytic | EDR, antivirus, authentication logs, threat intelligence | Yes | 90 days (can be extended for extra cost) | Long-term retention cost |
| Data Lake | Netflow, TLS/SSL certificates, firewall, proxy | Yes | 30 days (can be extended for extra cost) | Federation cost, compute cost, long-term retention cost |
So far, this article, together with the updated Sentinel billing Learn page, is what we have to go on.
In summary, we pay for ingested GB as usual and for storage beyond the included retention per GB as usual. In addition, we pay for queried GB, similar to basic/auxiliary logs. The new thing is that we pay compute costs when running advanced data insights (that is, we pay for running Jupyter notebook sessions/jobs).
The Microsoft Sentinel pricing calculator isn’t updated, so the prices are as follows (in $USD based on East US pricing):
| Capability | Measured in | Price per measurement |
|---|---|---|
| Data lake ingestion | Data processed (per GB) | $0.05 |
| Data processing | Data processed (per GB) | $0.10 |
| Data lake storage | Data stored (GB/month) | $0.026 |
| Data lake query | Data analyzed (per GB) | $0.005 |
| Advanced data insights | Compute hour | $0.15 |
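To make those meters a bit more concrete, here’s a rough back-of-the-envelope sketch in Python. The volumes are made up, and I’m assuming both the ingestion and data processing meters apply to lake-only ingestion and that storage beyond the included 30 days is billed per GB per month - check the billing page before trusting any of it.

```python
# Rough cost sketch for the data lake tier, using the East US list prices above.
# Assumptions (mine, not Microsoft's): both the ingestion and data processing
# meters apply to lake-only ingestion, and storage past the included 30 days
# is billed per GB per month.

INGESTION_PER_GB = 0.05       # Data lake ingestion
PROCESSING_PER_GB = 0.10      # Data processing
STORAGE_PER_GB_MONTH = 0.026  # Data lake storage
QUERY_PER_GB = 0.005          # Data lake query (KQL interactive queries)
COMPUTE_PER_HOUR = 0.15       # Advanced data insights (notebook sessions/jobs)

gb_per_day = 200               # hypothetical firewall/netflow volume
retention_months = 12          # total retention, of which ~1 month is included
queried_gb_per_month = 500     # hypothetical interactive query volume
notebook_hours_per_month = 20  # hypothetical notebook/job compute

monthly_ingest_gb = gb_per_day * 30
ingest_cost = monthly_ingest_gb * (INGESTION_PER_GB + PROCESSING_PER_GB)
# Steady state: everything older than the included 30 days is billed for storage.
storage_cost = monthly_ingest_gb * (retention_months - 1) * STORAGE_PER_GB_MONTH
query_cost = queried_gb_per_month * QUERY_PER_GB
compute_cost = notebook_hours_per_month * COMPUTE_PER_HOUR

total = ingest_cost + storage_cost + query_cost + compute_cost
print(f"Ingest: ${ingest_cost:.2f}  Storage: ${storage_cost:.2f}  "
      f"Queries: ${query_cost:.2f}  Compute: ${compute_cost:.2f}  Total: ${total:.2f}/month")
```

The exact total matters less than which meters dominate: with long retention, the storage meter (and whether your scenario actually trips both the ingestion and processing meters) is what you want to pin down first.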
Not directly. The SDL introduces two “new” concepts: KQL interactive queries (which are just KQL queries running against data in the lake, at the aforementioned cost) and KQL jobs.
KQL jobs essentially allow you to run KQL queries as scheduled tasks and promote the resulting data to the analytic tier.
Does that sound familiar? Yes, it’s an easier way of doing what summary rules do for aux/basic logs: promoting data to the analytic tier!
Aside from added cost, there should be no major changes.
The short answer: the same way you were already getting data into Microsoft Sentinel. All data is mirrored, or you can choose to ingest only into the data lake tier (for all except some tables).
The long answer: one of the boons of the data lake is that it mirrors your existing workspace(s) for free, and it lets you use the existing 350+ Microsoft Sentinel data connectors to ingest data into it.
In the UI, you simply go to the new “Table management” experience in the Defender portal and choose whether a table’s destination is the analytic or the data lake tier.
Once the SDL is enabled, auxiliary log tables are no longer visible in Microsoft Defender’s Advanced hunting or in Microsoft Sentinel in the Azure portal. The auxiliary table data is available in the data lake and can be queried using KQL queries or Jupyter notebooks. Not all tables can be switched over to the data lake tier - notable exceptions are the Defender XDR tables and some Microsoft Sentinel solution tables. As you can see from the image above, by default all data sent to the analytic tier is mirrored to the data lake at no extra cost, and the retention is the same as for the analytic table it’s mirroring.
If you switch from analytic + mirroring (the default) to ingesting only into the data lake tier, new data will stop arriving in the analytic tier table.
This diagram shows the retention components of the analytics, data lake, and XDR default tiers, and which table types apply to each tier:
If you want to compare the analytic tier and the data lake tier in more detail, check out this table, as it does a good job of explaining the differences.
You should switch over once the SDL becomes GA. Right now it’s in public preview.
Two reasons. One, there are some region limitations for the SDL while it’s in preview. If your tenant is in Norway, for example, it defaults to storing data in the Norway East datacenters and is thus not eligible.
The second reason is capacity issues in certain regions. These are being fixed as of writing this; you should be able to enroll in all supported regions come August 2025.
You can bet it probably does. Having a large dataset to reason over might finally make me believe the AI hype. We’ll see; the jury’s out.
Yes, they work. You can run them against the data lake at the price mentioned above. You run them straight from VSCode using the Microsoft Sentinel extension.
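As a rough idea of what a notebook session looks like, here’s a minimal PySpark-style sketch. It assumes the notebook environment gives you a Spark session and that the lake table is readable by its table name - both assumptions on my part, so check the sample notebooks that ship with the Microsoft Sentinel extension for the actual data-access API.

```python
# Minimal notebook sketch. Assumptions: the data lake notebook environment
# provides (or can create) a Spark session, and the lake-tier table is
# readable by name - verify against the extension's sample notebooks.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # usually already provided in the notebook

# Hypothetical lake-tier table holding firewall/CEF logs.
df = spark.read.table("CommonSecurityLog")

# Simple hunt: top source IP / destination port pairs over the last 7 days.
summary = (
    df.filter(F.col("TimeGenerated") >= F.date_sub(F.current_date(), 7))
      .groupBy("SourceIP", "DestinationPort")
      .agg(F.count("*").alias("Events"))
      .orderBy(F.desc("Events"))
      .limit(50)
)

summary.show(truncate=False)
```

Keep in mind that the session time is what the advanced data insights meter charges you for, so this is also where the compute cost from the pricing table shows up.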
You will need a role that allows for reading tables, writing to tables and managing jobs - either Security Operator, Security Administrator or Global Administrator.
You also need the managed identity of the data lake to have the Log Analytics Contributor role on the log analytics workspace(s) connected to the lake.
I almost never advocate for using ADX unless you use something like ADXFlowmater or build IaC integrations that let you keep it up to date with table schemas and logs without too much manual intervention.
Now the SDL will be the easier option. It’s hard to say, without comparing the two, which way the pricing will end up falling, but I’m guessing that if you account for time saved, you will want to go for the SDL.
If you’re eligible, go to https://security.microsoft.com/sentinelsettings/data-lake and follow the wizard.
No, it works for all of them - if they’re all in the same region.
Some caveats here:
More data is always better - but at some point more data becomes expensive. Handling this extra data in cheaper storage usually gave you three options:
The SDL takes the ideas behind auxiliary logs, summary rules, and Azure Data Explorer/storage accounts for warm/cold data, and puts them into a cloud-native data lake at a reasonable cost and with reasonable effort.
This lets you retain more data for threat hunting and store data longer for any compliance and incident response purposes you might have. The SDL is built for use with external analytic engines and Jupyter notebooks to add more advanced analytics options, and it’s also built with AI usage in mind, particularly agents reasoning over the data within certain guardrails.
Yes? Maybe? It’s correct according to the sources I could find as of writing this, 29th of July 2025. It’s still a preview feature (even if it’s public preview), so some of it is still subject to change!