Boys and girls, the plan is simple. I’m a self-appointed AI skeptic. Will I use AI and agents and MCP servers? Sure, when it works properly. The only problem I have is that it’s beginning to look a lot like it works pretty well, so that’s what I want to test.
For my presentation at the Nordic Infrastructure Conference and the following Microsoft Security Summit, I did some light testing of the MCP capabilities in VSCode, and that’s what I’m here to talk about.
This is not the post for that. I will give you some resources, so either check out my blog called Practical Detection Engineering, or head over to some of these other amazing resources:
In short, detection engineering is a discipline that’s all about making sure our worries about what might happen to us are represented in our detection coverage. It’s a field that covers a lot of different skillsets depending on region, company size and your company’s needs.
Detection engineering is mostly practiced as a “detection creator” role, but it also lends a helping hand in identifying detection gaps and missing data (blind spots where we lack both insight and detections), formulating queries for threat hunting, incident response and a lot more.
Starting at the beginning, there is one thing we need to tackle - namely MCP, or Model Context Protocol. It was recently made open source by its creators at Anthropic and is the de-facto standard for connecting AI applications to external systems. An apt example: it allows a chat AI interface, or the agent in your IDE (like VSCode), to interact with other systems via MCP - for instance, querying your Notion docs directly or updating your calendar.

An apt security example is the Microsoft Sentinel MCP server, which allows us to query data in the Microsoft Sentinel Data Lake and the Microsoft Sentinel Graph. The current tool collection is as follows (but more is likely on the way):
You can also create your own custom tooling from KQL queries. There are also some features in preview here, namely MCP tools for the Microsoft Sentinel Graph.
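To make that concrete, here’s a rough sketch of the kind of saved KQL you might wrap as a custom tool - this is my own illustrative query against the standard Defender advanced hunting schema, not one of the built-in tools - so the agent can ask “who ran PowerShell today, and how often?” without authoring the query itself:
// Illustrative only - a saved query you could expose as a custom MCP tool.
// Table and columns are from the standard Defender advanced hunting schema.
DeviceProcessEvents
| where Timestamp > ago(1d)
| where FileName in~ ("powershell.exe", "pwsh.exe")
| summarize Executions = count(), SampleCommandLines = make_set(ProcessCommandLine, 10)
    by DeviceName, InitiatingProcessAccountName
| order by Executions desc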
If you want to learn more about MCP, head over to the MCP for beginners repository.
Recently made GA as of December 1st, Sentinel Graph is described as:
“a unified graph analytics capability within Microsoft Sentinel that empowers security teams to model, analyze, and visualize complex relationships across their digital estate. Unlike traditional tabular approaches, Sentinel graph enables defenders and AI agents to reason over interconnected assets, identities, activities, and threat intelligence—unlocking deeper insights and accelerating response to evolving cyber threats.”

The idea, then, is to allow defenders to rely on the graph not only for investigation and threat hunting, but also for proactive security work. Exposing this to an agent via MCP tooling lets the agent reason over the graph alongside our security data in Defender/Sentinel.
The relevance of the graph for detection engineering depends on how far you want to stretch the use case, but for threat hunting and proactive security work it will most likely be a huge boon. The MCP tooling that comes with the preview currently falls into the following two categories:
🛠️ Custom graph
“Author notebooks to model, build, visualize, traverse, and run advanced graph analyses like Chokepoint/Centrality, Blast Radius/Reachability, Prioritized Path/Ranked, and K-hop. It’s a transformative leap in graph analytics, fundamentally changing how security teams understand and mitigate organizational risk by connecting the dots in their data.”
🤖 Sentinel graph MCP tools
“Use purpose-built Sentinel graph MCP tools (Blast Radius, Path Discovery, and Exposure Perimeter) to build AI agents for getting insights from the graph in natural language.”
Quoted directly from the preview sign-up.
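To give a feel for why this matters, here’s a rough tabular sketch of the kind of blast radius question the graph tools answer natively. This is plain KQL against the standard Defender advanced hunting tables, not the graph itself, and the account name is a placeholder:
// Tabular approximation of a blast radius question, for illustration only.
// "user01" is a placeholder for a hypothetically compromised account.
let CompromisedAccount = "user01";
let Window = 7d;
DeviceLogonEvents
| where Timestamp > ago(Window)
| where AccountName =~ CompromisedAccount
| distinct DeviceName
| join kind=inner (
    DeviceNetworkEvents
    | where Timestamp > ago(Window)
) on DeviceName
| summarize Connections = count(), SampleRemotes = make_set(RemoteUrl, 20) by DeviceName
The graph tools do this kind of traversal (and much deeper ones) for you, and expose it to the agent in natural language.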
According to Microsoft Ignite News:
“… beyond Security Copilot and VSCode Github Copilot, Sentinel MCP server is now natively integrated with Copilot Studio and Microsoft Foundry agent-building experiences.”
Now, I’m quite lazy, so my implementation will use VSCode Github Copilot, since that allows me to do everything in a repository. I’m not going to automate anything, but if you want to, some of the other experiences would probably be a better fit for that.
Using VSCode is quite easy in terms of the MCP configuration. Under the extensions tab we now have MCP Servers as its own category. Here we can install the Microsoft Learn MCP, which is always useful, along with the official Github MCP. I’m mostly interested in the following two capabilities of the latter:
However nice the agent + MCP combo might be, I still want to have it create a PR when it makes changes for me.
Then we have to set up the two Microsoft Sentinel MCP servers, namely data exploration and triage. There is also the Security Copilot agent creation collection, but I’m not going to be using that one for now.
Microsoft has a really simple guide for setting up the MCP servers:


https://sentinel.microsoft.com/mcp/triage
https://sentinel.microsoft.com/mcp/data-exploration

And that should be it - under MCP servers you should see something like this:

And that’s it for the MCP setup so far.
When working in VSCode, one of the things we’ll need to do is create custom agents in our repository. Github outlines the process here, but the gist of it (pun intended) is to create agents.md files in our .github/agents folder. Now, agents.md is also an open standard, sort of like README files for agents.
We can see some examples over at the agents.md page, but generally it’s a markdown file that allows us to provide simple instructions to our agents.
These custom agents will then be selectable in the Github Copilot Chat:

I’ll explain more about the agents I’ve created later; it’s not thaaat important.
We can also provide general instructions for Github Copilot in our repository by creating a copilot-instructions.md file over at .github/copilot-instructions.md.
This follows similar logic to the agents.md files: it’s a set of instructions. Some examples can be found over at the Github docs, or you can have Copilot generate its own instructions based on prompting.
As you could see from my previous example, I created four custom agents. I don’t think that’s the correct way to split it up. Creating multiple agents means I have to switch context to make sure the correct agent is used for the right task; we could probably just create one agent called detection-helper.md instead, or use copilot-instructions.md to help us.
My general idea was to break detection engineering into four parts:
1. Creating detections
2. Validating them
3. Testing them
4. Tuning them once they have been running for a while
My initial idea was to split these up into the different custom agents. The only issue I have when running this in VSCode is that I don’t have any way (that I currently know of) to easily make the different agents interface with each other and hand work over.
It would look something like this from the user query side:
flowchart LR
subgraph ".github/agents"
A[detection-creator]
B[detection-validator]
C[detection-tester]
D[detection-tuner]
end
subgraph "VSCode Copilot Chat"
U[User]
end
U --> |"1. Create detection"| A
U --> |"2. Validate query"| B
U --> |"3. Test query"| C
U --> |"4. After running for a while, tune query"| D
A --- MCPLEARN[MCP Microsoft Learn]
B --- MCPLEARN[MCP Microsoft Learn]
C --- MCPSENTINEL[MCP Data Exploration]
D --- MCPSENTINEL[MCP Data Exploration]
It works fine - I’ve tested it - but it requires switching context manually or instructing Copilot so it knows which specific agent to run, which isn’t always as straightforward as you might want to believe.
So what can we do instead? Well, we can simply use the copilot-instructions.md file and merge all of the flows into one instruction set.
flowchart LR
subgraph ".github/"
A[copilot-instructions.md]
end
subgraph "VSCode Copilot Chat"
U[User]
end
U --> |"1. Create detection"| A
U --> |"2. Validate query"| A
U --> |"3. Test query"| A
U --> |"4. After running for a while, tune query"| A
Both work, but the latter is the simplest way of doing this.
Let’s head back over to our original idea and see how we would instruct it to call MCP servers.
flowchart LR
subgraph ".github/agents"
A[detection-creator]
B[detection-validator]
C[detection-tester]
D[detection-tuner]
end
A --- MCPLEARN[MCP Microsoft Learn]
B --- MCPLEARN[MCP Microsoft Learn]
C --- MCPSENTINEL[MCP Data Exploration]
D --- MCPSENTINEL[MCP Data Exploration]
Similarly, when we describe the actual flow of a single detection’s lifecycle, it would look something like this:
flowchart TD
U[User]
subgraph ".github/agents"
A[detection-creator]
B[detection-validator]
C[detection-tester]
D[detection-tuner]
end
1[Create detection based on template] --> A
A -.-> |Call| MCPLEARN[MCP Microsoft Learn]
subgraph "Repository"
A --> |Creates detection as file in repo and creates| DF["Detection.yaml"]
PR[Pull Request]
U -.-> |Approve| PR
PR --> |Merged into main| DF
DF --> |When approved| P[Pipeline]
end
B --> |Validate| DF["Detection.yaml"]
B -.-> |Call| MCPLEARN[MCP Microsoft Learn]
C --> |Test| DF["Detection.yaml"]
C -.-> MCPSENTINEL[MCP Data Exploration]
C --> |Create| PR
U --> |VSCode Copilot Chat| D
D -.-> |Tune| MCPSENTINEL[MCP Data Exploration]
P --> S[Microsoft Sentinel]
It’s a bit messy, but it follows the same idea we had originally - we can provide a template that Copilot will help us change based on our input; it will then validate it against best practices and test it against live historical data in the data lake via MCP.
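As a concrete example of what that testing step can mean, here’s a trivial sketch of a pre-flight schema check that could be run via the data exploration MCP before any live testing - the column list is just illustrative:
// Sketch of a simple pre-flight check: confirm the columns the detection
// references actually exist in the table. Column list is illustrative.
DeviceProcessEvents
| getschema
| where ColumnName in ("Timestamp", "DeviceName", "FileName", "ProcessCommandLine", "InitiatingProcessAccountName")
| project ColumnName, ColumnType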
Some notes here:
- Make sure the relevant tables (e.g. SecurityAlerts, SecurityIncident) from Sentinel and other tools are being sent to the data lake. This is because the exploration tool requires the data lake to be enabled.

As a quickstart, here’s an example file that I created. It’s basically a condensed version of the instructions I created for the four other agents.
copilot-instructions-example.md
I would strongly recommend that you create your own instruction files or custom agents using your own prompts, or write them out yourself. Detection engineering flows and lifecycles vary greatly based on team size; you might only need help with proactive tuning, for instance.
So after setting up the MCP servers (after enabling Sentinel Data Lake) and creating the instructions file/custom agents, I also did the following:
- Set up DeviceProcessEvents to be forwarded to the data lake

Here’s what Copilot came up with:
// Detection: Suspicious PowerShell EncodedCommand (Base64)
// Description: Detects PowerShell/Pwsh executions using Base64-encoded commands via -Enc/-EncodedCommand
// MITRE ATT&CK: T1059.001 (PowerShell), T1027 (Obfuscated/Compressed Files and Information)
// Data Sources: DeviceProcessEvents (MDE), SecurityEvent (4688), WindowsEvent (4103/4104)
// Tuning: Consider excluding known admin scripts or packaging tools; require DecoderHit for higher confidence
let EncTokens = dynamic(["-enc","-encodedcommand","/enc","/encodedcommand"]);
let B64Re = @"(?i)-e(?:ncodedcommand|n)?\s+([A-Za-z0-9+/]{40,}={0,2})";
let decode = (s:string) { replace(@'\u0000','', base64_decode_tostring(s)) };
union isfuzzy=true
(
DeviceProcessEvents
| where Timestamp > ago(2d)
| where tolower(FileName) in ("powershell.exe","pwsh.exe","powershell_ise.exe")
| where ProcessCommandLine has_any (EncTokens) or ProcessCommandLine matches regex B64Re
| extend EncodedArg = extract(B64Re, 1, ProcessCommandLine)
| extend Decoded = iff(isnotempty(EncodedArg), decode(EncodedArg), "")
| extend DecoderHit = iff(Decoded has_any ("iex","invoke-expression","downloadstring","frombase64string","new-object net.webclient","invoke-webrequest","iwr","-nop","-w hidden","-noni"), true, false)
| project TimeGenerated=Timestamp, HostName=DeviceName, Account=InitiatingProcessAccountName, CommandLine=ProcessCommandLine, Parent=InitiatingProcessParentFileName, FileName
),
(
SecurityEvent
| where TimeGenerated > ago(2d) and EventID == 4688
| where NewProcessName has "powershell" or CommandLine has "powershell"
| where CommandLine has_any (EncTokens) or CommandLine matches regex B64Re
| extend EncodedArg = extract(B64Re, 1, CommandLine)
| extend Decoded = iff(isnotempty(EncodedArg), decode(EncodedArg), "")
| extend DecoderHit = iff(Decoded has_any ("iex","invoke-expression","downloadstring","frombase64string","new-object net.webclient","invoke-webrequest","iwr","-nop","-w hidden","-noni"), true, false)
| project TimeGenerated, Computer, Account, CommandLine, EncodedArg, Decoded, DecoderHit
),
(
WindowsEvent
| where TimeGenerated > ago(2d) and EventID in (4104, 4103, 4688)
| extend CommandLine = tostring(EventData.CommandLine)
| where tostring(Channel) has "PowerShell" or tostring(Provider) has "PowerShell" or tostring(RenderedDescription) has "PowerShell"
| where isnotempty(CommandLine) and (CommandLine has_any (EncTokens) or CommandLine matches regex B64Re)
| extend EncodedArg = extract(B64Re, 1, CommandLine)
| extend Decoded = iff(isnotempty(EncodedArg), decode(EncodedArg), "")
| extend DecoderHit = iff(Decoded has_any ("iex","invoke-expression","downloadstring","frombase64string","new-object net.webclient","invoke-webrequest","iwr","-nop","-w hidden","-noni"), true, false)
| project TimeGenerated, Computer, Account=tostring(UserName), CommandLine, EncodedArg, Decoded, DecoderHit
)
| extend SuspicionScore = 1
+ iif(strlen(EncodedArg) >= 60, 1, 0)
+ iif(DecoderHit, 1, 0)
| where SuspicionScore >= 2
| order by TimeGenerated desc
Alright, now to generate some test data!
I basically just ran a bunch of Base64-encoded PowerShell on my test machine. Nothing special. Then I got a bit eager (the data had not yet replicated to the data lake) and started trying to test it:

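In hindsight, a trivial query like the following, run through the data exploration tool, would have told me whether the test events had replicated yet - it’s just a sketch, and the device name is a placeholder:
// Quick replication check before testing. "test-vm-01" is a placeholder device name.
DeviceProcessEvents
| where Timestamp > ago(1d)
| where DeviceName =~ "test-vm-01"
| summarize Events = count(), LatestEvent = max(Timestamp)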
The first query was generating a lot of noise, so I asked it to perform some tuning based on the data that I had generated. Now, keep in mind, at this point I did not have the SecurityIncident table forwarded to the data lake, so I had to provide the true/false/benign positive verdicts via the chat. Anyway, here’s what it came up with (I basically pasted the incident ID, the verdicts and asked it to tune):

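If the SecurityIncident table had been forwarded to the data lake, the tuner could have pulled those verdicts itself instead of me pasting them into the chat. A rough sketch of what that could look like - the title filter is just an assumption based on my detection name:
// Sketch: latest analyst classification per incident for this detection,
// summarized into verdict counts the tuner can reason over.
SecurityIncident
| where TimeGenerated > ago(14d)
| where Title has "Suspicious PowerShell EncodedCommand"
| summarize arg_max(TimeGenerated, Status, Classification, ClassificationReason) by IncidentNumber
| summarize Incidents = count() by Classification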
And that’s it. If you want to see what this looks like in action, here’s a short gif that shows the agent creating a plan to test the detection and briefly querying the MCP server:

The agent will perform testing according to our specification and possibly alter the detection a bit before deploying.
The new version of the detection now looked like this:
// Detection: Suspicious PowerShell EncodedCommand (Base64) - TUNED
// Description: Detects PowerShell/Pwsh executions using Base64-encoded commands via -Enc/-EncodedCommand
// MITRE ATT&CK: T1059.001 (PowerShell), T1027 (Obfuscated/Compressed Files and Information)
// Data Sources: DeviceProcessEvents (MDE), SecurityEvent (4688), WindowsEvent (4103/4104)
// Tuning Version: v1.0 - 2025-11-13
// Changes: Exclude benign reconnaissance, increase DecoderHit weight, raise threshold
// Expected FP Reduction: 75% -> 10-20%
let EncTokens = dynamic(["-enc","-encodedcommand","/enc","/encodedcommand"]);
let B64Re = @"(?i)(?:-enc(?:odedcommand)?|-e)\s+([A-Za-z0-9+/=]{20,})";
let decode = (s:string) { replace(@'\u0000','', base64_decode_tostring(s)) };
// Benign reconnaissance commands to exclude (unless paired with execution indicators)
let BenignCommands = dynamic(["systeminfo","wmic os","wmic process get","wmic computersystem","wmic qfe","get-computerinfo","get-wmiobject win32_operatingsystem","get-childitem env:"]);
union isfuzzy=true
(
DeviceProcessEvents
| where Timestamp > ago(2d)
| where tolower(FileName) in ("powershell.exe","pwsh.exe","powershell_ise.exe")
| where ProcessCommandLine has_any (EncTokens) or ProcessCommandLine matches regex B64Re
| extend EncodedArg = extract(B64Re, 1, ProcessCommandLine)
| extend Decoded = iff(isnotempty(EncodedArg), decode(EncodedArg), "")
| extend DecodedLower = tolower(Decoded) // Case-insensitive comparison
| extend DecoderHit = iff(DecodedLower has_any ("iex","invoke-expression","downloadstring","frombase64string","new-object net.webclient","invoke-webrequest","iwr","bitstransfer","start-bitstransfer"), true, false)
// Exclude benign reconnaissance unless paired with execution indicators
| extend BenignHit = iff(DecodedLower has_any (BenignCommands) and not(DecoderHit), true, false)
| where not(BenignHit)
| project TimeGenerated=Timestamp, HostName=DeviceName, Account=InitiatingProcessAccountName, CommandLine=ProcessCommandLine, Parent=InitiatingProcessParentFileName, FileName, EncodedArg, Decoded, DecoderHit
),
(
SecurityEvent
| where TimeGenerated > ago(2d) and EventID == 4688
| where NewProcessName has "powershell" or CommandLine has "powershell"
| where CommandLine has_any (EncTokens) or CommandLine matches regex B64Re
| extend EncodedArg = extract(B64Re, 1, CommandLine)
| extend Decoded = iff(isnotempty(EncodedArg), decode(EncodedArg), "")
| extend DecodedLower = tolower(Decoded)
| extend DecoderHit = iff(DecodedLower has_any ("iex","invoke-expression","downloadstring","frombase64string","new-object net.webclient","invoke-webrequest","iwr","bitstransfer","start-bitstransfer"), true, false)
| extend BenignHit = iff(DecodedLower has_any (BenignCommands) and not(DecoderHit), true, false)
| where not(BenignHit)
| project TimeGenerated, Computer, Account, CommandLine, EncodedArg, Decoded, DecoderHit
),
(
WindowsEvent
| where TimeGenerated > ago(2d) and EventID in (4104, 4103, 4688)
| extend CommandLine = tostring(EventData.CommandLine)
| where tostring(Channel) has "PowerShell" or tostring(Provider) has "PowerShell" or tostring(RenderedDescription) has "PowerShell"
| where isnotempty(CommandLine) and (CommandLine has_any (EncTokens) or CommandLine matches regex B64Re)
| extend EncodedArg = extract(B64Re, 1, CommandLine)
| extend Decoded = iff(isnotempty(EncodedArg), decode(EncodedArg), "")
| extend DecodedLower = tolower(Decoded)
| extend DecoderHit = iff(DecodedLower has_any ("iex","invoke-expression","downloadstring","frombase64string","new-object net.webclient","invoke-webrequest","iwr","bitstransfer","start-bitstransfer"), true, false)
| extend BenignHit = iff(DecodedLower has_any (BenignCommands) and not(DecoderHit), true, false)
| where not(BenignHit)
| project TimeGenerated, Computer, Account=tostring(UserName), CommandLine, EncodedArg, Decoded, DecoderHit
)
| extend SuspicionScore = 1
+ iif(strlen(EncodedArg) >= 100, 1, 0) // Increased threshold from 60 to 100
+ iif(DecoderHit, 2, 0) // Increased weight from 1 to 2
+ iif(tolower(Decoded) has_any ("bypass","hidden","noprofile","sta","-nop","-w hidden"), 1, 0) // Additional evasion indicators
| where SuspicionScore >= 3 // Increased from 2 to 3
| order by TimeGenerated desc
To summarize, it made some simple changes to the query by adding an exclusion list of benign commands:
// Benign reconnaissance commands to exclude (unless paired with execution indicators)
let BenignCommands = dynamic(["systeminfo","wmic os","wmic process get","wmic computersystem","wmic qfe","get-computerinfo","get-wmiobject win32_operatingsystem","get-childitem env:"]);
I’m not sure I agree, but I’m not going to get into semantics (pun intended again) here. LOLBAS and LOLBIN detection is notoriously difficult to get right by the very nature of what it is. It also made some changes to the suspicion score logic, but don’t take my word for it.
Part of the instruction set I created also includes creating reports in pull requests that make it easier to review changes. Here’s part of the test-report from the above change:
Detection Name: Suspicious PowerShell EncodedCommand (Base64)
Test Date: 2025-11-20
Workspace: infex-law (xxx)
Detection Version: v1.0 - TUNED (2025-11-13)
Test Duration: 7 days (2025-11-13 to 2025-11-20)
✅ DETECTION OPERATIONAL
The suspicious_powershell_base64 detection is functioning correctly and generating high-confidence alerts. The tuned version (v1.0) successfully identified 5 suspicious PowerShell executions with Base64-encoded commands, all scoring at the threshold (SuspicionScore = 3) over a 7-day period.
All of the alerts included execution indicators (iex + invoke-webrequest + evasion flags).

The full report is a bit longer, but this gives us an idea of what changes were made and why!
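For context, the numbers in a report like this boil down to fairly simple measurements. Here’s a stripped-down sketch (not the full detection logic) of the kind of noise measurement behind it:
// Sketch of a noise measurement over the test window: candidate encoded
// PowerShell executions per day, and how many distinct accounts produced them.
DeviceProcessEvents
| where Timestamp > ago(7d)
| where FileName in~ ("powershell.exe", "pwsh.exe")
| where ProcessCommandLine has_any ("-enc", "-encodedcommand")
| summarize Candidates = count(), Accounts = dcount(InitiatingProcessAccountName) by bin(Timestamp, 1d)
| order by Timestamp asc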
I think there’s a lot of potential once we can let agents reason over our data. I still firmly believe that we need to be very specific in creating custom agents with a very limited scope of work and strict instructions.
An example I can think of is smaller companies that don’t have the capacity to keep a full set of SOC roles in-house. They could effectively outsource parts of the work that they would normally never get to, such as:
Obviously we are still at a point where people are skeptical about letting agents do stuff like triage, which makes a bit of sense. Security people like to control input and what happens in our systems, which is why I usually advocate for allowing agents to submit pull requests that we can approve.
That being said, could there be potential for a 24/7 agent in the future? Sure! How? Well, it just needs to work. I think it would be an interesting experiment to have an agent running as a responder in a sort of “what-if” mode over the course of a month and see how well it performs versus humans and/or traditional SOAR. Doing it this way and building trust in the agent’s capabilities might be a good idea. You’ll need to make sure you are giving it proper instructions and data to make good decisions in your context, but you’ll get live performance data without affecting production.
Could it be an alternative to 24/7 coverage at some point? Honestly, I don’t know. Probably not in its current state. If it actually works, it might be a cost-effective alternative!

Maybe I’m getting a bit ahead of myself.
Anyway, to summarize:
Signing off for this year,
-T