The idea that ingesting all the data and enabling all the use cases is the best approach to security monitoring is something we have explored time and time again here on this blog. So instead of just ranting about it, let me show you how I would actually go about it. For this purpose I have created a mock company, which I will present, along with a mock infrastructure. I will then go through the process of determining which data sources to use for which purpose, where to prioritize developing use cases, and how to plan for the future.
I will also touch a little bit on threat modeling. For those of you who are adept at this, please bear with me. I’ve dumbed it down quite a bit, but I think it’s important to show how threat modeling can be used in a practical sense for small and medium-sized companies.
Welcome to the world of Infernux Corp - tagline “Definitely not evil”. The company, which sells paper, is a medium-sized company with 100 employees, 3 offices and a datacenter. It runs a mix of on-premises and cloud infrastructure, with both Windows and Linux servers. The company has an internal IT team of 7 people, two of whom are focused solely on security, and a CISO.
The company has the following “tech stack”:
Application | Description | Hosted |
---|---|---|
ERP | Enterprise Resource Planning system | Cloud |
AD | Active Directory | On-premises |
Orders | Custom built application for managing orders, AD integrated | On-premises |
Supply Chain | Custom built application for managing inventory and supply chain, AD integrated | On-premises |
Website | Company website | Cloud |
Office 365 | Email and productivity suite | Cloud |
Azure | Cloud services | Cloud |
Entra ID | Identity and Access Management system | Cloud |
Entra ID Connect | Connects Entra ID to on-premises AD | Cloud |
HR | Human resources system for managing employees and salary | Cloud |
This is how it looks in terms of the logical flow of information:
```mermaid
graph TD
subgraph "Cloud (Azure)"
CRM[CRM]
ERP[ERP]
Website[Website]
Office365[Microsoft 365]
EntraID[Entra ID]
HR[HR]
EntraID --> |SSO| HR
EntraID --> |SSO| ERP
EntraID --> |SSO| Office365
Office365 --> Outlook
Outlook --> SharedMailbox
end
subgraph On-premises
AD[Active Directory]
OrdersApplication[Custom Orders App]
SupplyChain[Custom Supply Chain App]
AD --> |SSO|OrdersApplication
AD --> |SSO|SupplyChain
OrdersApplication --> SupplyChain
SupplyChain --> OrdersApplication
SupplyChain -.-> |Updates warehouse status| Supply
end
AD --> |Entra ID Connect| EntraID
Orders --> SharedMailbox[Shared Mailbox]
Orders --> Website
Orders -.-> OrdersApplication
Website -.-> |Frontend for| OrdersApplication
Customers --> Orders
SharedMailbox --> |Manual input|OrdersApplication
```
The gist is that customers place orders through one of two points of contact:
- The shared mailbox, from which orders are manually entered into the orders application
- The company website, which acts as a frontend for the orders application
For each order, the following flow is followed:
1. The order comes in via the shared mailbox or the website.
2. It is entered into the orders application - manually in the case of the mailbox, directly in the case of the website.
3. The orders application exchanges data with the supply chain application, which updates the warehouse status.
The infrastructure is as follows:
Server | Description | OS | Role |
---|---|---|---|
DC01 | Domain Controller | Windows Server 2019 | Active Directory |
Orders01 | Orders Application | Windows Server 2019 | Custom Orders Application |
SupplyChain01 | Supply Chain Application | Windows Server 2019 | Custom Supply Chain Application |
Firewall01 | Palo Alto firewall | PAN-OS | Firewall |
ESXi01 | ESXi Host | VMware ESXi | Virtualization |
WorkstationX | Workstation | Windows 10 | Workstation |
Some more information about the infrastructure:
Service | Description | Type | Role | IP |
---|---|---|---|---|
ERP | Enterprise Resource Planning system integrated with Entra ID using an Enterprise Application | SaaS | Business Application | N/A |
Website | Company website deployed using Azure App Service | PaaS | Frontend | N/A |
Office365 | Email and productivity suite | SaaS | Email and Productivity | N/A |
Azure | Cloud services | N/A | Infrastructure | N/A |
Entra ID | Identity and Access Management system | SaaS | Identity and Access Management | N/A |
Entra ID Connect | Connects Entra ID to on-premises AD | SaaS | Identity and Access Management | N/A |
HR | Human resources system for managing employees and salary integrated with Entra ID using an Enterprise Application | SaaS | HR | N/A |
Some more information about the cloud services:
The company has the following security controls or products in place:
Product | Description | Role |
---|---|---|
Palo Alto | Firewall | Network Security |
Microsoft Sentinel | SIEM | Security Monitoring |
Microsoft 365 licenses | The company has a mix of licenses, mainly Enterprise Mobility + Security E3 for office workers and Microsoft 365 E5 for the IT and Security team | Microsoft licensing |
Defender for Endpoint | Endpoint protection rolled out to all office clients | Endpoint Security |
Defender for Servers | Endpoint protection rolled out to all servers | Endpoint Security |
Defender for Office 365 | Email protection enabled for all users | Email Security |
Defender for Cloud | Cloud security posture management for Azure, not configured | Cloud Security |
Some more information about these controls or products:
So now we get to the fun part. In this case, we assume that we are not able to change the infrastructure or the security controls. We can at most install agents on the servers for logging purposes. Frustrating as that may be, this allows us to focus solely on the data sources that are available to us and how we can use them to monitor the security of the company.
The first order of business (pun intended) is to determine what we want to protect in this scenario. As we have a decent understanding of the business and the infrastructure, we can start by defining the business-critical assets.
Our general understanding of the business needs to be translated into a list of assets to protect. In this case, since we know that we have two ways of receiving orders, we can start by defining the applications that are used to process orders.
Asset | Description | Criticality |
---|---|---|
Orders Application | The custom orders application that is used to process orders | High |
Supply Chain Application | The custom supply chain application that is used to manage inventory and supply chain | High |
Active Directory | The domain controller that is used to manage users and computers and is required for the applications | High |
Office 365 | The email and productivity suite that hosts the shared mailbox used to take orders | High |
Website | The company website that is used to take orders | High |
VPN | The site-to-site VPN that connects the on-premises network to the Azure cloud | High |
This is the list that I came up with. My reasoning is the following:
Now, if we dig even deeper, we see that as long as the on-premises infrastructure is up and running, along with either the shared mailbox or the website, the company can continue to receive orders. Since the workstations in the warehouse can communicate with the servers, they can be used to enter orders manually if the VPN is down. In case of emergency, we could update the website with a phone number and a separate email address for orders, which would allow the sales team to take orders over the phone and manually input them into the orders application using the workstations.
What we have here is the semblance of a business continuity plan. We have identified the critical assets and we have identified a way to continue operations in case of a failure. This is important, as it allows us to focus our monitoring efforts on the critical assets and the critical paths.
At this point, I want to acknowledge for everyone reading that, yes, what this company needs the most is not security monitoring. We don’t know anything about the actual security strategy, BCP, patch management, inventory management, etc. We are just focusing on security monitoring for the sake of this post.
That being said, some things to obviously consider based on what we know are:
Usually I would always advise to start with the basics, but in this case, we are focusing on security monitoring. So let’s get back to that.
So now comes the fun part. In an ideal world, we would enable all the data sources and all the use cases, then have a stacked team of analysts to go through the alerts, together with some magic automation and AI to help us out. This, however, is not the case. Most companies have to prioritize, and what they do should have a good effect relative to the effort it takes. It’s also a question of cost and resources - logs cost money, and analysts cost money.
To this point, the idea is that if we can determine what matters most to us and monitor that, then we are in a good place. If we can create high-fidelity use cases using this data, then the time spent looking into alerts will be time well spent rather than time wasted. In our case, we have only two people dedicated to security, which in reality means no full-time security analysts. So we need to be smart about what we do.
So what are we afraid of?
To accomplish this, let’s do some threat modeling.
> “Threat modeling works to identify, communicate, and understand threats and mitigations within the context of protecting something of value.” - OWASP
A threat model typically includes:
```mermaid
graph LR
Assumptions --> Threats
Threats --> Mitigations
Mitigations --> Validation
Assets --> |Threat Model| Assumptions
```
In layman’s terms, it’s a way of thinking about what you want to protect, what you are afraid of, and what you can do to protect it. When I talk to people about what logs to prioritize, we usually start with something that resembles a threat model, but is actually much simpler. It boils down to a few questions:
- What do you want to protect?
- What are you afraid of happening to it?
- What can you do to protect it?
When working in security, you will rarely know the systems as well as the people who work with them every day. So it’s important to involve the people who know the systems best in this process, and to ask them these questions. Even if they don’t have the security knowledge or context, they will know what might bring the system down, or what can cause the most damage.
Now, let’s do a simple threat model for a five-year-old with some toys they want to protect.
An assortment of toys.
Now that we have a general idea of how a threat model works, let’s do one for Infernux Corp.
The business-critical assets of Infernux Corp.
The assumptions are what we know about the company and the infrastructure. We could probably come up with a very long list of assumptions here, but let’s keep it simple. We will limit ourselves to 10 assumptions:
Another assumption for this specific example is that we can’t change any security products, controls or the infrastructure.
The threats are what we are afraid of. This can be anything from a ransomware attack to a disgruntled employee. Depending on the company and industry, the threats can vary greatly. A threat is a potential or actual undesirable event that may be malicious (such as a DoS attack) or incidental (such as the failure of a storage device). For Infernux Corp, we will limit ourselves to four specific threats:
- Ransomware attack on the servers
- Unauthorized access to the custom applications
- Unauthorized access to the domain controller
- DoS attack on the website
The mitigations are what we can do to protect against the threats. This can be anything from implementing MFA to segmenting the network. For Infernux Corp, we will limit ourselves to mitigations that are directly related to security monitoring.
Our assumptions are based on Microsoft Sentinel (or the Defender XDR unified portal) being the tool we will use to handle incidents. Enabling the right data sources and creating the right use cases will be our main focus. As such, enabling the Defender XDR unified portal and configuring it according to best practices is a good idea, but I will not cover it here.
So, given what we know currently, we can come up with the following mitigations:
Threat | Mitigation | Reasoning |
---|---|---|
Ransomware attack on the servers | Enable logging for security events on the workstations and send them to Microsoft Sentinel. Servers have EDR agents installed, so given our timeframe and resources we consider them protected. | This will allow us to monitor servers and workstations for any suspicious activity. |
Ransomware attack on the servers | Generate near-real-time (NRT) use cases that check for patterns common to ransomware attacks. | This will allow us to monitor the servers and workstations for any suspicious activity. |
Unauthorized access to the custom applications | Enable logging on the domain controller for Active Directory logs and send them to Microsoft Sentinel. | This will allow us to monitor the custom applications for any suspicious activity. |
Unauthorized access to the domain controller | Send logs from Active Directory to Microsoft Sentinel. The DC has the EDR agent installed. | This will allow us to monitor the domain controller for any suspicious activity. |
Unauthorized access to the domain controller | Monitor highly privileged user activity, creation of new users (which should rarely happen in this context), privileged role assignment, creation and modification of Group Policies, login attempts (unusual sign-ins, failed attempts) and account lockouts. | This will allow us to monitor the domain controller for any suspicious activity. |
DoS attack on the website | Enable diagnostic logging on the Azure App Service and send it to Microsoft Sentinel. | This will allow us to monitor any errors, performance or security issues in the service. |
DoS attack on the website | Monitor failed authentications to the website and the number of requests. If we could, we would enable DDoS protection, use a web application firewall and in general follow the guidance in the Azure security baseline for App Service. | This will allow us to monitor the website for any suspicious activity. |
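To make the NRT ransomware use case a bit more concrete, here’s a minimal sketch of the kind of query I have in mind. It assumes the security events from the servers and workstations land in the standard SecurityEvent table and that process creation auditing (Event ID 4688) with command line logging is enabled - both assumptions, not givens:

```kql
// Minimal sketch: flag shadow copy deletion, a pattern common to ransomware.
// Assumes Event ID 4688 (process creation) auditing with command line capture.
SecurityEvent
| where EventID == 4688
| where Process has_any ("vssadmin.exe", "wbadmin.exe", "wmic.exe")
| where CommandLine has "delete"
| where CommandLine has_any ("shadows", "shadowcopy", "catalog")
| project TimeGenerated, Computer, Account, Process, CommandLine
```

Running this as an NRT rule in Microsoft Sentinel means it gets evaluated every minute, which fits the “react fast to ransomware” requirement.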
Some other quick wins that require little effort are:
Normally we would probably set up log servers (syslog and a Windows Event Collector) to collect logs from all the servers and workstations. We could then place this server somewhere in a DMZ, or with access to Azure via private link, but given the constraints of this scenario, we will limit ourselves to installing agents and logging directly to Microsoft Sentinel.
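Once the agents are installed, a quick sanity check is to verify that every machine is actually reporting. A minimal sketch, assuming the standard Heartbeat table that the agents write to:

```kql
// List machines whose agent has not reported in the last 15 minutes.
// The 15 minute threshold is just an example value.
Heartbeat
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| where LastHeartbeat < ago(15m)
| order by LastHeartbeat asc
```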
The validation is how we can check if our mitigations are working. This can be anything from checking logs to running a red team exercise. For Infernux Corp, we will limit ourselves to checking logs in Microsoft Sentinel and verifying that the use cases are working as intended. I wrote a bit more in detail about this in another post.
My validation would be to check that each enabled data source is actually delivering logs into Microsoft Sentinel, and that each use case fires when the activity it is meant to detect occurs.
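The first half of that check can be done with a simple query. A minimal sketch, where the table names are assumptions based on the data sources we discussed above:

```kql
// Flag expected tables that have not received data in the last hour.
// The table names are assumptions based on the enabled data sources.
union withsource=TableName SecurityEvent, SigninLogs, AppServiceHTTPLogs
| summarize LastRecord = max(TimeGenerated) by TableName
| where LastRecord < ago(1h)
```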
We now have a general idea of what to do next: implement the mitigations by ingesting logs into Microsoft Sentinel and creating use cases. We also have a general idea of what to validate and how to do it.
From here we can start to think about what we want to do next. In order to make this a bit more clear, let’s assume some things:
Given these assumptions, we can start to think about what we want to do next. We will keep it strictly to security monitoring, as that is the focus of this post.
The first thing we want to do is to tune and “perfect” the use cases. This means making sure that the use cases work as intended and that they don’t generate too many false positives. This can be done by running tests and tweaking the use cases until they do.
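In practice, tuning often comes down to thresholds and allowlists. A minimal sketch of what that could look like for a failed logon use case - the account names and the threshold are made-up example values:

```kql
// Hypothetical allowlist of service accounts known to generate noise.
let KnownNoisyAccounts = dynamic(["svc_backup", "svc_scanner"]);
SecurityEvent
| where EventID == 4625 // failed logon
| where Account !in (KnownNoisyAccounts)
| summarize FailedLogons = count() by Account, Computer, bin(TimeGenerated, 15m)
| where FailedLogons > 20 // threshold tuned against the observed baseline
```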
Using the data sources we have enabled, we can start to create new use cases. This can be anything from monitoring for unusual activity on the servers to monitoring for unusual activity in the cloud. The key here is to focus on the critical assets and the critical paths, while maintaining high quality that assures that the alerts are actionable.
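As an example of a new use case built on a data source we already have, here’s a rough sketch that looks for a single account signing in successfully from more than one country within an hour, using the SigninLogs table from Entra ID - illustrative logic, not a production-ready detection:

```kql
// Flag accounts with successful sign-ins from multiple countries in one hour.
SigninLogs
| where ResultType == "0" // successful sign-in
| extend Country = tostring(LocationDetails.countryOrRegion)
| where isnotempty(Country)
| summarize Countries = make_set(Country), CountryCount = dcount(Country)
    by UserPrincipalName, bin(TimeGenerated, 1h)
| where CountryCount > 1
```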
Once we have the use cases in place, we can start to monitor the alerts. This means making sure that alerts are actually being generated and investigated, which requires setting up a process for handling them in a timely manner.
We need to be able to continuously improve our use cases. This means having some sort of feedback loop where we can take input on a use case and use it to improve it. As this is a small team, it’s likely easy to do this informally, but as the team grows, it’s important to have a more formal process in place. Another thing to consider is that if the infrastructure changes, we need to adapt our use cases to match.
Assuming we have implemented some use cases, if we wanted to either add new ones or improve the current ones - we would do something like this:
```mermaid
graph LR
Detection --> A[Generates alerts]
A --> R[Response]
R --> D[Validation]
P[Identified threat] --> |Mitigated by| Detection
D --> |Improve or create more| Detection
```
The idea is that based on threat modeling and what we are afraid of, we can generate detections. By responding to the resulting alerts, we gain insight into whether or not a detection is good in its current form, or if it needs improvement in the form of tuning. This in turn becomes our validation of sorts, which informs the detection process further. If the detection is in an alright state, we can move on to the next threat.
As with the above paragraph on feedback loops for use cases, we also need to have some sort of threat modeling lifecycle in place. We obviously might not have the time, people or resources to fully threat model everything - luckily we get some pretty decent tools from OWASP in form of the four question framework:
A possible threat exists when the combined likelihood of the threat occurring and the impact it would have on the organization create a significant risk. The following four question framework can help to organize threat modeling:
1. What are we working on?
2. What can go wrong?
3. What are we going to do about it?
4. Did we do a good job?
There are many methods or techniques that can be used to answer each of these questions. There is no “right” way to evaluate the search space of possible threats, but structured models exist in order to help make the process more efficient. Attempting to evaluate all the possible combinations of threat agent, attack, vulnerability, and impact is often a waste of time and effort. It is helpful to refine the search space in order to determine which possible threats to focus on.
We can visualize the threat modeling lifecycle like this:
```mermaid
graph TD
System --> Component
Change
System -...-> Change
Component -.-> Change
Scope ----> |Threshold| Change
subgraph "Threat Modeling"
Q1[What are we working on?]
Q2[What can go wrong?]
Q3[What are we going to do?]
end
Change --> Q1
Q1 --> Q2
Q2 --> Q3
Q3 --> Outcome
Outcome --> Q4[Did we do a good job?]
```
This indicates that we probably would do well with some way of tracking changes - in the form of a change management process. This will also inform our threat modeling process, as we can then focus on the changes that are happening in the infrastructure and the business that exceed our determined scope. Not every change needs to be fully modeled, but we need to have a way of determining what changes are important to model.
We are finally at the point where we can change the infrastructure. It’s been decided to make some changes to the workstations in the warehouse and to segment the network:
```mermaid
graph TD
subgraph "Cloud (Azure)"
EntraID[Entra ID]
end
subgraph On-premises
subgraph "Server zone"
AD[Active Directory]
OrdersApplication[Custom Orders App]
SupplyChain[Custom Supply Chain App]
AD --> OrdersApplication
AD --> SupplyChain
OrdersApplication --> SupplyChain
SupplyChain --> OrdersApplication
SupplyChain -.-> |Updates warehouse status| Supply
end
Workstation
end
AD --> |Entra ID Connect| EntraID
EntraID -->|Cloud identity| Workstation
WarehouseWorkers -.-> EntraID
Workstation -.-> |App only access|OrdersApplication
Workstation -.-> |App only access|SupplyChain
```
The changes can be summarized as follows:
- The warehouse workstations are moved out of Active Directory and onto cloud identity via Entra ID
- The workstations get application-only access to the orders and supply chain applications
- The on-premises network is segmented, with the servers placed in a dedicated server zone
Given these changes, we can now apply the four question framework to Infernux Corp:
It’s important to remember that security monitoring is not a one-size-fits-all solution; it has to be tailored to your specific business needs and infrastructure. By determining which data sources to use for which purpose, where to prioritize developing use cases and how to plan for the future, you can ensure that your security monitoring is effective and efficient.
I firmly believe that security monitoring is something that not everyone needs. In this case, I pass the question to the reader: Does Infernux Corp need security monitoring? They obviously need a lot of other stuff, as I’ve mentioned - but does security monitoring make sense for them?
I will at least say that there’s a lot of things that should be fixed and improved before focusing on security monitoring. Generally, security monitoring is something that demands maturity in other areas to form a solid foundation. Security monitoring built upon a good foundation by a team with intimate knowledge of what “normal” looks like will be able to create high fidelity, actionable use cases with low amounts of false positives.
This includes things like actually making good use of the network logs, having a good understanding of what data is allowed to travel where, and knowing what the normal behavior of the network looks like. It also means understanding which logs belong in a SIEM and which logs are only needed for IR and compliance and can be sent to a cheaper storage solution - in other words, when to use which log tier, like utilizing auxiliary logs in Microsoft Sentinel versus the standard analytics tier.
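As an example of putting the network logs to work: if the Palo Alto firewall forwards its logs to Sentinel via CEF, they land in the CommonSecurityLog table, and a simple baseline query could look something like this (a sketch, assuming the CEF connector is in place):

```kql
// Summarize denied traffic from the firewall to spot unusual patterns.
CommonSecurityLog
| where DeviceVendor == "Palo Alto Networks"
| where DeviceAction == "deny"
| summarize Denies = count() by SourceIP, DestinationIP, DestinationPort, bin(TimeGenerated, 1h)
| order by Denies desc
```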
Looking deeper into the logs, we can even start to filter out irrelevant data, cutting cost (or making room to ingest more of what matters) while keeping the fidelity of our use cases high. This can be done either directly on the syslog or event collector servers, or using data engineering pipelines, whether that be workspace transformation data collection rules in Sentinel or tools like Cribl Stream and Tenzir.
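For the Sentinel route, a workspace transformation data collection rule is just a KQL snippet applied at ingestion time. A minimal sketch that drops some notoriously noisy Windows Filtering Platform events before they are ingested (and billed) - the event IDs are examples, and what counts as noise depends on your own use cases:

```kql
// Transformation in a data collection rule; 'source' is the incoming stream.
// Example: drop noisy Windows Filtering Platform events before ingestion.
source
| where EventID !in (5156, 5158)
```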
In the end, security monitoring is a tool that can be used to help protect your business, but it’s not an end-all-be-all solution. It requires a lot of work to get right and a lot of work to maintain. It’s not something you can just set and forget; it requires constant attention and tuning.
Hopefully this blog post gives some insight into how to approach security monitoring and how to tailor it to your specific business needs and infrastructure. Making sure that you know what you want to protect, what you are afraid of, and what you can do to protect it is key to creating an effective and efficient security monitoring program that takes into account available resources and constraints, like people and money 💰.