In this blog post, we present Sekoia.io’s process for creating detection rules. This first requires explaining our detection workflow, as well as the history and specificities of Sekoia.io XDR.
Below is a timeline of Sekoia.io XDR history. Although the product is quite recent, the timeline shows that several features were added over time, each of which affected how we create rules.
Figure 1. Sekoia.io XDR rules-related timeline
It’s important to note a few specificities of the Sekoia.io XDR platform:
- It is vendor-agnostic, which means logs are integrated from different sources (including Windows+SYSMON, EDR agents, firewalls, proxies, AzureAD, and AWS), and we build our own parsers based on source events. The same logic extends to our rules, which do not depend on the log source.
- Sekoia.io XDR also allows customers and partners to create their own parsers and their own detection rules. Therefore, we need to make their creation easier.
- Our XDR displays not only the logs but also the alerts raised and the rules we create (more than 500 rules in our catalogue at the time of writing) to customers and partners. Therefore, when a rule is pushed to production, it impacts everyone, not just us.
This last specificity is a critical and complex issue for SIEM vendors. In the past, it was often left to Managed Security Service Providers (MSSPs), who had to review each rule. This was a considerable waste of time and value: while the SIEM technology was useful, the rules shipped with it were not so much.
At Sekoia.io we advocate for higher quality and believe we actually provide that! Therefore, our detection rules display an effort notation system, indicating the Effort Level needed for a rule to be activated in production.
The Effort Level can reflect the effort needed to collect the right logs for the rule to work, to deal with false positives, or to adapt the rule to the corporate environment. While the notion of effort is good to know for our customers, it is not sufficient, which is why there is now a strict process for creating detection rules and strict field normalisation when parsing the logs.
Field Normalisation: Welcome ECS
The first condition to build good detection for Sekoia.io vendor-agnostic XDR is field normalisation. Without that, we would not be able to build detection rules for all the parsed log sources (called “Intakes”) that could be ingested by our platform.
Sekoia.io started its production development on the Threat Intelligence pillar. That’s why it made sense to use STIX for both our CTI and XDR products. With the same logic, our detection engine also used STIX for field normalisation, hence rules were in STIX Patterning language. However, facing limitations with STIX overall for detection, we moved from STIX to Elastic Common Schema (ECS). Likewise, we took advantage of having incorporated ECS as our standardisation strategy to migrate our detection engine from STIX Patterning to Sigma and increase its adoption by analysts.
Our Sigma implementation focuses on the detection pattern (i.e. not on the whole rule format). However, as many different intakes are integrated into the platform, the fields used in this detection pattern are mainly in ECS format but can also be custom ones. While this differs somewhat from the original implementation, the logic remains the same.
Figure 2. Sigma rule from SigmaHQ versus Sekoia.io
Although field normalisation greatly increases the efficiency of our detection rules catalogue, it is unfortunately not bullet-proof. Some intakes are too specific, such as EDRs raising events in different ways, or Cloud-related intakes in general. Moreover, we tend not to normalise values and still have to write specific rules for some intakes.
To illustrate the above, below is an extract of a parsed log from HarfangLab EDR:
Figure 3. Custom fields for HarfangLab EDR
Several pieces of information are provided in that event, and some keys in this JSON log event are easily transcribed into ECS fields. However, some are quite specific to HarfangLab, such as “execution”:0, which defines the action taken by HarfangLab EDR for this event (in that case, it triggered an alert on the EDR side). Here, unfortunately, we have to define a specific field that is not part of the ECS standard; in Sekoia.io XDR it is simply named “harfanglab.execution”.
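To make the idea concrete, here is a minimal Python sketch of this kind of normalisation. It is not Sekoia.io’s actual parser: the raw key names (`hostname`, `process_name`, `commandline`) and the `normalise` helper are assumptions made up for illustration. Only the `execution` key and the resulting `harfanglab.execution` field come from the example above.

```python
# Hypothetical mapping from raw vendor keys to ECS field names.
# The raw key names are assumptions for illustration, not the real HarfangLab schema.
ECS_MAPPING = {
    "hostname": "host.name",
    "process_name": "process.name",
    "commandline": "process.command_line",
}


def normalise(raw_event: dict, vendor: str) -> dict:
    """Rename known keys to ECS fields; prefix everything else with the vendor name."""
    normalised = {}
    for key, value in raw_event.items():
        if key in ECS_MAPPING:
            normalised[ECS_MAPPING[key]] = value
        else:
            # No ECS equivalent: keep the value under a vendor-specific custom field.
            normalised[f"{vendor}.{key}"] = value
    return normalised


raw = {"hostname": "ws-01", "process_name": "powershell.exe", "execution": 0}
event = normalise(raw, vendor="harfanglab")
print(event)
```

In this sketch, the two keys with an ECS equivalent are renamed, while `execution` has none and ends up as the custom field `harfanglab.execution`.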
Here is a quick schema summarising our global workflow displaying how detection is applied to ingested logs:
Figure 4. Logs ingestion into Sekoia.io XDR
Once this first step is completed, field normalisation works and detection rules can usually use these fields, saving time while preserving value and quality.
Now that we can easily create detection rules, let’s talk about our process to create them and its different steps, such as how we make sure they work in production and how we try to avoid false positives.
Rule syntax checking
The second step is quite straightforward: the detection rules need to respect a specific syntax (both metadata and the rule logic) required by our detection engine and platform.
Therefore we implemented a rule linter integrated with a GitHub action.
As rules are written in YAML, the linter first checks that the YAML is valid, and then checks the metadata to make sure mandatory fields are filled in. Each rule is associated with a UUID for our detection engine, and the linter also checks that there are no duplicate UUIDs.
Finally, the linter checks that the actual detection part of the rule respects the syntax: value modifiers, regexps, as well as the fields used in the rule, to make sure they exist in our platform and therefore in the received logs.
However, this linter is “dumb” and lacks some logic checks: if an analyst uses an AND instead of an OR in a rule, the linter won’t detect that the rule logic is incorrect. This is why checking the rule’s detection logic matters.
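The checks described above could be sketched as follows. This is an illustrative Python outline, not our actual linter: it assumes the YAML has already been parsed into dictionaries, and the list of mandatory fields, the `lint_rules` function and the sample rules are hypothetical.

```python
import uuid

# Hypothetical metadata keys; the real list of mandatory fields is not given in this post.
MANDATORY_FIELDS = ("uuid", "name", "description", "effort", "detection")


def lint_rules(rules: list, known_fields: set) -> list:
    """Return human-readable errors for a list of already-parsed rules."""
    errors = []
    seen_uuids = set()
    for index, rule in enumerate(rules):
        label = rule.get("name", f"rule #{index}")
        # 1. Mandatory metadata must be present and non-empty.
        for field in MANDATORY_FIELDS:
            if rule.get(field) in (None, "", {}):
                errors.append(f"{label}: missing mandatory field '{field}'")
        # 2. The UUID must be well-formed and unique across the catalogue.
        rule_uuid = rule.get("uuid")
        if rule_uuid:
            try:
                uuid.UUID(str(rule_uuid))
            except ValueError:
                errors.append(f"{label}: invalid UUID '{rule_uuid}'")
            if rule_uuid in seen_uuids:
                errors.append(f"{label}: duplicate UUID '{rule_uuid}'")
            seen_uuids.add(rule_uuid)
        # 3. Every field referenced in the detection must exist on the platform.
        for block_name, block in (rule.get("detection") or {}).items():
            if block_name == "condition" or not isinstance(block, dict):
                continue
            for key in block:
                field_name = key.split("|", 1)[0]  # drop value modifiers such as |endswith
                if field_name not in known_fields:
                    errors.append(f"{label}: unknown field '{field_name}'")
    return errors


shared_uuid = "3f2b6c1e-8d4a-4b7e-9c15-2a6f0e9d7b41"
rules = [
    {"uuid": shared_uuid, "name": "Suspicious PowerShell", "description": "demo", "effort": 2,
     "detection": {"selection": {"process.name|endswith": "powershell.exe"}, "condition": "selection"}},
    {"uuid": shared_uuid, "name": "Copy-pasted rule", "description": "demo", "effort": 1,
     "detection": {"selection": {"proc.name": "cmd.exe"}, "condition": "selection"}},
]
errors = lint_rules(rules, known_fields={"process.name", "process.command_line"})
for error in errors:
    print(error)
```

Here the second rule is flagged twice: once for reusing a UUID and once for the unknown field `proc.name`. Note that nothing in this sketch can tell whether `condition` expresses what the analyst intended, which is exactly the limitation mentioned above.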
Rule logic checking
To implement detection logic checking within our rules creation process, two features were implemented:
- Test samples
- Staging rules
For the time being, there is no “continued attack replay” integration checking our rules. There are basically three choices for implementing that:
- pay for an existing one with sets of attacks pre-configured,
- build your own solution from scratch,
- pay for a solution that allows you to build your own attacks.
The first choice is great for saving analyst time. However, it is obviously not exhaustive and not specific enough to match all our rules: it usually only tests common attacks / TTPs.
The second choice is great for being exhaustive. However, it requires lots of analyst time to build the solution, then to build the attacks, and also to update the attacks whenever a detection rule is updated.
The last choice is a great combo, but the analysts still need to build the attacks in the solution and update each one of them according to the rules.
This kind of solution could be great but requires long-term efforts and time for little value compared to test samples, hence we made the choice not to have a “continued attack replay” solution.
Therefore, we chose to use test samples, which are events (logs) that are expected either to match or not to match a rule. A sample that matches verifies that the rule works, whereas a sample that does not match verifies the false positive filter. If an analyst changes the rule in the future, the test samples will be there to preserve the initial detection logic. This helps avoid regressions and unwanted alerts in production.
These test samples are in JSON format and can be replayed in our platform later on. This is really powerful when a rule is reviewed years later, for instance, or when someone changes the rule and wants to be sure it still matches (or does not match) the events it was originally designed to detect (or not detect). The checks on the test samples are also run automatically by a script integrated in the GitHub workflow.
To summarise: test samples combined with our script verify the detection logic, ease rule review and prevent regression.
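As an illustration of how test samples guard the detection logic, here is a small Python sketch. It is not our actual script: the matcher only supports a simplified, hard-coded condition (`selection and not filter`) with two value modifiers, and the rule, field values and sample names are invented for the example.

```python
def block_matches(block: dict, event: dict) -> bool:
    """True if every 'field|modifier: expected' entry of the block matches the event."""
    for key, expected in block.items():
        field, _, modifier = key.partition("|")
        actual = str(event.get(field, ""))
        if modifier == "endswith":
            ok = actual.endswith(expected)
        elif modifier == "contains":
            ok = expected in actual
        else:
            ok = actual == str(expected)
        if not ok:
            return False
    return True


def rule_matches(detection: dict, event: dict) -> bool:
    """Simplified, hard-coded condition: 'selection and not filter'."""
    if not block_matches(detection["selection"], event):
        return False
    filter_block = detection.get("filter")
    return not (filter_block and block_matches(filter_block, event))


def run_test_samples(detection: dict, samples: list) -> list:
    """Return the names of samples whose outcome differs from what was expected."""
    return [s["name"] for s in samples
            if rule_matches(detection, s["event"]) != s["should_match"]]


detection = {
    "selection": {"process.name": "certutil.exe", "process.command_line|contains": "urlcache"},
    "filter": {"user.name": "svc_deploy"},  # hypothetical false positive exclusion
}
samples = [
    {"name": "malicious download", "should_match": True,
     "event": {"process.name": "certutil.exe", "user.name": "alice",
               "process.command_line": "certutil.exe -urlcache -f http://example.test/a.exe"}},
    {"name": "known deployment account", "should_match": False,
     "event": {"process.name": "certutil.exe", "user.name": "svc_deploy",
               "process.command_line": "certutil.exe -urlcache -f http://example.test/a.exe"}},
]
failures = run_test_samples(detection, samples)

# Simulate a later edit that accidentally drops the false positive filter.
edited = {"selection": detection["selection"]}
regressions = run_test_samples(edited, samples)
print(failures, regressions)
```

With the original detection, both samples behave as expected. Once the filter is dropped, the "known deployment account" sample starts matching and is reported, which is the kind of regression the samples are there to catch.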
However, these test samples do not check for potential false positives over time. Hunting in our dataset for false positives is great but not enough, as the syntax and logic differ from those of the rules, or it is sometimes simply not possible to reproduce the same logic. This is where the staging rules enter the stage!
Staging rules (also known as “silent rules”) are rules in testing mode that will not trigger alerts for our customers, but will allow us to see how many alerts they would have triggered in production, and on which events. Then, it is easy to check whether the events are false positives or not.
Figure 5. Sekoia.io XDR rules process
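The staging mechanism can be pictured with the following Python sketch. It is a simplified model, not our detection engine: the `Rule` class, the `evaluate` function and the idea of reducing detection logic to a predicate are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # detection logic, reduced to a predicate here
    staging: bool = False            # True: evaluate silently, never alert


def evaluate(events: list, rules: list):
    """Split matches into real alerts and silent hits from staging rules."""
    alerts = []        # (rule name, event) pairs raised to customers
    staging_hits = {}  # rule name -> events a staging rule would have alerted on
    for event in events:
        for rule in rules:
            if not rule.matches(event):
                continue
            if rule.staging:
                staging_hits.setdefault(rule.name, []).append(event)
            else:
                alerts.append((rule.name, event))
    return alerts, staging_hits


rules = [
    Rule("Certutil download", lambda e: "urlcache" in e.get("process.command_line", "")),
    Rule("Any PowerShell", lambda e: e.get("process.name") == "powershell.exe", staging=True),
]
events = [
    {"process.name": "certutil.exe", "process.command_line": "certutil -urlcache -f http://example.test/a"},
    {"process.name": "powershell.exe", "process.command_line": "powershell Get-Date"},
    {"process.name": "powershell.exe", "process.command_line": "powershell Get-Process"},
]
alerts, staging_hits = evaluate(events, rules)
for name, hits in staging_hits.items():
    print(f"{name}: {len(hits)} would-be alerts")
```

The overly broad staging rule raises nothing for customers, yet its hits are kept with the matching events, so an analyst can review them before deciding whether the rule is ready for production.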
Cool cool cool but… Is that enough? Hell no!
“You should have understood it by now: our main issue here is to correctly check for false positives, so that we don’t raise too many low-quality alerts and don’t lead our customers’ analysts to fatigue, to alerting apathy, or to opening tickets with us, which requires additional engineering time and resources.”
Rings a bell?
It might remind you of an awesome framework, the Alert and Detection Strategies Framework (ADS)!
As you noticed, our issue—avoiding alert fatigue and low quality alerts—was exactly the same, making this framework just perfect.
It mainly allows us to:
- Keep track of each detection rule and its logic in plain words.
- Build knowledge and intelligence for each rule: why it was created, what were the blind spots, can it be improved?
- Create validation steps to trigger the rule again in the future if needed. In this section, a valid Sekoia.io alert raised in a testing community is mandatory to push the rule to production.
- Have a place where false positives are addressed explicitly, with a written explanation of the filter logic applied by the analyst (why this specific process was excluded from the rule, etc.).
Our process is simple: each rule must have an associated ADS file or it won’t be pushed to production.
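Such a gate is easy to automate. The Python sketch below shows one possible way to do it, assuming a hypothetical repository layout where each rule `rules/<name>.yml` has its ADS file at `ads/<name>.md`; the layout, file extensions and function name are illustrative only.

```python
import tempfile
from pathlib import Path


def rules_missing_ads(rules_dir: Path, ads_dir: Path) -> list:
    """Return the names of rules that have no ADS file with the same base name."""
    rule_names = {path.stem for path in rules_dir.glob("*.yml")}
    ads_names = {path.stem for path in ads_dir.glob("*.md")}
    return sorted(rule_names - ads_names)


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    rules_dir, ads_dir = Path(tmp, "rules"), Path(tmp, "ads")
    rules_dir.mkdir()
    ads_dir.mkdir()
    (rules_dir / "certutil_download.yml").touch()
    (rules_dir / "new_rule.yml").touch()
    (ads_dir / "certutil_download.md").touch()
    missing = rules_missing_ads(rules_dir, ads_dir)

print(missing)
```

A CI job could fail whenever the returned list is not empty, which would block any rule without its ADS file from reaching production.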
Rules creation & rules review
Rule creation in Sekoia.io follows a specific process:
Figure 6. Rule creation process in Sekoia.io
It is not an “easy” process: each rule creation is time-consuming, yet necessary to ensure high quality in our product. Furthermore, although unexpected behaviours and false positives can still remain despite this process, following a strict process certainly saves time in the long run compared to applying quick (and sometimes dirty) fixes to the rules.
To test and improve this process, we started a long, very long, task last year: reviewing all rules in our catalogue. The first goal was to translate them from STIX to Sigma using ECS fields, but also to make sure they are relevant, work and won’t cause false positives.
While we are very proud of all the work done these past years and of the processes explained above, which improve our product quality, we are aware that they can still be improved. The next step would be to make the detection rule creation process stricter, to avoid false positives and surprises. For that purpose, we are planning to force rules to stay in staging for some time (probably two weeks) before being released into production, while still keeping the ability to push rules fast if needed.
You might also have noticed that we do not automatically check the mappings between our rules and the fields created by the intakes (parsed fields). This means that a parser modification could later break a detection. We will soon implement an automatic check that will run our “test samples” every day against the different intakes, to be sure we won’t miss a modification that would affect our rules.
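One way such a check could work is sketched below in Python. Since this check is not implemented yet, everything here is an assumption: the two toy parsers, the raw sample format and the `find_broken_mappings` helper are purely illustrative.

```python
def find_broken_mappings(parser, raw_samples: list, rule_fields: dict) -> dict:
    """Map each rule name to the fields it relies on that the parser no longer emits."""
    produced = set()
    for raw in raw_samples:
        produced.update(parser(raw))  # collect every field name the parser outputs
    broken = {}
    for rule_name, fields in rule_fields.items():
        missing = sorted(set(fields) - produced)
        if missing:
            broken[rule_name] = missing
    return broken


def parser_v1(raw: dict) -> dict:
    """Toy parser emitting ECS field names."""
    return {"process.name": raw["image"], "process.command_line": raw["cmd"]}


def parser_v2(raw: dict) -> dict:
    """Same parser after a change that accidentally renames one field."""
    return {"process.name": raw["image"], "process.cmdline": raw["cmd"]}


raw_samples = [{"image": "certutil.exe", "cmd": "certutil -urlcache -f http://example.test/a"}]
rule_fields = {"Certutil download": {"process.name", "process.command_line"}}

before = find_broken_mappings(parser_v1, raw_samples, rule_fields)
after = find_broken_mappings(parser_v2, raw_samples, rule_fields)
print(before, after)
```

Run daily, a check of this kind would report that the rule depends on `process.command_line`, which the modified parser no longer produces, before the silent detection gap is noticed in production.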
Finally, we would like to allow our partners and customers to participate directly in rule creation (and in the rule lifecycle more generally) through a public GitHub repository, where each new request will require a mandatory review by a Sekoia.io analyst before being merged and integrated in Sekoia.io XDR. This would increase the number of community rules, improve overall rule quality and greatly reduce false positives for everyone. Furthermore, it’s also in our DNA to share and promote transparency whenever we can, and this will take us one step further.
Thanks for reading this blog post.