With the release of TAXII 2.1, it is time to look at what this new version brings. TAXII, or Trusted Automated Exchange of Intelligence Information, is a protocol for exchanging threat intelligence over HTTPS. For more information about STIX and TAXII, see our previous blog post: Introduction to STIX and TAXII
In this article we focus on the main issue we found in TAXII 2.0: pagination.
TAXII 2.0 Filters and Pagination
In TAXII 2.0, pagination was achieved using the Range and Content-Range headers. When a client asks for specific items, it has to send a Range header stating the range it needs. For example, a client that wants the items at positions 10 to 49 would send the header Range: items 10-49 in its request.
The server then answers the query and reports which items it returned along with the total number of items, for example Content-Range: items 10-49/500, where 10 is the first item in the response, 49 the last one and 500 the total number of items available for this query.
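To make the header exchange concrete, here is a minimal Python sketch of how a client might build the Range header and read the Content-Range answer. The function names are hypothetical and only the header formats shown above are assumed; a real client would also need to handle missing or malformed headers according to the specification.

```python
import re

# Matches values such as "items 10-49/500".
_CONTENT_RANGE = re.compile(r"^items (\d+)-(\d+)/(\d+)$")


def build_range_header(first: int, last: int) -> dict:
    """Return the request header asking for items first..last (inclusive)."""
    return {"Range": f"items {first}-{last}"}


def parse_content_range(value: str) -> tuple:
    """Split a Content-Range value into (first, last, total)."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range value: {value!r}")
    first, last, total = (int(group) for group in match.groups())
    return first, last, total


def next_range(first: int, last: int, total: int):
    """Range header for the following page, or None when nothing is left."""
    if last + 1 >= total:
        return None
    size = last - first + 1
    return build_range_header(last + 1, min(last + size, total - 1))
```

With the values from the example, parse_content_range("items 10-49/500") gives the three numbers 10, 49 and 500, and next_range(10, 49, 500) asks for items 50-89.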
This way of navigating the results is called offset-based pagination. It is easy to use and to implement on relatively small data sets. Unfortunately, as the number of results grows, it becomes highly inefficient. The inefficiency has two main causes: counting the results and using high offsets.
Counting the results means that the database query has to go over every result to get the exact number of items matching the client's query. Using a high offset means that the database typically still has to read and discard every item located before the offset before it can return the requested page.
This may not seem like a big deal when no filters are applied and the data set is small, but TAXII offers many ways to filter data, which sometimes results in complex database queries. Besides, TAXII servers are often used to share huge amounts of cyber threat intelligence.
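The two costs can be seen in a small sketch using Python's built-in sqlite3 module. The table, its columns and the offset_page helper are hypothetical and only serve the illustration; the point is that each page needs both a COUNT query over all matches and a LIMIT/OFFSET query that skips the preceding rows.

```python
import sqlite3

# Hypothetical in-memory table standing in for a store of threat intel objects.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, type TEXT)")
conn.executemany(
    "INSERT INTO objects (id, type) VALUES (?, ?)",
    [(i, "indicator" if i % 2 == 0 else "malware") for i in range(1000)],
)


def offset_page(conn, object_type, first, last):
    """Return (ids, total) for items first..last of the filtered results."""
    # Total for Content-Range: every matching row has to be counted.
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM objects WHERE type = ?", (object_type,)
    ).fetchone()
    # The page itself: the first `first` matching rows are skipped.
    rows = conn.execute(
        "SELECT id FROM objects WHERE type = ? ORDER BY id LIMIT ? OFFSET ?",
        (object_type, last - first + 1, first),
    ).fetchall()
    return [row[0] for row in rows], total
```

Both queries run for every page, so the work grows with the size of the filtered result set and with the offset rather than with the size of the page.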
How TAXII 2.1 solves the issues
TAXII 2.1 replaces the offset-based pagination in the HTTP headers with a cursor-based one.
The request can now use two query parameters:
- limit: to specify the number of results to get
- next: to specify the cursor to fetch the items from
The response will contain two specific attributes in its body:
- more: whether there are more results to fetch or not
- next: the cursor to get the next batch of items
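As an illustration of how a client could consume these parameters, here is a minimal Python sketch of the pagination loop. The names iter_objects, fetch_page and make_fake_server are hypothetical, and the "objects" key is assumed to hold the items of each response. The fake server only exists to make the sketch runnable without a network; its cursor happens to be a stringified index, which the client never interprets.

```python
from typing import Callable, Iterator, Optional


def iter_objects(
    fetch_page: Callable[[int, Optional[str]], dict], limit: int = 100
) -> Iterator[dict]:
    """Yield every object, following `next` for as long as `more` is true."""
    cursor = None
    while True:
        envelope = fetch_page(limit, cursor)
        yield from envelope.get("objects", [])
        if not envelope.get("more"):
            break
        # The cursor is opaque: it is passed back to the server unchanged.
        cursor = envelope["next"]


def make_fake_server(objects: list) -> Callable[[int, Optional[str]], dict]:
    """Stand-in for a TAXII server, for demonstration purposes only."""

    def fetch_page(limit: int, cursor: Optional[str]) -> dict:
        start = int(cursor) if cursor is not None else 0
        end = start + limit
        envelope = {"objects": objects[start:end], "more": end < len(objects)}
        if envelope["more"]:
            envelope["next"] = str(end)
        return envelope

    return fetch_page
```

In a real client, fetch_page would issue the HTTPS request with limit and next as query parameters and return the decoded response body.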
TAXII 2.1 does not specify how the cursor must be implemented. It states:
This value is opaque to the client and represents something that the server knows how to deal with and process.
This lets each server implementation choose how to build this value depending on its database. For example, a company using Elasticsearch could take advantage of the scroll API and use the scroll ID as the next parameter.
The cursor pagination at Sekoia.io
At Sekoia.io, our TAXII server sits on top of an ArangoDB database (see: Threat Intelligence data storage: make it easy with ArangoDB!). For efficiency, we designed a custom cursor that relies on two attributes: created and id.
The cursor is the base64 encoding of {created}|{id}. For example, an object with the identifier indicator--33fe3b22-0201-47cf-85d0-97c02164528d created on 2017-04-14T13:07:49.812Z would give MjAxNy0wNC0xNFQxMzowNzo0OS44MTJafGluZGljYXRvci0tMzNmZTNiMjItMDIwMS00N2NmLTg1ZDAtOTdjMDIxNjQ1Mjhk (base64(2017-04-14T13:07:49.812Z|indicator--33fe3b22-0201-47cf-85d0-97c02164528d)).
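The encoding can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes the standard base64 alphabet and UTF-8 text; a cursor containing +, / or = would need to be percent-encoded when sent as a query parameter.

```python
import base64


def encode_cursor(created: str, object_id: str) -> str:
    """Build the cursor from the created timestamp and the object id."""
    raw = f"{created}|{object_id}"
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


def decode_cursor(cursor: str) -> tuple:
    """Recover (created, id) from a cursor, rejecting malformed input."""
    raw = base64.b64decode(cursor, validate=True).decode("utf-8")
    created, _, object_id = raw.partition("|")
    if not created or not object_id:
        raise ValueError("malformed cursor")
    return created, object_id
```

Calling encode_cursor with the timestamp and identifier from the example reproduces the cursor value given in the text, and decode_cursor reverses it.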
Including the id in the cursor prevents the client from receiving the same item several times when several objects have the same creation date in the database. When we retrieve the items from the database, the results are sorted by ascending created and then by ascending id.
If the client provides a cursor, we filter the data to keep only the documents matching one of the following criteria:
- Items with the same date as the cursor's and a greater id
- Items with a later date than the cursor's
When a cursor is provided, the query would look as follows:
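As a stand-in for the database query, here is a minimal Python sketch of the same logic applied to an in-memory list. It is an illustration only, not the ArangoDB query used at Sekoia.io. The function name page_after is hypothetical, and the sketch assumes that all created values are UTC timestamps in the same fixed format, so that comparing them as strings matches chronological order.

```python
def page_after(documents, cursor, limit):
    """Return (page, more, next_cursor) for the items located after `cursor`.

    `cursor` is a (created, id) pair, or None for the first page.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Sort by ascending created, then by ascending id.
    ordered = sorted(documents, key=lambda doc: (doc["created"], doc["id"]))
    if cursor is not None:
        created, object_id = cursor
        ordered = [
            doc
            for doc in ordered
            # Later date, or same date and a greater id.
            if doc["created"] > created
            or (doc["created"] == created and doc["id"] > object_id)
        ]
    page = ordered[:limit]
    more = len(ordered) > limit
    # The next cursor points at the last item of the returned page.
    next_cursor = (page[-1]["created"], page[-1]["id"]) if more else None
    return page, more, next_cursor
```

A database would apply the same filter and sort with the help of indexes and stop after limit items, instead of sorting the whole collection as this sketch does.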
To improve performance, the queries rely on 3 indices: one for each of the cursor fields and a compound index that speeds up the sort. Using this cursor-based pagination instead of an offset-based one made the requests that previously needed a large offset more than 10 times faster.
Conclusion
The OASIS committee behind TAXII is listening to feedback from TAXII users and is updating its specification to solve the issues standing in the way of wide adoption of the standard. TAXII 2.1 is a step in the right direction, and at Sekoia.io we look forward to having more and more clients consuming our TAXII 2.1 endpoint.
You can find the latest TAXII specification at the following address: https://docs.oasis-open.org/cti/taxii/v2.1/cs01/taxii-v2.1-cs01.html