To keep pace with the growing number and sophistication of cyberattacks, it is important to share detailed information about current attacks. As detailed in a previous blog post, this task is performed by CONCORDIA’s Threat Intelligence Platform, which consists of the Incident Clearing House (ICH), a central MISP instance, and the DDoS Clearing House. Their primary function is the sharing of Indicators of Compromise (IoCs) and the reporting of security incidents. However, the quantity and explanatory power of the shared data can also contribute to awareness of the global threat level on the Internet. For example, an emerging threat or increased activity related to a widespread threat such as the Emotet botnet often leads to a rapid increase in the number of security incidents and IoCs processed by these platforms. Early detection of such activity is crucial for effective threat mitigation. In the following, we demonstrate how a statistical approach from time series analysis can be applied to detect such an anomalous increase.
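A well-established approach to the statistical analysis of time series is provided by Autoregressive Integrated Moving Average (ARIMA) models. They are widely researched and applicable to a wide range of time series data. In particular, an ARIMA model fitted to past observations makes it possible to forecast upcoming values of the series and to derive a confidence interval for each forecast.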
An anomaly is diagnosed when a newly measured value lies outside the confidence interval. Although anomalies can be spotted manually in a time series graph, an automated approach such as the one provided by ARIMA is advantageous when the quantity of data becomes large. For the data volumes typically ingested by threat intelligence platforms, manual inspection may even become infeasible. Furthermore, ARIMA models provide a sound mathematical measure of the certainty that an unusual value is an anomaly.
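The decision rule can be written compactly. The notation below is introduced here for illustration and assumes approximately Gaussian forecast errors: y_t is the value observed at time t, \hat{y}_{t|t-1} is the one-step forecast made from the data up to t-1, \hat{\sigma}_t is the standard error of that forecast, and z_{1-\alpha/2} is the standard normal quantile (about 1.96 for a 95% interval).

```latex
\[
  \mathrm{CI}_{1-\alpha}(t) =
  \left[\, \hat{y}_{t\mid t-1} - z_{1-\alpha/2}\,\hat{\sigma}_t ,\;
          \hat{y}_{t\mid t-1} + z_{1-\alpha/2}\,\hat{\sigma}_t \,\right],
  \qquad
  y_t \text{ is flagged as anomalous} \iff y_t \notin \mathrm{CI}_{1-\alpha}(t)
\]
```

Under these assumptions, a correctly specified model would still place roughly a fraction \alpha of ordinary observations outside the interval by chance, so the interval level trades sensitivity against false alarms.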
To illustrate the ARIMA analysis, we applied it to a time series of attacks (sources per day between Dec 20, 2020 and Jan 30, 2021) against port tcp/80 recorded by the Internet Storm Center / DShield sensor network. This data is publicly available and can be used either to reproduce our results or to compare ARIMA with other methods. The results of the ARIMA analysis are presented in Fig. 1. The recorded attack numbers are shown by a red line, the forecasted values by a green dashed line, and the predicted boundaries of the 95% confidence interval by dotted orange and blue lines. The exact model used for this analysis is ARIMA(2,1,0). It can be seen that two anomalies (vertical blue lines) have been detected. The first is caused by a significant level change in the attack rates (Jan 12, 2021), and the second is an anomalously high number of attacks on Jan 27, 2021.
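As a rough illustration of the procedure, the following Python sketch fits an ARIMA(2,1,0)-style model, that is, an AR(2) model on the first differences of the series, by ordinary least squares and flags every value that falls outside the one-step 95% interval. It is a simplified stand-in under stated assumptions, not the implementation behind Fig. 1: all function names are hypothetical, the series is synthetic rather than DShield data, the fit has no constant term, and the interval assumes Gaussian errors with z = 1.96.

```python
import math
import random


def fit_ar2(diffs):
    """Least-squares fit of w[t] = phi1*w[t-1] + phi2*w[t-2] + e[t], no constant."""
    rows = [(diffs[t - 1], diffs[t - 2], diffs[t]) for t in range(2, len(diffs))]
    s11 = sum(a * a for a, _, _ in rows)
    s22 = sum(b * b for _, b, _ in rows)
    s12 = sum(a * b for a, b, _ in rows)
    s1y = sum(a * y for a, _, y in rows)
    s2y = sum(b * y for _, b, y in rows)
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("differences are degenerate; cannot fit AR(2)")
    # Solve the 2x2 normal equations with Cramer's rule.
    phi1 = (s1y * s22 - s12 * s2y) / det
    phi2 = (s11 * s2y - s12 * s1y) / det
    # Residual standard deviation, used as the one-step forecast error.
    sse = sum((y - phi1 * a - phi2 * b) ** 2 for a, b, y in rows)
    sigma = math.sqrt(sse / max(len(rows) - 2, 1))
    return phi1, phi2, sigma


def forecast_interval(history, z=1.96):
    """One-step forecast and interval bounds from the values seen so far."""
    diffs = [b - a for a, b in zip(history, history[1:])]  # d = 1: difference once
    phi1, phi2, sigma = fit_ar2(diffs)
    forecast = history[-1] + phi1 * diffs[-1] + phi2 * diffs[-2]
    return forecast, forecast - z * sigma, forecast + z * sigma


def detect_anomalies(series, min_history=14, z=1.96):
    """Return indices whose value lies outside the one-step forecast interval."""
    anomalies = []
    for t in range(min_history, len(series)):
        _, lower, upper = forecast_interval(series[:t], z)
        if not lower <= series[t] <= upper:
            anomalies.append(t)
    return anomalies


def synthetic_series(days=42, shift_day=30, seed=0):
    """Synthetic daily counts (assumption): noisy level 1000 jumping to 2000."""
    rng = random.Random(seed)
    return [(1000 if d < shift_day else 2000) + rng.gauss(0, 20) for d in range(days)]


if __name__ == "__main__":
    series = synthetic_series()
    print("anomalous day indices:", detect_anomalies(series))
```

Note that the model is refitted on all earlier values at every step, so a flagged value enters the history and influences later forecasts and interval widths. A production setup would more likely rely on a maintained time series library with maximum-likelihood fitting than on this hand-rolled estimator.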
We are currently evaluating the applicability of ARIMA time series analysis to the data of the CONCORDIA Threat Intelligence Platform. A prototype implementation is already available for the data of the Incident Clearing House (ICH). The preliminary results met our expectations, and we are confident that this work will enable us to strengthen the situational awareness capabilities of the CONCORDIA project. We also expect that the detailed data of CONCORDIA’s Threat Intelligence Platform will allow us to identify the cause of an anomaly (e.g. malware activity), which was not feasible for the DShield data in our example.
Further details on the approach itself as well as the field of application are provided in our research paper, which is available open access.
(By Jan Kohlrausch and Christian Keil, DFN-CERT)