How DeNexus Made Cyber Risk Simulations 50x Faster

Written by Mario Pio Gioiosa | Oct 3, 2023 2:50:25 AM

DeNexus' flagship Cyber Risk Quantification and Management platform, DeRISK is made of several component modules, one of these is called Attack Propagation Algorithm -APA-, powered by Inside-Out and Outside-In data. In this blog post, we aim to disclose how and why we have optimized the performance of APA in DeRISK v5.

What is an Attack Propagation Algorithm (APA)?

APA is the module that estimates the probability of an Attack Attempt being successful in causing a Loss Event. APA uses a propagation algorithm over a directed graph, in which the nodes represent the steps of an attack to progress from an Initial Access Vector to a successful Impact or to be stopped by the presence of a Cyber Control; and the edges represent the probabilities of moving along these steps. The APA graph represents the Attack Paths that can cause an impact on a Single Facility or single Unit Risk using a combination of given MITRE ATT&CK tactics and techniques.

Also, Unit Risks might communicate externally, for example with a control center, or even with other facilities. When that happens, the attacker can exploit this communication to perform a “lateral move” from one Unit to adjacent Units and vice versa. DeRISK captures these dependencies leveraging Inside-out data in the DeNexus Knowledge Center in our APA graph, by creating edges connecting these entities (facilities, control centers, etc).

As you would imagine, the more complex the connection and the more facilities there are, the larger the graph and the possible paths to reach impact.

The events that DeRISK is modeling are quite rare, so the probability of succeeding is usually quite low. This means that the Monte Carlo simulation in DeRISK may require millions of iterations to ensure that we reach the required accuracy level.

APA before DeRISK v5.4

Implementation of APA in previous versions of DeRISK worked well when it had to run the simulations for a single Unit isolated or for a relatively small group of Units. But as DeNexus’ customer base grew, and the portfolios of our customers grew in size and number of Units, the graph necessary to calculate bottom-up Cyber Risk Aggregation became exponentially bigger and we observed a progressive degradation in the performance that required the implementation of new solutions.

To be specific, we observed the following issues:

The execution time was not linearly growing with the number of Units, but slightly exponentially.
The memory needs were constantly and linearly growing during the execution causing a hard limit of Units/iterations that could be processed before hitting an out-of-memory error on the cluster.

The new specifications were set:

Being able to quantify Cyber Risk for very large portfolios (1,000+ Units) using DeNexus’ proprietary data-driven approach to Risk Aggregation.
Increase 20x the number of iterations to improve accuracy and stability across different runs.
Add deeper granularity in APA subdividing Units in Zones - a grouping of systems and components based on their functional, logical, and physical relationship that share common security requirements-inspired by the ISA/IEC 62443 series of Standards.

Or, in simpler terms: support a larger and much more complex graph with millions of iterations and optimize its performance.

After careful analysis and multiple tests, we concluded that simply optimizing the existing code did not meet the new requirements, so the decision was made to re-architect and refactor APA in a scalable and performant way that can support current and future needs:

First, we wore the refactoring hat and we introduced an abstraction layer to allow new and former APA implementation to easily coexist until the new replaces the former. This helped us to be able to constantly merge the code in the main branch, avoiding long-living branch, and to easily switch between the two versions to get immediate feedback on how the new was behaving during our continuous QA tests.
Second step involved rewriting the algorithm while keeping DeNexus’ proprietary characteristics of the DeRISK Cyber Risk Quantification and Management platform, this time using a low-level high-performance language -Rust- instead of the previous high-level language -Python-. An approach that we already used successfully in the past for other parts of the platform. We developed new algorithms in Rust and used PyO3 to create a native Python extension module so that the Python code could call the exposed Rust function. We used simplified property-based testing to ensure that for the same input, the output of the former APA and new APA were consistent.
Once the new APA could safely replace the former APA, after all the QA process was completed, we moved to the last step: performance optimization. We introduced a step of graph contraction before the algorithm execution, to optimize graph dimension and we tuned Rayon, the library that we used to parallelize the iterations run.

The results

We achieved exceeded initial requirements and were truly gratifying, both in terms of performance and scalability. In terms of performance, we reached ~100M of iterations per second on small graphs (< 5 Units) and ~25M of iterations per second on medium graphs (>60 Units). Measures have been taken on a cluster with 32 vCPU and assuming a range of Units with medium to high-security maturity postures.

Comparing the new APA against the former one we have ~50x of performance improvement that enables DeRISK to run 40 million simulations each time to solve a constrained global optimization problem over the mitigations of a given cybersecurity framework, enabling DeNexus’ proprietary bottom-up data-driven approach for very large portfolios. A robust and reliable foundation for future additional developments.

We also solved the scalability problem

Memory usage stays nearly constant during execution and there is negligible impact on memory when the graph size or the number of iterations increases.

In summary, a robust and reliable foundation for future additional developments, including the support for additional industry verticals that we will be unfolding in future blog posts.

Click Here to learn more about DeNexus Inc.'s comprehensive Cyber Risk Quantification and Management platform.

Click Here to read more about the DeNexus Knowledge Center and the DeNexus Trusted EcoSystem.

View full post