The Perils of Black Box Algorithms in Cyber Security

Posted by Madeline Lee   |   August 2, 2018

By Pamela Bleve, Data Scientist and Machine Learning Engineer at CounterTack


In the current cybersecurity climate, machine learning (ML) has been quickly adopted for its potential to automate the detection and prevention of attacks and to identify novel zero-day attacks and targeted malware. ML and statistical methods offer promising solutions to the attacks that SOC analysts face. However, many statistical methods are black boxes, and their opacity makes it difficult for an analyst to draw any actionable conclusion from the methods’ suggestions.

The infosec industry is accustomed to rules, blacklists, fingerprints and indicators of compromise, so explaining why an alert triggered is the natural next step. In contrast, machine learning models are often dismissed for their lack of interpretability. Simple linear statistical models such as logistic regression yield interpretable models. On the other hand, advanced models such as random forests or deep neural networks are black boxes, meaning it is nearly impossible to understand how the model arrives at a prediction.

Why interpretability?

Conventional evaluation metrics such as accuracy or AUC offer little assurance that a model, and via decision theory its actions, behave acceptably. In a critical area like real-world cybersecurity, there are usually factors to optimize for other than precision, recall or accuracy. Thus, demands for fairness often lead to demands for interpretable models.

During the model construction process, the most important choices to be made are:

  • Computational complexity
  • Mathematical complexity (linear or non-linear boundaries)
  • Explainability
    • A human might need to read the output of the model and determine why it made the decision it did. For example, if you block a legitimate user, you should be able to tell them what they did that looked suspicious. The best model family for explainability is the decision tree, because it says exactly why the classification was made. We can use relative feature weights to explain logistic regression and Naïve Bayes. Decision forests, SVMs, and neural networks are very difficult to explain.
    • Models that support modes of interaction with people have several advantages. They can receive expert feedback more readily, which can be used to improve both labels and features and allows the model to improve in otherwise difficult ways. Human confidence and trust in the model can be built more quickly when there is some way of understanding how the model’s decisions are made. Having methods for exploring the model can also help to validate the underlying data.
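
To make the "relative feature weights" idea concrete, here is a minimal sketch of explaining one logistic regression decision. The feature names, weights and bias are hypothetical values chosen for illustration, not taken from a real detector. Each feature's contribution is simply its weight multiplied by its value.

```python
import math

# Hypothetical weights of an already-trained logistic regression model.
# Feature names and numbers are illustrative assumptions only.
WEIGHTS = {
    "failed_logins_last_hour": 0.9,
    "new_device": 1.4,
    "login_hour_is_unusual": 0.7,
    "account_age_years": -0.3,
}
BIAS = -3.0


def predict_proba(features):
    """Return the model's probability that the event is suspicious."""
    logit = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-logit))


def explain(features):
    """Rank features by the size of their contribution (weight * value) to the logit."""
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    return sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)


event = {
    "failed_logins_last_hour": 4,
    "new_device": 1,
    "login_hour_is_unusual": 1,
    "account_age_years": 2,
}
probability = predict_proba(event)
ranking = explain(event)

print(f"P(suspicious) = {probability:.2f}")
for name, contribution in ranking:
    print(f"{name:>26}: {contribution:+.2f}")
```

The ranked list is the kind of output an analyst could pass on to a blocked user: it shows which signals pushed the score up and which pulled it down.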


Options from research community

Because cyber analysts have traditionally distrusted statistical solutions, which can produce nonsensical false positives, considerable effort is being invested in finding ways to explain the output of machine learning models.

The following model properties and techniques can make models more interpretable and transparent to humans:

  1. LIME
  2. Meta-algorithm
  3. Post-hoc interpretability
    • Text explanations
    • Visualization


1. LIME

LIME stands for Local Interpretable Model-agnostic Explanations, and its objective is to explain the result from any classifier so that a human can understand individual predictions. The LIME algorithm approximates the underlying model with an interpretable one. This is done by learning from perturbations of the original example and training a sparse linear model in the local neighborhood around the target instance.

The key to LIME’s effectiveness is the ‘local’ element. That is, it doesn’t attempt to explain all the decisions a model might make across all possible inputs, only the factors that determine its classification for one particular input. Looking locally, we can extract linear explanations.

Using advanced methods such as LIME, we can gain a better understanding of how a machine learning model works. It’s an effective way of assessing trust.
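
The local-surrogate idea can be sketched in a few lines of plain Python. This is an illustration of the principle, not the LIME library itself: `black_box` is a made-up stand-in classifier, the sampling scale and kernel width are arbitrary assumptions, and the surrogate is fitted with weighted least squares plus a tiny ridge term rather than the sparse linear model LIME uses.

```python
import math
import random


def black_box(x):
    """Hypothetical opaque classifier: returns P(malicious) for input x.

    Only its predictions are used below, never its internals.
    """
    score = 3.0 * x[0] - 2.0 * x[1] + 0.0 * x[2] + 0.5 * x[0] * x[1]
    return 1.0 / (1.0 + math.exp(-score))


def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def explain_locally(predict, instance, n_samples=2000, scale=0.3, width=0.5, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance."""
    rng = random.Random(seed)
    d = len(instance)
    # Accumulate the weighted normal equations: (Z'WZ) w = Z'Wy
    lhs = [[0.0] * (d + 1) for _ in range(d + 1)]
    rhs = [0.0] * (d + 1)
    for _ in range(n_samples):
        delta = [rng.gauss(0.0, scale) for _ in range(d)]  # perturbation
        z = [xi + di for xi, di in zip(instance, delta)]
        y = predict(z)  # query the black box on the perturbed point
        dist2 = sum(di * di for di in delta)
        weight = math.exp(-dist2 / width ** 2)  # nearby samples count more
        row = delta + [1.0]  # centered features plus an intercept column
        for i in range(d + 1):
            rhs[i] += weight * row[i] * y
            for j in range(d + 1):
                lhs[i][j] += weight * row[i] * row[j]
    for i in range(d):
        lhs[i][i] += 1e-6  # tiny ridge term for numerical stability
    return solve(lhs, rhs)[:d]  # drop the intercept, keep feature weights


instance = [0.2, 0.1, 0.5]
local_weights = explain_locally(black_box, instance)
print([round(w, 3) for w in local_weights])
```

The returned weights describe only the neighborhood of `instance`: a positive weight means that nudging that feature upward raises the predicted probability near this input, and a weight near zero means the feature plays little role in this particular decision.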

2. Cyber Security Meta-algorithm

The meta-algorithm integrates modern, robust machine learning techniques into an interpretable system by constraining the statistical models used. Instead of using a single, complicated model that incorporates all features of the data to answer a single decision problem, the algorithm generates many small, interpretable models and then uses ensemble techniques to combine them.

Adaptable ensembles of interpretable models can also show benefits in the cyber security domain. An example implementation is anomaly detection in network traffic data. The goal of such a system is to help the SOC analyst identify interesting data that warrants further investigation.

Starting from network traffic data with N dimensions, the data is partitioned into 1- and 2-dimensional subspaces, and for each subspace, several types of machine learning models are trained. These models are chosen with two priorities:

  • Orthogonality (they should cover different qualities of the data)
  • Interpretability (they should have some interpretable outputs)

Then, techniques from ensemble learning are utilized to integrate these models into a single anomaly detector. The final system can offer suggestions of anomalous data by cueing the user to which model contributed the most to its anomaly score.
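
As a rough sketch of this pipeline, the example below partitions a small data set into all 1- and 2-dimensional subspaces, fits one simple model per subspace, and reports which subspace drove the combined score. The flow records and field names are invented for illustration, and a single z-score model stands in for the "several types" of models described above.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical network-flow records; field names and values are assumptions.
FEATURES = ["bytes_out", "duration_s", "dst_port_count"]
history = [
    [500, 1.0, 1], [520, 1.2, 1], [480, 0.9, 2], [510, 1.1, 1],
    [495, 1.0, 2], [505, 1.3, 1], [515, 0.8, 1], [490, 1.1, 2],
]


def fit_zscore_model(rows, dims):
    """One small model per subspace: mean and spread of each chosen dimension."""
    stats = []
    for d in dims:
        column = [row[d] for row in rows]
        stats.append((mean(column), pstdev(column) or 1.0))
    return stats


def score(stats, dims, row):
    """Anomaly score: root-mean-square z-score of the row inside the subspace."""
    z2 = [((row[d] - mu) / sigma) ** 2 for d, (mu, sigma) in zip(dims, stats)]
    return (sum(z2) / len(z2)) ** 0.5


# Partition the N-dimensional data into all 1- and 2-dimensional subspaces.
n_dims = len(FEATURES)
subspaces = [c for k in (1, 2) for c in combinations(range(n_dims), k)]
models = {dims: fit_zscore_model(history, dims) for dims in subspaces}


def detect(row):
    """Average the subspace scores and name the top contributing subspace."""
    scores = {dims: score(models[dims], dims, row) for dims in subspaces}
    top = max(scores, key=scores.get)
    combined = sum(scores.values()) / len(scores)
    return combined, [FEATURES[d] for d in top]


combined_score, top_features = detect([9000, 1.0, 1])
print(f"score={combined_score:.1f}, driven by {top_features}")
```

Because every model looks at only one or two named fields, the "driven by" output is itself the explanation the analyst sees.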

The cyber analyst is then able to provide feedback on the suggested anomalies. When the analyst responds that a given model was helpful or unhelpful, the importance weighting of that model’s anomaly score is adjusted for future suggestions. The algorithm uses a multiplicative weight update that rewards models that receive positive feedback and penalizes those that produce inaccurate results. In this way, the system can adapt its weightings on feature subspaces and produce more accurate suggestions for the analyst.
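
A multiplicative weight update of this kind can be sketched as follows. The model names, the learning rate `eta` and the renormalization step are assumptions made for illustration; the original system may use a different update rule.

```python
def update_weights(weights, contributing_model, helpful, eta=0.5):
    """Multiplicative update: reward or penalize one model, then renormalize."""
    factor = (1.0 + eta) if helpful else (1.0 - eta)
    updated = dict(weights)
    updated[contributing_model] *= factor
    total = sum(updated.values())
    return {name: w / total for name, w in updated.items()}


def weighted_score(weights, scores):
    """Combine per-model anomaly scores using the current importance weights."""
    return sum(weights[name] * scores[name] for name in scores)


# Hypothetical per-subspace models, all starting with equal importance.
weights = {"bytes_out": 1 / 3, "duration_s": 1 / 3, "dst_port_count": 1 / 3}

# The analyst marks a duration-driven alert as unhelpful,
# then a bytes-driven alert as helpful.
weights = update_weights(weights, "duration_s", helpful=False)
weights = update_weights(weights, "bytes_out", helpful=True)

print({name: round(w, 3) for name, w in weights.items()})
```

After these two rounds of feedback, the model the analyst found helpful carries the most weight, so future suggestions lean on it more heavily, while the penalized model still contributes a little and can recover if later feedback is positive.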

3. Post-hoc interpretability

Post-hoc explanations (i.e., what else can the model tell me?) are a category of techniques and model properties proposed either to enable or to constitute interpretations.

Post-hoc interpretability presents a distinct approach to extracting information from learned models. It can interpret opaque models after the fact, without sacrificing predictive performance. Some common approaches to post-hoc interpretation include natural language explanations and visualization of learned representations or models.

3.1 Text Explanations

Humans often justify decisions verbally. Similarly, we might train one model to generate predictions and a separate model, such as a recurrent neural network language model, to generate an explanation.

In one such approach, a second model is trained to map the first model’s state representation onto verbal explanations of its strategy.

3.2 Visualization

Recently, there has been interest in how visual analytics can be combined with machine-based approaches, both to ease the data analytics challenge of anomaly detection and to support human reasoning through interactive visual interfaces. Furthermore, by combining visual analytics and active machine learning, analysts can potentially feed their domain expertise back into the system, so that machine-based decisions improve iteratively in line with the analysts’ preferences. With this combined human-machine approach to decision-making about potential threats, the system can begin to capture the human rationale behind the decision process more accurately and reduce the number of false positives it flags.


While reflecting on the intersection of data science and cybersecurity platforms, the primary factor to consider is human interaction: How do analysts understand the model decisions and provide feedback? How are models overseen and monitored?

As described above, the benefit of building interpretable models is twofold: domain experts can make better use of automated models, and they can develop trust in them.

Topics: cybersecurity, endpoint security, CounterTack, blogs, data, black box, algorithms, machine learning, LIME
