Machine Learning: Challenges of ML in Cybersecurity
Updated: Aug 8, 2019
Co-authored by Charles Plummer and Suman Kukreti
The application of machine learning (ML) has grown rapidly with the rise of big data and the increasing availability of compute resources. ML describes the development of algorithms that learn from data. Unlike artificial general intelligence, machine learning systems are designed to accomplish a limited set of tasks rather than to exhibit cognition on the order of human intelligence. Automated solutions driven by ML continue to improve in functionality and accuracy. From facial recognition to applications driven by Natural Language Processing (NLP), these systems are shaping the way people live and work. Against this backdrop, it is no surprise to see ML making its way into the cybersecurity realm as a response to more frequent and increasingly sophisticated cyber attacks. However, deploying ML cybersecurity solutions is far from simple and requires understanding the unique risks and issues that arise from adding ML to your cybersecurity toolset.
As the sheer number of cyber attacks continues to mount, ML-driven cybersecurity products and services from a growing cadre of providers have begun to proliferate. Marketed as scalable, data-driven, and automated, these “AI” cybersecurity solutions are making their way into countless organizations. Cybersecurity applications of ML include network monitoring, endpoint protection, and code review. ML-driven solutions allow cybersecurity professionals to respond more quickly to emerging threats and may provide the ability to stymie attacks efficiently.
Currently, most ML in cybersecurity works through supervised learning: algorithms are trained on network and other curated data to recognize future patterns in data of the same type. For instance, a machine learning algorithm that tracks server traffic can be trained to discern typical traffic patterns from those that indicate an attack, with fewer false positives than rules-based solutions. Once deployed as a monitoring tool, the ML algorithm can alert human IT professionals when issues are detected. In this example, and many others, ML augments human performance by reducing the time analysts spend chasing false positives; it doesn't replace the human element. However, deploying ML models to detect threats is not without risk. Below are some examples of the challenges cyber professionals face when turning to ML.
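To make the supervised-learning idea concrete, here is a minimal sketch of a traffic classifier. It uses a nearest-centroid classifier written from scratch; the feature vectors (requests per minute, error rate), labels, and sample values are all illustrative assumptions, not drawn from any real product or dataset.

```python
# Minimal sketch of supervised traffic classification using a
# nearest-centroid classifier. All numbers are illustrative.
from math import dist

# Each training sample: ((requests_per_min, error_rate), label).
training = [
    ((120.0, 0.01), "normal"),
    ((150.0, 0.02), "normal"),
    ((900.0, 0.40), "attack"),   # e.g. a flood of failing requests
    ((1100.0, 0.55), "attack"),
]

def fit_centroids(samples):
    """Average the feature vectors for each label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lbl: tuple(v / counts[lbl] for v in acc)
            for lbl, acc in sums.items()}

def classify(centroids, features):
    """Assign the label of the nearest centroid."""
    return min(centroids, key=lambda lbl: dist(centroids[lbl], features))

centroids = fit_centroids(training)
print(classify(centroids, (130.0, 0.015)))  # "normal"
print(classify(centroids, (950.0, 0.50)))   # "attack"
```

A production system would use far richer features and a real ML library, but the shape is the same: learn a decision boundary from labeled historical traffic, then score new traffic against it.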
ML solutions are only as good as the data used to train the models that drive them. Poorly cleaned or mislabeled training data can introduce bias into models and leave the door open to undetected malicious activity. Even training data that is selected and handled correctly can be compromised by attackers; in that case, the data itself provides a backdoor through the ML defenses trained on it. Finally, attackers can probe the ML model directly to extract the features used to flag malicious activity or malware. Once they have them, savvy attackers can design exploits to avoid the now-known triggers that would flag them as threats.
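The training-data risk described above can be illustrated with a toy example. The threshold-learning rule and the numbers below are hypothetical; the point is only that a few attacker-relabeled samples can move a learned decision boundary.

```python
# Hypothetical sketch of training-data poisoning against a simple
# learned threshold detector. All values are illustrative.
def learn_threshold(samples):
    """Place the alert threshold halfway between the highest rate
    labeled benign and the lowest rate labeled malicious."""
    benign = max(rate for rate, label in samples if label == "normal")
    malicious = min(rate for rate, label in samples if label == "attack")
    return (benign + malicious) / 2

clean = [(120, "normal"), (150, "normal"), (900, "attack")]

# An attacker who can tamper with the training set injects a
# high-rate sample mislabeled "normal", dragging the threshold up.
poisoned = clean + [(800, "normal")]

print(learn_threshold(clean))     # 525.0
print(learn_threshold(poisoned))  # 850.0 -- a 700 req/min attack now slips through
```

The same logic applies in reverse for model extraction: once an attacker knows the threshold sits at 850, staying just below it is trivial.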
Beyond the training of models, automation of tasks comes with some inherent risks. As noted above, there are several ways that models can be compromised, and no model is completely effective. With this in mind, it is critical to layer ML defenses with complementary models designed to identify threats using different features. That way, if one model is compromised, another may still identify the malicious code or activity. However, it may not be enough to simply buy multiple ML solutions. A model's training and testing environment may differ enough from the environment where it is deployed that performance will be lackluster. Unfortunately, many organizations lack both adequate internal training data and the expertise to customize solutions to their environment.
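The layering idea can be sketched in a few lines. The two detectors below are deliberately simplistic, hypothetical stand-ins for independent models that examine different features; what matters is that an alert fires if either one trips.

```python
# Toy sketch of layered detection: two independent detectors that
# look at different features, combined with OR logic. Both detectors
# and their thresholds are illustrative assumptions.
def rate_detector(event):
    # Flags abnormally high request rates.
    return event["requests_per_min"] > 500

def payload_detector(event):
    # Flags suspicious strings in the payload (toy signature check).
    return any(sig in event["payload"] for sig in ("DROP TABLE", "<script>"))

def layered_alert(event):
    return rate_detector(event) or payload_detector(event)

# An attacker who evades the rate model by throttling requests can
# still trip the payload model.
stealthy = {"requests_per_min": 60, "payload": "id=1; DROP TABLE users"}
print(layered_alert(stealthy))  # True
```

In practice the layers would be real models trained on disjoint feature sets, and the combination rule might weight or correlate their scores rather than take a simple OR.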
Another risk of supervised ML models in cybersecurity is that they can only reliably identify malicious activity for which training data exists. These solutions may therefore provide little to no protection against emerging threats, or against unique attacks with one or a handful of targets, because no training data for these threats is available. In other ML applications, anomaly detection using unsupervised learning would be a natural answer to this problem. However, benign anomalous activity is common within IT networks, which makes filtering through false positives more difficult. On top of that, sophisticated attackers may be aware of the defender's anomaly detection and go to great lengths to ensure their attack does not look anomalous.
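A minimal sketch of the anomaly-detection approach, and of its false-positive problem, is a z-score check against a historical baseline. The history, threshold, and scenario are illustrative assumptions.

```python
# Toy sketch of unsupervised anomaly detection: flag request rates
# far from the historical baseline. All values are illustrative.
from statistics import mean, stdev

history = [110, 125, 130, 118, 122, 140, 115, 128]  # baseline req/min

def is_anomalous(rate, baseline, threshold=3.0):
    """Flag rates more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(rate - mu) / sigma > threshold

print(is_anomalous(600, history))  # True: far outside the baseline
print(is_anomalous(135, history))  # False: ordinary fluctuation

# The caveats from the text apply directly: a benign spike (say, a
# product launch) would also be flagged, and a patient attacker can
# keep their traffic inside the baseline to evade detection entirely.
```

Real deployments use richer techniques (clustering, density estimation, isolation forests), but they inherit the same two weaknesses: benign anomalies generate noise, and attackers who know the baseline can blend into it.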
Despite these challenges, ML does have a place in cybersecurity programs. However, achieving meaningful results entails selecting the right set of models, being able to adequately tune and deploy them, and understanding their limitations and risks. Doing these things requires an experienced data science team that combines mathematical rigor, high-level coding skills, and experience solving business problems within a cybersecurity context. At infoedge, our data science team consists of PhDs in mathematics and physics, expert engineers with graduate degrees in ML, and analysts with decades of business-facing experience. Together they can tackle the most difficult data science challenges.
CISOs and IT Data Leaders are beset by more IT risks than ever before, and they must adapt their security strategies to an environment that is rapidly becoming more complex. When properly deployed, ML enables firms to optimize many security functions. However, most organizations struggle to turn these ideas into reality due to a lack of data science experience. Contact us to learn more about how our analytics and cybersecurity experts can help put machine learning in service of securing your data and reducing the risks to your company.