In my last post, I explained the basic architecture of our cloud-scale machine learning (ML) as well as the key benefits this type of architecture provides when compared to locally hosted ML. This blog post will focus on ExtraHop Reveal(x) specifically, so you can see how our industry-leading network detection and response (NDR) product empowers security teams through advanced ML technologies.
Modern enterprises are moving at a breakneck pace, and security teams struggle to keep up with all the new vulnerabilities, TTPs, security logs, and event data being generated every day. While Reveal(x) provides industry-leading network visibility (north-south and east-west) into customers' hybrid enterprise environments, we at ExtraHop believe it is also paramount to perform automated, high-quality detections as part of a complete NDR solution. At the end of the day, security data that are automatically analyzed and used to generate actionable insights create vastly more value than data that are unanalyzed and simply archived.
Actionable Insights With ML
Reveal(x) leverages cloud-scale ML architecture to deliver best-in-class scalability and threat detection. ExtraHop has invested heavily in machine learning and combined our years of experience in threat research, network analytics and data science to build best-in-class ML technology into the core of Reveal(x).
More specifically, Reveal(x) uses ML to not only detect attacks, but also help customers investigate and respond to attacks faster by automating information gathering and putting that information into context.
In order to achieve this, Reveal(x)'s ML is built using a multi-subsystem design, very similar to modern autonomous driving solutions, where a collection of sophisticated and patented ML subsystems—designed to extract insights, detect threats, and gather context—work in unison. Here is an example of Udacity's autonomous driving system:
In Reveal(x), the ML-powered analysis and detection are delivered by three subsystems: Perception, Detection, and Investigation. Each subsystem contains multiple components responsible for distinct functions. ML components in the same subsystem and across different subsystems collaborate and exchange data and findings:
Perception: Observing and Inferring Context
The first subsystem is Perception, which contains a suite of ML components focused on understanding each customer's unique environment. With over a decade of experience in analyzing network data, we at ExtraHop know that each customer's environment is different and that each customer has its own IT and security policies and practices. To produce the best analytical results for each customer, we built multiple components that automatically infer customer-specific contextual information, such as peer groups, security policies, and device and user roles, based on the observed behaviors of every entity on the network.
These contextual pieces of information are later ingested by other components to improve accuracy and reduce noise. Some of the components in the Perception subsystem are:
- Device Clustering: Identify groups of devices that exhibit similar behaviors on the network, such as a small cluster of databases, a large group of VoIP phones, or a set of developer workstations.
- Network Security Policy Inference: Infer which network segments are expected to be exposed to the public Internet and which external services are approved for access.
- Asset/Device Criticality Ranking: Analyze how different devices interact with each other on the network and identify devices that are more important to the business, such as file shares containing critical financial data, bastion hosts for administrators, and databases that back customer-facing web apps.
- Behavioral Profile Inference: Infer behavioral profiles of different devices (such as domain controllers, mobile phones and file servers) based on observed communication patterns.
- Privileged Entity Identification: Reveal(x) utilizes patented analytics on a variety of network protocols (such as LDAP, Kerberos, and CIFS) to track user behaviors across the network. This ML component continuously analyzes authentication and access patterns of different users, and identifies privileged entities on the network, such as IT admins, domain admins, and DB admins.
- Traffic Pattern Identification: Reveal(x) leverages its full-stream reassembly and sophisticated analysis to intelligently identify high-interest security-related traffic patterns in the network. For example, Reveal(x) passively correlates external IPs with VPN client IPs on the network, quickly identifying the specific user and entity behind individual VPN client IPs. It is also able to robustly identify different forms of interactive traffic sessions regardless of encapsulation, obfuscation, and tunneling.
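To make the Device Clustering idea above more concrete, here is a minimal sketch of grouping devices by behavioral similarity. It is an illustration only, not how Reveal(x) is implemented: the feature vectors (share of traffic per protocol), the device names, the `cluster_devices` helper, and the 0.9 similarity threshold are all assumptions chosen for the example.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_devices(profiles, threshold=0.9):
    """Greedy clustering: a device joins the first cluster whose seed
    profile is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_vector, [device names])
    for name, vec in profiles.items():
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))
    return [members for _, members in clusters]

# Hypothetical feature: share of each device's traffic that is SQL, SIP, HTTPS.
profiles = {
    "db-01": [0.90, 0.00, 0.10],
    "db-02": [0.85, 0.00, 0.15],
    "phone-01": [0.00, 0.95, 0.05],
    "phone-02": [0.00, 0.90, 0.10],
    "dev-ws-01": [0.10, 0.05, 0.85],
}

groups = cluster_devices(profiles)
for members in groups:
    print(members)
```

With this toy data, the two databases and the two VoIP phones each end up in their own group, and the developer workstation stands alone. A production system would presumably use far richer features and a more robust algorithm; the point is only that peer groups can be inferred from observed behavior rather than configured by hand.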
Detection: Observing and Predicting Behavior in Context
The second and largest subsystem is Detection, which consists of a collection of components that build self-adapting predictive models for every single entity (device and user) on the network. We then feed the output of these models, along with other metadata (such as knowledge inferred by the Perception ML components), into a large set of specialized detectors. The models are continuously updated so that they reflect up-to-date behavior patterns.
Due to the diversity of the forms and techniques used by modern cyber threats, no one model is general enough to identify all attacks. Instead, Reveal(x) relies on an ensemble of predictive model classes, each covering a specific aspect of an entity's behavior. In some cases, Reveal(x) can create over a hundred models for a single device or user, depending on its activity, criticality and attributes. To detect potential threats, we also developed hundreds of purpose-built detectors (see more in our whitepapers around detection) that analyze observed behavior in real time.
Every ML-based detector is custom-built for detecting a specific suspicious behavior or attack technique, identified by our threat research team, and uniquely leverages their security expertise and sophisticated predictive models. In addition to that, the detectors are constantly being refined based on learning from the field team and customers. Here are a few ML components in this subsystem:
- Time-Series Analysis: Predict expected behavior and its volume based on historically observed behavior. Due to the varied nature of different attacks, we have developed a suite of time-series analysis algorithms, each optimized for a specific scenario and temporal granularity.
- Behavior Graph Analytics: Network behaviors can be represented as layers of very large graphs where the nodes are the entities on the network, and edges contain relational information or behaviors. By analyzing these large graphs via various proprietary ML algorithms, this component can accurately model behavioral patterns and detect subtle but suspicious changes, such as when an attacker is already inside the network but is interacting with high-importance data servers from a low-privileged account.
- Peer Group Behavior Modeling: Model how devices that belong to the same special-purpose peer group behave from different perspectives and identify when an attacker has compromised a device and causes it to start acting differently than its peers.
- User Behavior Analytics: Model the login and access patterns of users in Active Directory, which can enable detection of insider attacks and misuse of compromised credentials.
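As a rough illustration of the Time-Series Analysis idea in the list above, the sketch below compares a new observation against a baseline built from an entity's own history. It uses a generic robust z-score (median and median absolute deviation), which is a standard statistical technique and not a description of ExtraHop's proprietary algorithms. The function names, the sample values, and the 3.5 cutoff are assumptions made for the example.

```python
from statistics import median

def robust_zscore(history, observed):
    """Distance of `observed` from the historical median, scaled by the
    median absolute deviation (MAD) so a few past outliers do not skew it."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    # 1.4826 makes MAD comparable to a standard deviation for normal data;
    # fall back to 1.0 when the history is perfectly flat (MAD == 0).
    scale = 1.4826 * mad if mad > 0 else 1.0
    return (observed - med) / scale

def is_anomalous(history, observed, cutoff=3.5):
    """Flag observations that sit far outside the entity's usual range."""
    return abs(robust_zscore(history, observed)) > cutoff

# Hypothetical hourly outbound volume (MB) for one device.
history = [120, 130, 125, 118, 122, 128, 124]

print(is_anomalous(history, 127))  # a typical hour
print(is_anomalous(history, 900))  # a sudden large transfer
```

The same comparison could, in principle, be made against the behavior of a device's peer group instead of its own history, which is the intuition behind peer group behavior modeling.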
Investigation: Automating Analysis for Accelerated Triage & Response
The third subsystem is Investigation, which is in charge of providing automated analysis of detected potential threats and assisting in the triage, investigation, and remediation processes. We spent significant effort building and refining components in this subsystem because detecting potential threats is only half of the battle. Being able to easily triage, investigate, and remediate threats is equally important to our customers.
Along with optimized investigation workflows, components in this subsystem are a key part of ExtraHop's differentiation. Three examples of ML components we have in this subsystem are:
- Autonomous Root Cause Analysis: Reveal(x) has an industry-leading capability to record every single activity and behavior on the network. However, similar to solving a crime in meatspace, going through the evidence and gathering relevant context can be very labor-intensive. This ML component performs autonomous context gathering for every detection by simulating how a human analyst would go through different related pieces of information of a detection. More specifically, by leveraging the predictive models that are built by the Detection ML components, the component is able to collect most of the relevant information without any manual guidance. For example, when a detection is triggered on abnormally large file accesses on a highly sensitive database, this component can automatically identify the corresponding suspicious users, clients, database tables and SQL queries that fall outside of the normal range of behavior for those entities.
- Detector-Specific Root Cause Analysis: For certain detectors where human intervention is the best first response, we built a templatized analytical playbook for each one, using our expertise in threat research to provide all the attack-specific information and concrete, practical next steps at a glance for our customers.
- Intelligent Prioritization: By leveraging Reveal(x)'s 360-degree visibility into customers' environments, this ML component is able to accurately prioritize detections. It takes multiple factors into consideration, including attributes of the individual detection and the customer- and business-specific context that the Perception subsystem identified.
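The Intelligent Prioritization idea above, combining attributes of a detection with context about the affected assets and users, can be sketched as a simple weighted score. This is a deliberately simplified illustration, not the Reveal(x) ranking logic: the `Detection` fields, the weights, and the sample detections are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    title: str
    severity: float             # 0..1, attribute of the detection itself
    asset_criticality: float    # 0..1, context inferred about the asset
    involves_privileged_user: bool

def priority(d, w_severity=0.5, w_criticality=0.35, w_privileged=0.15):
    """Blend detection attributes with environment context into one score.
    The weights are arbitrary values chosen for this example."""
    privileged = 1.0 if d.involves_privileged_user else 0.0
    return (w_severity * d.severity
            + w_criticality * d.asset_criticality
            + w_privileged * privileged)

# Hypothetical detections for illustration.
detections = [
    Detection("Port scan from printer", 0.4, 0.1, False),
    Detection("Unusual SQL reads on finance DB", 0.6, 0.9, True),
    Detection("New external SSH session", 0.7, 0.3, False),
]

ranked = sorted(detections, key=priority, reverse=True)
for d in ranked:
    print(f"{priority(d):.3f}  {d.title}")
```

Note that the detection with the highest raw severity does not rank first here: the context about asset criticality and user privilege moves the finance database detection to the top, which is the kind of effect context-aware prioritization is meant to have.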
Machine learning is not magic, and not all machine learning is created equal. Understanding the inner workings of the machine learning systems you rely on is important, especially for mission-critical tasks like cybersecurity detection and response. When you are considering buying a product that touts machine learning as a central mechanism for its functionality, it is worth doing some critical thinking and digging into the important details, including:
- What data sources do these machine learning systems leverage?
- How are these ML models built, managed, and updated to ensure they keep up with the rapid pace of change in your dynamic environment?
- Does the vendor tout a "magic algorithm" that solves all problems?
- Will ML just create more friction for my business and people, or will it actually augment my staff's ability to succeed in their work?
If you're curious about how these subsystems work in practice, give our interactive demo a try or reach out for a conversation with an engineer—we're happy to answer questions or walk through specific use cases.