The Four Data Sets Essential for IT Operations Analytics (ITOA)

February 19, 2015

[This is the second post in a multi-part series, last updated on April 11, 2016. Read the first post: The Big Idea Behind IT Operations Analytics (ITOA): IT Big Data]

In my previous article, I explained that IT Operations Analytics (ITOA) borrows from Big Data principles and that, in order to enable effective insights and data-driven decisions, you must first design a data-driven architecture. This brings us to the question of which data sets to use for ITOA.

Big Data Is Only as Good as the Inputs

Consider a traditional business intelligence solution for sales and marketing analysis: you would not expect to gain a complete view of customer purchasing behavior by analyzing only a financial system's record of customer order activity. Instead, to gain deeper insight into when, how, and why customers do or don't make purchases, you would correlate that data with other data sets from your CRM system, support call system, Net Promoter surveys, and web activity. You would also evaluate each source for its trustworthiness, accuracy, and thoroughness, and for whether it provides value or creates distraction.

The same holds true for IT Big Data: you would not evaluate end-user experience just by analyzing scripted transactions from a remote location. What about server processing and network transfer time? What about the behavior of the routers, switches, firewalls, authentication services, application code, databases, and storage systems? These also contribute to the overall user experience. No user session or transaction is an island, and neither are the elements of an application delivery chain. They are interdependent, and each of them plays a role in end-user experience.

To move away from the tool-centric approach I wrote about in last week's blog post, we believe an ITOA architecture and practice should be based on a data set taxonomy, which we refer to as the four sources of IT visibility and insight. The sources and their resulting data sets are wire, machine, agent, and synthetic, each of which I describe in detail below. The inspiration for this taxonomy and practice comes from observing what ExtraHop customers such as ShareBuilder, Concur, and T-Mobile have done organically in transforming their own operations into data-driven practices. Our customers are much smarter than I am, so I am indebted to their insight and guidance. The taxonomy also draws on seminal research led by Will Cappelli, Gartner Research VP, and Colin Fletcher, Gartner Research Director, in Gartner's Enterprise Management Practice.

Each ITOA source provides a unique and complementary perspective, and some data sets will matter more than others depending on an organization's roles, questions, and requirements. Understanding the differences between data sets is essential when you assess your operational stance and engage with an IT Operations Management vendor, because it tells you the type and degree of visibility a product can and cannot provide. Remember: the visibility and insight you can achieve depend on the source of the data.

Wire Data – Observed and Unbiased Transaction Behavior

Wire Data Inputs

Wire data's building block input is the network data packet. However, packets by themselves are not wire data: just as flour does not equal bread, a packet does not equal wire data. To be transformed into wire data, packets must first be reassembled, even when they arrive out of order or fragmented, into a per-client session or full transaction stream. They must then be decoded with an understanding of each wire and application protocol's boundaries. This enables time-stamped measurement and data extraction, with the results indexed and stored in real time. The raw packets are then discarded and only the metadata or result data set is kept (unless you set a policy to record some of the packet stream). The resulting data set is not only well formatted, or "structured," but also information rich. In terms of signal-to-noise ratio, once the raw packets have been transformed, the wire data set is almost all signal.
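To make the flour-to-bread step concrete, here is a minimal Python sketch of the reassemble, decode, and keep-only-the-result sequence. It is a hypothetical illustration, not how ExtraHop or any other wire data platform is implemented: the Packet record is a simplified stand-in, sessions are keyed by client address only, reassembly handles only out-of-order arrival and retransmitted duplicates (not IP fragmentation or both directions of a connection), and the decoder understands only an HTTP/1.x request line.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical, simplified packet; real captures carry full IP/TCP headers."""
    client: str     # client "ip:port", used as the per-client session key
    seq: int        # byte offset of this payload within the client's stream
    ts: float       # capture timestamp in seconds
    payload: bytes


def reassemble(packets):
    """Group packets by client session and rebuild each byte stream in order."""
    sessions = defaultdict(dict)
    for pkt in packets:
        sessions[pkt.client].setdefault(pkt.seq, pkt)  # ignore retransmitted duplicates
    streams = {}
    for client, by_seq in sessions.items():
        ordered = [by_seq[s] for s in sorted(by_seq)]  # fix out-of-order arrival
        start = min(p.ts for p in ordered)
        end = max(p.ts for p in ordered)
        streams[client] = (start, end, b"".join(p.payload for p in ordered))
    return streams


def decode_http_request(stream):
    """Decode an HTTP/1.x request line once the header boundary has been seen."""
    head, sep, _body = stream.partition(b"\r\n\r\n")
    if not sep:
        return None  # headers incomplete: a real decoder would wait for more data
    parts = head.split(b"\r\n", 1)[0].decode("ascii", "replace").split(" ", 2)
    if len(parts) != 3:
        return None  # not a well-formed request line
    method, uri, version = parts
    return {"method": method, "uri": uri, "version": version}


def to_wire_data(packets):
    """Turn raw packets into structured, time-stamped records; packets are not kept."""
    records = []
    for client, (start, end, stream) in reassemble(packets).items():
        request = decode_http_request(stream)
        if request is not None:
            records.append({"client": client, "start": start,
                            "duration": end - start, **request})
    return records


# Two packets of one request, captured out of order (illustrative values).
packets = [
    Packet("10.0.0.5:51000", 10, 0.002, b"HTTP/1.1\r\nHost: shop\r\n\r\n"),
    Packet("10.0.0.5:51000", 0, 0.000, b"GET /cart "),
]
print(to_wire_data(packets))
```

The output is one small structured record per transaction rather than a pile of packets, which is the "almost all signal" property described above.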

Wire Data Characteristics

An outside-looking-in perspective for IT.
The unbiased observation of all activity and behavior across all tiers in an IT ecosystem.
Always on, observing and reporting all activity.
Spans on-premises and cloud-based workloads because all workloads will communicate on the wire at some point.
Largest source of visibility data (e.g., 100 Gbps of continuous analysis amounts to more than 1 PB of analyzed data per day).
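The 1 PB figure follows from simple arithmetic. The sketch below assumes decimal units (1 Gb = 10^9 bits, 1 PB = 10^15 bytes) and traffic sustained at the full rate for 24 hours; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def gbps_to_pb_per_day(gbps):
    """Convert a sustained rate in gigabits per second to petabytes per day."""
    bytes_per_second = gbps * 1e9 / 8                   # decimal gigabits -> bytes
    return bytes_per_second * SECONDS_PER_DAY / 1e15   # bytes -> decimal petabytes


print(gbps_to_pb_per_day(100))  # 1.08
```

At 100 Gbps that is 12.5 GB per second, or roughly 1.08 PB per day, which is why the figure is stated as "more than 1 PB."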

How a Wire Data Platform Works

Performs real-time extraction and transformation of raw packets into structured wire data.
Architecture is based on a high-performing and scalable real-time stream processor.
Deploys passively, mitigating risk to or disturbance of an environment.
Auto-discovers, categorizes, and maps all relationships and communication between applications, systems (machines), and clients.
Indexes and stores contextual analysis for real-time visualization, alerting, and trending.
Offers rapid programmability so users can customize, extract, measure, analyze, and visualize nearly any payload information transacting within a live stream.
Supports the simple construction of solution-oriented "apps" that each comprise a bundle of custom configuration objects such as data extraction scripts, session table storage, visualization definitions, alert thresholds, and trend math.
Ingests third-party data for real-time correlation and analysis via a high-performance session table.
Scales to speeds greater than 100 Gbps, the equivalent of millions of transactions per second, and decrypts SSL at line rate.
Streams wire data in real time to proprietary and open-source external data stores, enabling cross-data set correlation.
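The session table is what lets a wire data platform correlate third-party data with live transactions as they stream by. The following Python sketch is a hypothetical stand-in, not ExtraHop's API: a key/value table whose entries expire after a time-to-live, loaded with an assumed CMDB-style mapping from server IP to owning team and used to enrich each wire data record. The field names server and owner are illustrative.

```python
import time


class SessionTable:
    """Hypothetical in-memory key/value table whose entries expire after a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock makes expiry easy to test
        self._entries = {}       # key -> (value, expires_at)

    def put(self, key, value, ttl):
        self._entries[key] = (value, self._clock() + ttl)

    def get(self, key, default=None):
        item = self._entries.get(key)
        if item is None:
            return default
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._entries[key]   # lazily evict the expired entry
            return default
        return value


def enrich(record, table):
    """Attach third-party context (here, an owning team) to a wire data record."""
    return {**record, "owner": table.get(record["server"], "unknown")}


# Illustrative data: a CMDB-style mapping of server IP to owning team.
table = SessionTable()
table.put("10.0.0.20", "payments-team", ttl=300)
print(enrich({"server": "10.0.0.20", "uri": "/pay", "status": 500}, table))
```

Expiring entries keeps the table bounded and stops stale third-party context from being attached to new transactions.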

Machine Data – System Self-Reported Information

Machine Data Inputs

Machine Data Characteristics

An inside-looking-out perspective.
A host-based perspective of self-reported events from machines across all tiers.
Contains pre-programmed event data from elements, systems, operating systems, applications, and other components in an application delivery chain.
Provides visibility for both on-premises and cloud-based components.
May add overhead to a system when logging is enabled.
Second-largest source of IT visibility (e.g., indexing and analyzing 1 TB a day is considered large).

How a Machine Data Platform Works

Extracts, forwards, stores, and then indexes machine data at run-time for powerful and flexible search.
Uses a distributed architecture of lightweight forwarders, indexers, search heads, and a scalable data store.
Provides native reporting and analysis, including search and visualization, which can be customized with a query language.
Programmability of the platform is focused on the building of applications that sit on top of collected machine data.
Ingests third-party data, such as wire and agent data, if the third-party platform supports their data format and/or common information model.
Supports exporting machine data to external data stores, enabling correlation across data sets.
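The forward, index, and search flow can be illustrated with a toy inverted index. This Python sketch is hypothetical and does not reflect the design of any particular product: the Indexer class and forward function are invented names, and real platforms also parse timestamps, partition data by time, and distribute indexing and search across many nodes.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9_.:-]+")


class Indexer:
    """Toy inverted index: maps each lowercase token to the events containing it."""

    def __init__(self):
        self.events = []                  # stored raw events; list position = event id
        self.postings = defaultdict(set)  # token -> set of event ids

    def index(self, host, line):
        event_id = len(self.events)
        self.events.append({"host": host, "raw": line})
        for token in TOKEN.findall(line.lower()):
            self.postings[token].add(event_id)

    def search(self, *terms):
        """Return events containing every search term (AND semantics)."""
        ids = None
        for term in terms:
            matches = self.postings.get(term.lower(), set())
            ids = matches if ids is None else ids & matches
        return [self.events[i] for i in sorted(ids or [])]


def forward(host, lines, indexer):
    """Forwarder stand-in: ship each log line from a host to the indexer."""
    for line in lines:
        indexer.index(host, line)


idx = Indexer()
forward("web01", ["INFO request served in 12ms", "ERROR upstream timeout"], idx)
forward("db01", ["ERROR disk latency high"], idx)
print(idx.search("error", "timeout"))
```

Indexing every token at ingest time is what makes later ad hoc search fast, and it is also why indexing volume (for example, 1 TB a day) drives the cost of a machine data platform.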

Agent Data – Host-Based Instrumented Behavior

Agent Data Inputs

Agent Data Characteristics

Host-based observed and instrumented behavior.
Measures, collects, and reports pre-determined and customized host-based metrics.
Often goes beyond machine data metrics in regard to system performance, hypervisor activity, operating systems, and application code-level analysis.
Offers the flexibility to extract additional data from transactions.
Spans both on-premises and cloud-based applications as long as the agent is compatible with the hypervisor, application, and OS version.
Third-largest source of IT visibility. The amount depends on the number of hosts instrumented with agents.

How an Agent Data Platform Works

Architecture is based on agents (for application and database servers) that run analysis on the host and send collected data back to a central reporting server.
Agents are lightweight and deploy directly on each host.
Some agent-based tools have a SaaS model where agents send data back to the vendor's cloud for scalable analysis of that data.
Programmability of the platform involves configuring the agents to collect specified data.
Ingests third-party data, like wire data and machine data, if the third-party platform supports their data format.
Exports agent data to external data stores, enabling cross-data set correlation.
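Code-level instrumentation is what lets agents go beyond machine data. The minimal Python sketch below assumes a hypothetical Agent class: it wraps functions with a timing decorator, buffers each measurement on the host, and serializes a batch as a JSON payload. Actually sending that payload to a central reporting server or a vendor's SaaS endpoint is left out, and many commercial agents instrument code automatically (for example, through bytecode instrumentation on the JVM) rather than through explicit decorators.

```python
import functools
import json
import time


class Agent:
    """Hypothetical in-process agent: times instrumented calls and batches results."""

    def __init__(self, host):
        self.host = host
        self.buffer = []

    def instrument(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            error = None
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                error = type(exc).__name__  # record the failure, then re-raise it
                raise
            finally:
                # Runs on success and failure alike, so every call is measured.
                self.buffer.append({
                    "function": func.__name__,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                    "error": error,
                })
        return wrapper

    def flush(self):
        """Serialize buffered measurements as the payload a central server would receive."""
        payload = json.dumps({"host": self.host, "metrics": self.buffer})
        self.buffer = []
        return payload


agent = Agent("app01")


@agent.instrument
def checkout(items):
    return sum(items)


checkout([5, 10])
print(agent.flush())
```

Batching on the host and flushing periodically keeps per-call overhead low, which is the trade-off behind "lightweight" agents.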

Synthetic Data – Scripted and Periodic Transaction Behavior

Synthetic Data Inputs

Synthetic Data Characteristics

Provides an outside-in, observational perspective similar to wire data.
Tests the availability and responsiveness of the application.
Able to replicate user experience from around the globe.
Excellent at identifying hard failures that are observable from outside the application environment.
Visibility is limited to user experience as defined by prebuilt tests and scripts and will only cover a portion of the application delivery chain.

How a Synthetic Data Platform Works

Service checks fire on a predetermined schedule and can be anything from simple ICMP pings to fully scripted transactions that run through the application flow.
To accurately mimic customer geolocation, probes can be fired from around the globe, representing many points of presence.
Deployment is easy, either as a hosted service or with lightweight on-premises options.
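A scripted transaction check can be sketched as an ordered list of steps that are timed one by one and stop at the first hard failure. In this hypothetical Python sketch each step is a plain callable that raises on failure; in a real probe a step might issue an HTTP request (for example, with urllib.request) or an ICMP ping, and the check names, locations, and time budget are illustrative assumptions.

```python
import time


def run_scripted_check(name, location, steps, budget_s=5.0):
    """Run (step_name, callable) pairs in order and report availability.

    A step fails if it raises or if it finishes after budget_s seconds.
    This sketch does not interrupt a hung step; it only judges completed ones.
    """
    results = []
    for step_name, step in steps:
        start = time.perf_counter()
        try:
            step()
            ok = True
        except Exception:
            ok = False
        elapsed = time.perf_counter() - start
        if elapsed > budget_s:
            ok = False  # treat slow responses as failures
        results.append({"step": step_name, "ok": ok, "seconds": elapsed})
        if not ok:
            break  # later steps depend on earlier ones, so stop here
    return {
        "check": name,
        "location": location,
        "available": all(r["ok"] for r in results),
        "steps": results,
    }


def load_home():
    pass  # placeholder for a real request to the home page


def log_in():
    raise ConnectionError("login service unreachable")  # simulated hard failure


report = run_scripted_check("checkout-flow", "eu-west", [("home", load_home), ("login", log_in)])
print(report["available"], [s["step"] for s in report["steps"]])
```

In practice a scheduler (cron, or a loop that sleeps between runs) would invoke the check from each point of presence, and the per-location reports would be compared to spot regional outages.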

Stay Tuned

Want to learn how ITOA can help optimize your business? Read Designing & Building An Open ITOA Architecture

Erik Giesa

Senior Vice President of Marketing and Business Development

Erik Giesa is the Senior Vice President of Marketing and Business Development at ExtraHop Networks. Prior to joining ExtraHop, Erik was Senior Vice President of Product Management and Product Marketing at F5 Networks where he defined product, marketing, and solution strategy for all F5 products.