Storage Performance Monitoring: 3 Metrics Needed for Holistic Visibility

January 23, 2013


Cartoon: The Service Level Agreement is always the first casualty in the war to assign blame.

Most IT organizations monitor their applications and infrastructure in a disjointed manner, with each specialist team relying on tools that provide visibility into a single technology silo: network tools for the network engineers, database profilers for the DBAs, agent-based APM tools for developers, and so forth. This fractured approach to monitoring contributes to high IT costs, poor user experience, wasted capacity, and an IT organization that reacts to issues instead of anticipating them.

SAN and NAS Performance Visibility

ExtraHop offers a much-needed new approach that provides holistic visibility across the entire application delivery chain. This cross-tier view enables IT teams to easily understand how applications are impacting the database, network, and storage tiers. With shared operational intelligence, IT teams can collaborate to solve problems faster and identify interrelated issues that would otherwise go undetected.

This month's Performance Metric of the Month highlights the importance of CIFS, NFS, and iSCSI transaction metrics in the context of other application and infrastructure performance. The three real-world examples below demonstrate the value of this correlated visibility.

Case #1 – Tiered Storage vs. the Rogue Application

This customer used the ExtraHop system to inspect a list of all transactions hitting the Data Domain system during the periods of slow performance and identified a single system that was aggressively reading from the storage system. Because the backup storage system was optimized for writes and not reads, this activity had a serious impact on overall performance. The ExtraHop system made this diagnosis easy by showing all the read and write transactions on a per-client basis. The same capability can be applied to monitoring OLAP database applications, or data warehouses, which are optimized for reads.
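The per-client breakdown described above can be illustrated with a minimal sketch. It assumes transaction records have already been exported as simple (client, operation, bytes) tuples; the record layout, the client names, and the function names are hypothetical and do not represent the ExtraHop system's actual data model or API.

```python
from collections import defaultdict

def read_write_by_client(transactions):
    """Sum read and write bytes per client.

    `transactions` is an iterable of (client, op, nbytes) tuples,
    a hypothetical export format assumed for this sketch.
    """
    totals = defaultdict(lambda: {"read": 0, "write": 0})
    for client, op, nbytes in transactions:
        if op in ("read", "write"):
            totals[client][op] += nbytes
    return dict(totals)

def heaviest_readers(totals, top=3):
    """Return the clients with the most read bytes, largest first."""
    return sorted(totals, key=lambda c: totals[c]["read"], reverse=True)[:top]

# Made-up sample data: two backup jobs writing, one client reading heavily.
sample = [
    ("backup-01", "write", 4_000_000),
    ("backup-02", "write", 3_500_000),
    ("app-17", "read", 9_000_000),
    ("app-17", "read", 8_000_000),
    ("backup-01", "read", 100_000),
]

totals = read_write_by_client(sample)
print(heaviest_readers(totals, top=1))
```

On a write-optimized backup target, a client that dominates the read totals in a view like this is the natural first suspect.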

Case #2 – iSCSI Connectivity Issues and the Confused SAN

Figure 1. Mapping iSCSI connections helped identify misconfigured servers.

During a proof-of-concept demonstration, the IT manager at the company and an ExtraHop systems engineer confirmed the iSCSI connectivity issues and then pinpointed the specific servers experiencing these problems out of the entire pool of Xen and VMware servers. By generating an application activity map that visually mapped all devices using the iSCSI protocol (see Figure 1), the IT manager confirmed that the two suspect servers were connecting to the SAN in different ways. These servers were using the Microsoft iSCSI Software Initiator in Windows in addition to host-bus adapters (HBAs). As the SAN tried to load-balance requests across all available interfaces and controllers, it would sometimes send the response to a request issued through the HBA back to the Microsoft iSCSI Software Initiator on that same server, which would then drop the response.

The ExtraHop system helped to solve this obscure issue by providing the necessary context. With the problem identified, the IT manager turned off the Microsoft iSCSI Software Initiator on those servers, and the iSCSI connectivity issues disappeared.
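The check that the activity map made visible, namely which servers reach the SAN through more than one kind of initiator, can be sketched as follows. The session records, server names, and the "hba" and "software" labels are assumptions made for illustration; they are not output from any real tool.

```python
from collections import defaultdict

def servers_with_mixed_initiators(sessions):
    """Return servers that connect through more than one initiator kind.

    `sessions` is an iterable of (server, initiator_kind) pairs,
    a hypothetical summary of observed iSCSI sessions.
    """
    kinds = defaultdict(set)
    for server, kind in sessions:
        kinds[server].add(kind)
    return sorted(server for server, seen in kinds.items() if len(seen) > 1)

# Made-up sample: two servers use both an HBA and a software initiator.
sessions = [
    ("xen-01", "hba"),
    ("vmw-03", "hba"),
    ("win-07", "hba"),
    ("win-07", "software"),
    ("win-09", "software"),
    ("win-09", "hba"),
]

mixed = servers_with_mixed_initiators(sessions)
print(mixed)
```

Servers flagged this way are candidates for the mismatch described above, where a response arrives at an initiator that did not issue the request.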

Case #3 – The Bandwidth-Hog Logging System

Figure 2. The ExtraHop system analyzes L7 application protocols.

A bug in the log archive script caused large files to be copied across the network repeatedly. Five million files were unnecessarily rewritten. The network team was unfamiliar with the logging system and had assumed that the resulting growth in traffic was organic. In fact, they were preparing a forklift upgrade of the network infrastructure to handle this increased traffic, at a cost of hundreds of thousands of dollars. However, with the archive script fixed, network utilization dropped by 70 percent, which let the company defer a significant and unnecessary capital expense.

Legacy network-monitoring tools would not have helped in this case. Only the ExtraHop system, with its ability to analyze L7 application-level details, is able to distinguish CIFS traffic (see Figure 2) and list the filenames for each transaction.
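Once filenames are available per transaction, spotting redundant copies is a matter of counting. The sketch below assumes write transactions have been exported as (filename, bytes) pairs; the record format, the file paths, and the function name are hypothetical.

```python
from collections import defaultdict

def repeated_file_writes(write_records, min_copies=2):
    """Find files written at least `min_copies` times.

    `write_records` is an iterable of (filename, nbytes) pairs,
    a hypothetical export of per-transaction write details.
    Returns {filename: (write_count, total_bytes)}.
    """
    stats = defaultdict(lambda: [0, 0])
    for filename, nbytes in write_records:
        stats[filename][0] += 1
        stats[filename][1] += nbytes
    return {
        name: (count, total)
        for name, (count, total) in stats.items()
        if count >= min_copies
    }

# Made-up sample: one archive copied three times, another copied once.
writes = [
    ("logs/app-2013-01-01.gz", 500),
    ("logs/app-2013-01-02.gz", 500),
    ("logs/app-2013-01-01.gz", 500),
    ("logs/app-2013-01-01.gz", 500),
]

repeats = repeated_file_writes(writes)
print(repeats)
```

A large set of files with high write counts points to a script or job that is rewriting data, as opposed to organic growth in traffic.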

What's Needed: An Operational Intelligence Solution

If you have your own networked-storage tales to tell, please leave a comment below. Or, if you're interested in finding out how the capabilities of the ExtraHop system can help you, try the free, interactive ExtraHop demo.


Justin Baker

Director of Marketing

Justin is Director of Marketing at ExtraHop.
