
Paying Down Technical Debt in Your IT Infrastructure

Tyson Supasatit

February 15, 2014

The Phoenix Project co-author Gene Kim spoke at the ExtraHop sales conference in January, explaining how technical debt led to the "IT death spiral."


The idea of "technical debt" is one of the most important things I learned from The Phoenix Project. Described as "a novel about IT, devops, and helping your business win," the book has deeply influenced the IT operations community. In the simplest terms, technical debt is the result of not doing things right in the first place. Here's Erik, a lean-methodology guru in the book, describing technical debt:

"… like financial debt, the compounding interest costs grow over time. If an organization doesn't pay down its technical debt, every calorie in the organization can be spent just paying interest, in the form of unplanned work."

As illustrated in The Phoenix Project, the accumulation of technical debt results in constant firefighting and an inability to implement new projects quickly. A less recognized yet equally damaging result is the increased waste and noise in the IT infrastructure, causing:

  • Unnecessary infrastructure purchases
  • Greater load on critical resources
  • Low signal-to-noise ratio
  • Security vulnerabilities
  • More places for malware to hide

Supporting Continuous Improvement with ExtraHop


Many organizations use ExtraHop to support a continuous improvement environment, applying methodologies adapted from lean manufacturing. ExtraHop's Atlas Services remote analysis reports are a perfect fit for these "lean IT" efforts. IT organizations receive regular analysis across all tiers of their environment, identifying both acute and chronic issues, and then use these reports to create work items for their kanban-type scheduling systems.

By dedicating resources to paying down their technical debt—fixing misconfigurations, adjusting settings, optimizing scripts, decommissioning legacy systems, etc.—these IT organizations are freeing up capacity, increasing goodput, addressing issues proactively, and improving signal-to-noise ratios so that it is easier to spot anomalous behavior.

Real-World Examples of Paying Down Technical Debt

DNS


The red bars at the bottom show DNS errors. After problems are fixed in the middle of October, the errors drop significantly.


In August, DNS servers responded with 409,404 errors for 4.1 million DNS requests—an 11.6 percent error rate.


After the problems were fixed in October, the DNS servers responded with 15,987 errors for 3.09 million DNS requests, an error rate of less than 1 percent.
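The error rates quoted for DNS are simply errors divided by requests. As a minimal sketch, the calculation might look like the following; the function name is hypothetical, and the inputs are the rounded figures from the text, not the raw counters behind the charts.

```python
def error_rate_percent(errors: int, requests: int) -> float:
    """Return errors as a percentage of total requests."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return 100.0 * errors / requests


# Figures as quoted in the post; the request totals are rounded.
august = error_rate_percent(409_404, 4_100_000)
october = error_rate_percent(15_987, 3_090_000)

print(f"August:  {august:.1f}%")
print(f"October: {october:.2f}%")
```

With these rounded totals, October works out to roughly half a percent, consistent with "less than 1 percent." The August figures give about 10 percent, so the 11.6 percent quoted was presumably computed from a different or unrounded request count.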


TCP


In August, out-of-order segments and tinygrams were contributing to network congestion.


After the problems were fixed in October, out-of-order segments and tinygrams were reduced by 90 percent.
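For readers unfamiliar with the two TCP metrics, here is a rough sketch of how they could be counted for one direction of a single flow. This is an illustration only, not how ExtraHop measures them: the `Segment` type, the `summarize_flow` function, and the 64-byte tinygram threshold are all assumptions made for the example. It also ignores sequence-number wraparound and does not distinguish retransmissions from genuinely reordered segments.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int          # sequence number of the first payload byte
    payload_len: int  # payload size in bytes


# Assumed cutoff for a "tiny" data segment; not an ExtraHop setting.
TINYGRAM_MAX_BYTES = 64


def summarize_flow(segments):
    """Return (out_of_order, tinygrams) for segments in arrival order."""
    highest_end = None  # highest sequence number seen so far (exclusive)
    out_of_order = 0
    tinygrams = 0
    for seg in segments:
        if seg.payload_len == 0:
            continue  # pure ACKs carry no data
        if seg.payload_len <= TINYGRAM_MAX_BYTES:
            tinygrams += 1
        # A data segment that starts before data already seen arrived late.
        if highest_end is not None and seg.seq < highest_end:
            out_of_order += 1
        end = seg.seq + seg.payload_len
        highest_end = end if highest_end is None else max(highest_end, end)
    return out_of_order, tinygrams


# Hypothetical arrival order: the third segment fills a gap left by the second.
flow = [
    Segment(1000, 1460),
    Segment(3920, 1460),
    Segment(2460, 1460),
    Segment(5380, 20),
]
print(summarize_flow(flow))
```

Counting both metrics per flow before and after a fix is one way to arrive at a percentage reduction like the one described above.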


HTTP

This is a large environment with upwards of 3,000 web transactions per second at peak periods, and analyzing large amounts of data at the level of detail that ExtraHop does is no trivial task.

HTTP errors dropped by a factor of 9.5 after the problem was identified and fixed. In large environments, it can be difficult to analyze all transactions with sufficient detail to pinpoint problems.


Database


The problem causing the "(ORA-28000) the account is locked" errors is fixed on March 12, resulting in an almost complete elimination of database errors.


After the fix is implemented on March 12, database server response time is much more predictable (and fast, with responses in less than a millisecond).


LDAP


A general configuration change reduces load on the LDAP server by a factor of 5 and dramatically reduces LDAP errors.


Make It Easier to Take the Doctor's Orders

Check out the sample Atlas remote analysis report below and then visit the web page to learn more.
