The Guardian reports that the three-day outage of Research in Motion's (RIM) BlackBerry service was caused by faulty networking equipment in the company's U.K. network operations center in Slough. RIM operates just four network operations centers around the world.
A router failure is to be expected. That's what backup sites are for. Unfortunately for millions of BlackBerry customers, RIM did not have failover to its backup site in Egham, Surrey, fully baked. As a result, the switchover began corrupting the massive database that undergirds BlackBerry services.
According to the Guardian, RIM staff didn't realize exactly what was happening until several hours later. By that time, the corruption had reached a tremendous scale, because the network operations center normally processes 8 gigabytes of data every second. In short, RIM's IT operations staff spent several days sorting things out: restoring data, then catching up with the backlog of messages, emails, and web data.
One of the advantages of the ExtraHop system is that it can be deployed on production databases continuously without impacting performance. That means IT teams get real-time alerts on problems such as database corruption. Other monitoring tools are available for this, but they often rely on resource-intensive agents or profilers. As a result, IT staff turn them off, leaving themselves blind to problems like the one behind RIM's outage until it is too late.
Don't let this happen to you! With passive, continuous monitoring of database performance, you can avoid embarrassing service outages.