The last week of January 2021 was an eventful week for the stock market, to say the least. A fast-growing cohort of hobbyist traders and stock market novices, spurred on by an influential subreddit, drove the price of GameStop stock to astronomical highs, whipping Wall Street, Capitol Hill, and the internet into a frenzy.
In the midst of the mayhem, one of our expert trainers ran a virtual training with the retail arm of a large bank whose lines of business include stock trading. At the height of a very busy week, the bank was struggling with a slowdown but couldn't pin down the cause.
Q: So, what happened when you got on the call?
A: I dialed in and before I could get a word in edgewise, one of the trainees asked, "can we, uh, we had a, um, could we look at database transactions? We've, uh, seen some things." So we fired up their Reveal(x) instance and took a look.
Q: Is that standard training operating procedure?
A: Every training is different, but we always do our training on live data, so going in to look at actual issues isn't a problem. In fact, we often uncover things in training because we're looking at their actual environment.
Q: Ok, so what did you see?
A: This was Thursday, January 28th, just a little after 9:30 AM Eastern, when the stock market opened. Sure enough, we could see that right at 9:30 AM, transactions in their database environment almost instantly tripled. That makes sense—the markets open, trades start flying.
Q: So what were they trying to figure out?
A: With everything happening on Wall Street that week, the network and database teams were under the gun. The last thing they needed was to have to explain a slowdown to the higher-ups. Remember, ExtraHop measures network round trip time by looking at actual data in flight—we don't use weak sauce like ping or sampling—so we could tell definitively whether this volume of database transactions was having an impact. The great news was that we could clearly see in Reveal(x) that network latency held to a consistent pattern even as database transaction volume tripled. Put another way, there was no correlation between transaction volume and network latency. So far, so good.
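That "no correlation" claim is the kind of thing you can sanity-check with a Pearson correlation over paired time-series samples. A minimal sketch follows—the numbers are invented for illustration, not Reveal(x) output:

```python
# Illustrative only: per-minute samples of database transaction volume
# and network round-trip time (ms). The numbers are made up.
volumes = [1200, 1250, 3600, 3700, 3650, 3800]   # transactions/min
rtts    = [1.9,  2.1,  2.0,  1.8,  2.1,  1.9]    # network RTT, ms

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# A coefficient near zero means RTT isn't tracking transaction volume.
print(round(pearson(volumes, rtts), 2))
```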
Q: Great, so everything looks good. Was there anything interesting?
A: Yes! Far more interesting than the real-time view was the historical view. We pivoted to look back over the past couple of months and trend database transactions. For example, over a 17-day period we saw upwards of 2.8 billion database transactions. You could see a gradual increase, and then, in the last week, a noticeable jump—almost like word was getting out about a clever plan.
We started with a week-over-week comparison using Reveal(x)'s ability to compare distinct time intervals. Comparing Friday, January 22, against Friday, January 15, we could see that traffic on the 22nd was a bit busier than the Friday prior.
Then we compared those days to the traffic on Monday, January 25th. We saw another uptick in transactions while the network remained constant. The network team was starting to breathe easier.
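Outside of Reveal(x), the comparison described above boils down to a percent-change calculation between two intervals' transaction totals. A quick sketch—the totals here are hypothetical, not the bank's real figures:

```python
# Illustrative only: invented daily transaction totals for the
# comparison days mentioned above.
def week_over_week_change(current, prior):
    """Percent change from the prior interval to the current one."""
    return (current - prior) / prior * 100.0

fri_jan_15 = 150_000_000  # hypothetical totals
fri_jan_22 = 165_000_000
mon_jan_25 = 190_000_000

print(f"Jan 22 vs Jan 15: {week_over_week_change(fri_jan_22, fri_jan_15):+.1f}%")
print(f"Jan 25 vs Jan 22: {week_over_week_change(mon_jan_25, fri_jan_22):+.1f}%")
```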
Q: But we all know this probably isn't where it ended.
A: Nope! Tuesday, January 26th saw more of an uptick in the afternoon... almost like word was spreading among the cool kids.
Wednesday morning, January 27th, transactions went crazy at market open. Thursday morning also went crazy... until something happened and transaction volume suddenly fell. That drop happened to coincide with the time several organizations halted trading on certain stocks.
Then Friday morning, January 29th, traffic again went crazy at 9:30 AM ET.
Even through all of that up-and-down with transaction volume skyrocketing, we could show that network performance was holding steady.
Q: Okay, so the network team now has pretty solid proof that they're not the problem. So why is it slow?
A: Good question! We dove into the database transaction records to find out and identified transactions taking more than five seconds to execute. We prepared a list of these "slow" transactions and sent it to the database team.
We also identified a number of monitoring efforts failing due to login and syntax issues—a bit of a catch-22: if a database monitoring tool can't log in, it can't monitor your database.
Using Reveal(x), we were able to sift through hundreds of millions of transactions and pluck out the slow transactions. We packaged the relevant bits (table, user, statement, processing time) and sent them to the database team for further analysis.
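The triage described above amounts to filtering transaction records by processing time and keeping only the fields the database team needs. Here's a minimal sketch of that shape—the dict-based record format and field names are assumptions for illustration, not Reveal(x)'s actual record schema:

```python
# Illustrative only: the fields (table, user, statement, processing time)
# and the five-second threshold mirror the triage described above;
# the record format itself is an assumption.
SLOW_THRESHOLD_MS = 5_000

def slow_transactions(records, threshold_ms=SLOW_THRESHOLD_MS):
    """Pluck out slow transactions, keeping only the relevant fields."""
    keep = ("table", "user", "statement", "processing_time_ms")
    return [
        {k: r[k] for k in keep}
        for r in records
        if r["processing_time_ms"] > threshold_ms
    ]

# Hypothetical sample records.
records = [
    {"table": "orders", "user": "app1", "statement": "SELECT ...",
     "processing_time_ms": 120, "client_ip": "10.0.0.5"},
    {"table": "trades", "user": "app2", "statement": "UPDATE ...",
     "processing_time_ms": 7450, "client_ip": "10.0.0.9"},
]
for slow in slow_transactions(records):
    print(slow["table"], slow["processing_time_ms"])
```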
Q: So any words of wisdom or lessons learned?
A: Well, I'm reminded of a song from the movie Robin Hood—the awesome animated Disney one, not the other crappy ones. The line goes: "Every town has its ups and downs. Sometimes ups outnumber the downs."
Every business has its ups and downs when it comes to transaction volumes. It's important to understand that you can never fully predict when you're going to see a spike. What you can do is make sure that you can quickly understand the effect that spike is having across your IT environment, and take action to ensure it doesn't result in business or customer pain.