This blog was authored by ExtraHop Reveal(x) customer Mitch Roberson. Visit his blog here for more great tech content.
As regular readers of my blog know, I have a pet peeve about DNS. Some days it seems like the bane of my existence, because small, intermittent problems so often turn out to be caused by DNS issues.
Recently I witnessed an interesting event that got me thinking about all the companies that do not monitor DNS very closely. I have seen this type of event in multiple environments; it happens more often than most people think, and it's surprisingly easy to miss.
Many organizations I have worked with or talked to tell me they gather DNS logs. But how many of them can quickly analyze and/or graph those logs to spot different problems? How many application teams actually pay attention to DNS? The problem with logs is that when you have an issue like the one I'm about to show you, the logs roll over very quickly (in some cases, in seconds). If your logs roll over that quickly, you might never catch events like this. That's one reason I'm glad we have a monitoring solution that can keep up.
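Even with logs that roll over fast, a quick per-client tally is often enough to make a spike jump out. Here's a minimal sketch of that idea; the log format (timestamp, client IP, query name, record type) and the names `phone.domain.com` / `mail.domain.com` are hypothetical examples, so adjust the parsing to whatever your resolver actually emits.

```python
# Sketch: tally DNS queries per client so heavy talkers stand out.
# Assumes a hypothetical log format: "<timestamp> <client_ip> <qname> <qtype>".
from collections import Counter

def top_talkers(log_lines, n=5):
    """Return the n client IPs that issued the most queries."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1  # field 1 = client IP in this format
    return counts.most_common(n)

sample = [
    "2024-01-01T09:00:01 10.0.0.5 phone.domain.com A",
    "2024-01-01T09:00:01 10.0.0.5 phone.domain.com A",
    "2024-01-01T09:00:02 10.0.0.9 mail.domain.com MX",
]
print(top_talkers(sample))  # [('10.0.0.5', 2), ('10.0.0.9', 1)]
```

Run something like this over each log window before it rolls, and the kind of single-client spike described below becomes hard to miss.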
Thanks to Reveal(x), our network detection and response solution from ExtraHop, we noticed a big uptick in internal DNS requests (Fig. 1), larger than anything we had seen in a while. This obviously concerned us. As you can see, responses were keeping up with requests, so we felt our infrastructure was handling the load well. But we still wanted to investigate further.
Looking at our baseline before the spike, we normally run in the 10-15k-or-less range over a 1-hour period. We had jumped to around 30k, with a big spike to 60k. Not only is that a massive increase, it also happened during peak hours, so we definitely wanted to follow up.
Further digging with Reveal(x) showed that a single client had made more than 4 million queries over an hour-long period (Fig. 3). This was kind of cool. It took us a bit to track down the client: it turned out to be an old machine running an application that talked to a server that had been taken off the network years ago. For some reason this one machine had been turned back on, probably due to a power outage we recently had. Once we found the machine and took it off the network, everything was happy again.
Events like this are one reason it's valuable to monitor your DNS, but there is so much more information you can gain. For example, people often do not realize they have something misconfigured.
Let's say you have a phone system set up to query phone.domain.com, but there is no DNS record for it. In many cases the application will query for that name constantly. I have seen applications perform 10k queries in a minute simply because the name does not exist. Think about the load that puts on the server.
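That kind of misconfiguration shows up in response data as a pile of NXDOMAIN answers for the same name. A small sketch of how you might flag those, assuming a hypothetical response-log format of (timestamp, client IP, query name, response code); real resolver logs vary, so treat the parsing as a placeholder:

```python
# Sketch: flag query names that rack up NXDOMAIN responses.
# Assumes a hypothetical log format: "<timestamp> <client_ip> <qname> <rcode>".
from collections import Counter

def nxdomain_hotspots(log_lines, threshold=100):
    """Return {name: count} for names whose NXDOMAIN count meets the threshold."""
    misses = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "NXDOMAIN":
            misses[fields[2]] += 1
    return {name: c for name, c in misses.items() if c >= threshold}

sample = [
    "2024-01-01T09:00:01 10.0.0.5 phone.domain.com NXDOMAIN",
    "2024-01-01T09:00:01 10.0.0.5 phone.domain.com NXDOMAIN",
    "2024-01-01T09:00:02 10.0.0.9 mail.domain.com NOERROR",
]
print(nxdomain_hotspots(sample, threshold=2))  # {'phone.domain.com': 2}
```

A name that hits the threshold every window is usually either a stale record someone forgot to create or an application pointed at a hostname that no longer exists.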
And what if these are also your domain controllers? Have you ever had weird auth issues where it seems like a DC stops functioning correctly for just a couple of seconds? Do you know how often your secondary DNS server is used? Occasional application timeouts can be caused by DNS issues, and so can many other problems.
By keeping your DNS infrastructure clean, you can identify problems, as well as potential security risks, much faster.