Are you a systems admin? I am.
It's likely that you, like me, have a growing backlog and never-ending streams of email, alerts, and logs. It's something we all face as sysadmins, so I've got something helpful to share with you.
Alerts, logs, and streams of tickets can drown out the things that really matter. One of the things that can slip to the back of your mind is security. You say: "Security?! I never forget about security! Are you crazy?" But hear me out: when was the last time you reviewed the logins on your servers? Not just your LDAP or AD servers, but all of them. How about looking through your Apache logs for funky user-agent strings?
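If it's been a while, that kind of review doesn't have to be a project. Here's a minimal sketch of a user-agent sweep, run against a made-up sample log so you can see the shape of it; in real life you'd point it at your actual Apache access log (the path varies by distro) and it assumes the default combined log format.

```shell
# Sample Apache combined-format log; swap in your real file,
# e.g. /var/log/apache2/access.log or /var/log/httpd/access_log.
LOG=sample_access.log
cat > "$LOG" <<'EOF'
10.0.0.5 - - [01/Jan/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
10.0.0.9 - - [01/Jan/2024:10:00:01 +0000] "GET /admin HTTP/1.1" 404 128 "-" "sqlmap/1.7"
10.0.0.5 - - [01/Jan/2024:10:00:02 +0000] "GET /a HTTP/1.1" 200 64 "-" "Mozilla/5.0"
EOF

# In the combined format, the user agent is the sixth
# double-quote-delimited field. Tally them, most common first;
# the oddballs at the bottom are what you want to eyeball.
awk -F'"' '{print $6}' "$LOG" | sort | uniq -c | sort -rn
```

Anything you don't recognize near the bottom of that list (scanners like sqlmap tend to announce themselves) is worth a closer look.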
There are many tools out there that aim to help with these tasks: log aggregators, alert systems, dashboards, and more. The security space is full of specialty tools for intrusion detection, malware and ransomware discovery, and auditing. These can certainly help, but only when you have time to peruse them or when they're tuned not to inundate you with emails. I still can't get all of the syntax down for Kibana.
What's the answer?
The answers can be surprisingly simple, but effective. I've come up with a few tips and a story to share with you.
Know Your Network
This doesn't have to be time-consuming or hard. All you really need to do is think about where things belong. Once you can view what's actually communicating on your network, it will pop out to you when something seems out of place. Your knowledge about your network is your strongest security asset. Rob Joyce, a 25-year NSA veteran, said, "If you really want to protect your network, you really have to know your network. You have to know the devices, the security technologies, and the things inside it."
The best way I've found to really know your network is with wire data. ExtraHop finds the truth for you without all of the tuning and manual updates to your alerting systems and logging tools. Logs and alerts can lie or be noisy. Wire data is pure signal.
Make Your Tools Work for You
Tuning your email filters to cull certain Nagios emails isn't the answer. Digging out actionable information from your tools shouldn't be your job. How do you get around this? Machine learning! It's the new buzzword everyone is using these days, but what does it really mean? Machine learning is just a fancy way of saying, "Let the computer do the work for you." It hunts through all of your data and presents you with something you can actually deal with. Addy—ExtraHop's award-winning machine learning technology—has helped bring to light some things in our own environment that we wouldn't have noticed otherwise. It watches all of the fun little electrons buzzing around your network and gives you a little nudge when something isn't quite right.
Get Documentation Before It Hits the Fan
Keep your notes in documentation. Whether it's a major outage or a hacker snooping in your database, you and your team really need to know what to do next. Even the most level-headed engineer can miss a step or two when it all goes sideways. It doesn't take but a moment to write down a gotcha, a command you ran, or where this app lives. It doesn't have to be perfect, but you need something. One sentence is better than nothing. Make sure to update your docs when things do go sideways. You likely learned a command or gotcha that would help for next time.
So How Do These Things Really Help?
ExtraHop ran an internal security test with a rogue machine a few months back. Only a few people knew about it, so it was up to us and our tools to find it and figure out what it did while it was running. Our first clue came from our developer working on the early versions of Addy. Yay, machine learning! He noticed that Addy alerted on a large uptick in DNS errors. See that? The tool was doing the work. My developer and I didn't spend time poring over logs or looking at SNMP charts to determine whether something was "normal" or broken. Addy showed it to us in a very simple and actionable way.
We popped into our trusty ExtraHop Discover appliance to see what this was all about. Addy pointed back to our PXE server, which was running a lot of reverse DNS lookups that were failing. What was it doing? It was looking up an IP on our user-side VLAN. Turns out, it was a device that claimed to be a Dell iDRAC.
Here is where "knowing your network" becomes critical. I know my VLANs. I know that we don't put servers on our user-side VLANs. This got me curious and led me down a rabbit hole. It was late on a Friday. I could have brushed it off and looked at it Monday, but I dug in as I knew something was out of place.
So, to recap: We figured out that we had a good server doing reverse lookups of an odd machine. Here comes wire data to the rescue! Looking at the ExtraHop monitoring our network, we saw that the odd machine was SSH'ing to the server many times a minute. I popped open the logs of that server to see what was going on. We noticed that the weird little machine was trying to SSH in as user "accounting."
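As a sketch of that log check, here's the kind of one-liner that tallies failed SSH attempts per user from an sshd auth log. The log lines below are made-up samples echoing our story; on a real box you'd run the pipeline against /var/log/auth.log (or /var/log/secure, or journalctl, depending on your distro).

```shell
# Sample sshd log entries; in production, point this at your real auth log.
LOG=sample_auth.log
cat > "$LOG" <<'EOF'
Jan  1 10:00:00 web1 sshd[101]: Failed password for invalid user accounting from 10.0.40.23 port 40001 ssh2
Jan  1 10:00:05 web1 sshd[102]: Failed password for invalid user accounting from 10.0.40.23 port 40002 ssh2
Jan  1 10:01:00 web1 sshd[103]: Accepted publickey for deploy from 10.0.1.5 port 50000 ssh2
EOF

# Pull the user name out of each failed attempt and tally.
# "invalid user " only appears when the account doesn't exist,
# so the regex treats it as optional.
grep 'Failed password' "$LOG" \
  | sed -E 's/.*for (invalid user )?([^ ]+) from.*/\2/' \
  | sort | uniq -c | sort -rn
```

A sudden pile of failures against a single user name, like "accounting" here, is exactly the pattern that should make you reach for the phone.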
This was really not good. I sounded the alarm to the team, gathered the troops, and informed them that we had a real security problem on our hands. We needed to figure out what this was doing and how to shut it off. We used our ExtraHop to pin down the servers and files it had accessed (or tried to access) and the AWS machine controlling it. We searched through our switches and traced some wires to locate a little Raspberry Pi connected in a conference room, our rogue machine. This is a prime example of the power of wire data and getting your tools to work for you; we found the truth of what a rogue server did in a sea of data. Thankfully, this was only a test.
How long would it have taken us to actually look at the auth logs or spot this thing in the corner of the room? How would we have ever figured out all of the things it touched? We will never know for sure, but it would have taken a lot longer than it did. I know my team works hard, so spotting something a little off in logs or DNS counts by ourselves may have taken some time.
What Did We Learn Here?
Addy notified us of some really odd behavior that we needed to know about. We didn't have to stop our work on tickets or server maintenance to find it. Knowing our network paid off: the rogue device stood out precisely because servers don't belong on our user-side VLANs. Our conference room ports were a weak spot. We've fixed this by moving the ports to our isolated guest network.
But What About Documentation?
We've updated our documentation with the mapped switch ports for the conference rooms. It's not fancy, but it will keep us from tracing cables and help us get to the physical locations faster. We've also updated our docs for a Cisco gotcha that we encountered: MAC addresses are usually written as six pairs of hex digits separated by colons, but Cisco IOS displays them as three groups of four hex digits separated by dots. That mismatch made our switch searches fail because the search strings never matched. We now have that simple fact written down, and it will save us a lot of time if we ever have a real malicious machine on our network.
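Since we wrote the gotcha down, here it is in runnable form: a tiny shell helper (my own sketch, not anything Cisco ships) that rewrites a colon-separated MAC into Cisco's dotted format, so a `show mac address-table | include` search on the switch actually matches.

```shell
# Convert a colon-separated MAC (b8:27:eb:12:34:56) into Cisco's
# dotted triplet format (b827.eb12.3456).
to_cisco_mac() {
  # Strip the colons, then re-group the 12 hex digits into
  # three dot-separated blocks of four.
  echo "$1" | tr -d ':' | sed -E 's/(.{4})(.{4})(.{4})/\1.\2.\3/'
}

# Example MAC for illustration only.
to_cisco_mac "b8:27:eb:12:34:56"   # -> b827.eb12.3456
```

Paste the output straight into the switch: `show mac address-table | include b827.eb12.3456`.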