Learning a New Codebase? Static Code Analysis Is Your Friend

One of the most disorienting experiences I've encountered as a software tester is a complete change of codebase. For the first eight years of my career, I was neck deep in the Windows UI codebase, and when I moved to Windows Phone in 2009, suddenly day was night, up was down, everything was different. The same thing happened this year when I joined the ExtraHop team.

How I Get Oriented in a New Codebase

Dave Monk is a Software Test Engineer at ExtraHop. He uses SCA to get himself oriented in an unfamiliar codebase.

To get my bearings, one of the first things I do in these situations is run static code analysis on the codebase in question.

Static Code Analysis (SCA) is a class of testing where you don't have to execute the source code you're analyzing. Heck, you don't always even need to compile the code. The goal is to find bugs such as memory leaks, using memory after freeing it, null dereferences, and known security exploits. The exploration of the code using SCA can provide a quick check for the relative health of the code. At its most rudimentary level, the tester is running regular expression queries against the code in order to find commonly encountered problems. This is actually a good starting point.
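As a rough illustration of the regular-expression approach, here is a minimal sketch in C using the POSIX regex API. It flags lines where the result of realloc is assigned straight back to the pointer that was passed in, a pattern that can leak memory when the call fails. The helper name looks_like_risky_realloc and the pattern itself are my own assumptions for illustration, not part of any tool mentioned in this post.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `line` looks like `p = realloc(p, ...)`,
 * 0 if it does not, and -1 if the pattern fails to compile. */
int looks_like_risky_realloc(const char *line)
{
    /* POSIX basic regex: \( \) captures an identifier, and \1 requires the
     * same text to reappear as the first argument to realloc. */
    static const char *pattern =
        "\\([A-Za-z_][A-Za-z0-9_]*\\)[ \t]*=[ \t]*realloc[ \t]*([ \t]*\\1[ \t]*,";
    regex_t re;
    int hit;

    if (regcomp(&re, pattern, 0) != 0) {
        return -1;
    }
    hit = (regexec(&re, line, 0, NULL, 0) == 0);
    regfree(&re);
    return hit;
}
```

Feeding each source line through this function gives a crude list of candidates to review by hand. It is only a heuristic: it misses struct members such as p->buf and casts on the return value, and it can flag a line like mybuf = realloc(buf, n) because the match may start in the middle of an identifier.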

The Best Static Code Analysis Tools

A good example of a problem you can find with a simple search is a commonly encountered bug in how realloc is called. If a realloc call fails and the failure is not handled, memory can be leaked: realloc returns NULL on failure and leaves the original block allocated, so code that assigns the result straight back to the original pointer loses its only reference to that block. Solution? Do a quick search through the code to make sure the results of realloc calls are handled correctly.
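To make the realloc problem concrete, here is a minimal sketch of the leak-prone form (in a comment) next to one common way to handle the failure. The function name grow_buffer and its return convention are hypothetical, chosen only for this example, and the sketch assumes new_size is greater than zero.

```c
#include <stdlib.h>

/* Hypothetical helper: grow *buf to new_size bytes without leaking on failure.
 * Returns 0 on success, or -1 if realloc fails, in which case *buf is
 * unchanged and still owned by the caller. Assumes new_size > 0. */
int grow_buffer(char **buf, size_t new_size)
{
    /* Leak-prone form that a search should flag:
     *     *buf = realloc(*buf, new_size);
     * If realloc returns NULL, the old block is still allocated, but its
     * address has just been overwritten. */
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL) {
        return -1;  /* old block is intact; caller can free it or retry */
    }
    *buf = tmp;
    return 0;
}
```

The key design choice is the temporary pointer: the original pointer is only overwritten once the new allocation is known to have succeeded.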

To get even more out of SCA, the next step is to couple the static analysis with the compiler. Microsoft did a great job of this in the Windows ecosystem with the PreFast static analyzer. However, if you are working in a UNIX-based environment, PreFast won't really help you. Relief comes in the form of two powerful tools: CppCheck and the Clang compiler. How you run these tools against your codebase will depend on how the codebase is laid out, and will require some investigation. Both tools focus on C/C++ codebases; languages such as Python and Go have their own powerful SCA tools.

CppCheck performs static analysis on both C and C++ source code, and can walk your entire build tree. One of the best things about it (besides being free) is that one of its core design principles is the reduction of false positives. This is very important, because nothing erodes developers' faith in bugs reported by static code analysis faster than a barrage of false positives. False positives can come from many causes, including complex macros, intentional design decisions, and obfuscated syntax.

Clang takes this to a new level of awesomeness. For every defect it finds, it provides an annotated HTML file showing the path taken through the code and the conditions it assumed along the way. However, this complexity is a double-edged sword. Clang has found bugs for me with 100-step repros (that's good), but trying to argue that such a bug is a higher priority than existing issues can be an uphill battle (that's bad). Clang seems to be moving away from the leak detection aspect of SCA, as versions of Clang higher than 3.6 leave out leak detection modules. Clang release notes mention that tools such as Valgrind and ASAN (Address Sanitizer) do a better job. I'll go into more detail on those tools another time.

Running SCA tools is only a first step. The next step is to KEEP doing so, and watch for changes and regressions. Even if an SCA issue turns out to be a false positive, there should be a tracking bug for each issue.
