Even Google Makes Mistakes
23 Apr 2017Google is, I believe, one of the finest software engineering organizations on the planet. They invest heavily in code review and automated testing. Their infrastructure is top-notch, allowing developers to trivially build, refactor, and test code across their entire monorepo. They regularly release new tooling, libraries, and languages that to advance the state of software development inside and outside of Google. They have a massive internal library of easily searchable code. They’ve pioneered the SRE approach to DevOps. Their hiring practices (contentious though they may be) at least ensure that the bar for employment is kept high.
I was surprised, therefore, to hit several bugs and problems in their open source code in the space of a couple of days:
-
Google Mock, Google’s C++ mocking framework, may follow a null pointer under some circumstances. This is an example of how undefined behavior can come back to bite you in C and C++. Consider, for example, the following code:
class Test { public: void SayHello() { printf("Hello\n"); } }; Test *test = NULL; test->SayHello();
Even though
this
is null withinSayHello
, it’s never actually used, so it theoretically could be harmless. However, the C++ standard states that following a null pointer is undefined behavior, and so once compilers (GCC 6 in particular) got smart enough to recognize this potentially undefined behavior and exploit it for optimization purposes, the code stopped working. (John Regehr’s “A Guide to Undefined Behavior in C and C++” explains this sort of issue in much more detail.) -
At the time of this writing, Google Test and Google Mock are failing their AppVeyor CI builds.
-
Protocol Buffers (protobuf), Google’s primary approach for data serialization, used problematic techniques to determine the offset with a class of each of a protobuf message’s fields’ member variables. C offers an
offsetof
macro that works for this, but unfortunately, it’s considered undefined behavior for many C++ classes, because of reasons. As a workaround for this missing standard functionality, the protobuf implementation resorted to the following hack:#define GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(TYPE, FIELD) \ static_cast<int>( \ reinterpret_cast<const char*>( \ &reinterpret_cast<const TYPE*>(16)->FIELD) - \ reinterpret_cast<const char*>(16))
This takes an arbitrary integer (
16
—NULL
would be the obvious choice for a C or C++ programmer, but using that triggered compiler warnings), pretends it’s a pointer to an actual class instance, then takes the difference between that and a pointer to an instance variable.Accessing an invalid pointer like that is undefined behavior, but since it’s only used for pointer arithmetic, it typically works. However, when compiled under UBSan, which aggressively checks pointers and object references as they’re used, it crashes. (UBSan didn’t even exist at the time these protobuf macros were written, so even though the protobuf devs knowingly used undefined behavior, it wasn’t a problem with the tools that existed at the time.)
-
The download page for Protocol Buffers, Google’s primary approach for data serialization, still references version 3.0.0, making it is three releases and six months behind the actual current release. If Google, the biggest web company on the planet, can’t keep their own web sites up to date, then what hope do us mere mortals have?
On the one hand, all of this can act as a reality check. I certainly don’t bring all of this up to criticize or fault Google. They have an excellent engineering culture and talented and motivated developers, but even the best organizations can make mistakes, and even companies with massive resources may end up leaving worthwhile tasks undone as part of focusing on higher priorities. Realizing this can help me calibrate my own expectations for my considerably less massive software development efforts. Perfection is a worthwhile goal, but that doesn’t mean it’s attainable.
On the other hand, this acts as a reminder to test my assumptions. When I first encountered these bugs, I assumed that the problem must lie elsewhere, because protobuf and Google Test are mature and well-supported projects. Five minutes with a debugger would have shown be otherwise. Although it’s true that “select isn’t broken,”, projects from multi-billion-companies might be. The size and complexity of modern software systems mean that bugs are virtually guaranteed. Assuming that the bug is your own fault is a good place to start, but test those assumptions.
Finally, this is a reminder to document your work. One of the most useful results of the popularity of Stack Overflow and GitHub is that, when a developer encounters a problem, there’s an excellent chance that the problem has already been described, analyzed, solved, and documented. Without GitHub, solving these protobuf and Google Mock problems would have taken me far longer. So pay it forward: if you solve an interesting problem, tell us!