Thursday, February 5, 2009

Testing 1, 2, 3

Gnip received a request to go over how we "test." I hope the following is useful.

While we don't practice TDD (Test Driven Development) outright, I'd consider us in that vein. We are test heavy, and many of our tests are written before the code itself. My personal background is not test driven, but I'm a convert (thanks to the incredible team we've pulled together). While it takes self control, the satisfaction of writing a test first, then building code to meet its constraints feels great when you're done. Your goal, at whatever level the test was written at, was clearly defined at the start, and you wrote code to fulfill that need. Ecstasy! Try it yourself.

Our build process includes execution of ~1k tests. You don't checkin code if you break any of those tests, and code you checkin has new tests to validate itself. If you "break the build," that is not nice, and peer pressure will look down on you so you won't do it again.

The range of tests at Gnip are a challenge to categorize and build. Component/unit level tests are relatively straightforward, and range from class drivers, and data input/output comparisons against expected result sets. Writing tests when much of your system is unpredictable and variable is particularly challenging. Gnip works with so many different services and data formats, that getting the tests right for all the scenarios is hard. When we do come up against a new failure case, a test gets written to ensure we don't fail there again.

Given the "real-time" nature of the Gnip core platform, benchmarking the performance of individual components, as well as end-to-end data flow is fundamental. We build "micro-benchmark" tests to vet the performance of proposed implementations.

Testing end-to-end performance is done a variety of ways. We analyze timing information in system logs for introspection. We also run scripts external the system to test both the latency of our system, as well as that of the Publishers moving data into Gnip. Those tests calculate the deltas between when an activity was created, published, and ultimately consumed out the other end.

The importance of both internal testing, as well as testing external to the system cannot be over stated. Testing various internal metrics is great for internal decision making and operations, however you can lose sight of what your customers see if you only view things on your side of the fence. We use (Custom checks) to drive much of our external monitoring and reporting.

Here's some insight into our testing tool chain:
  • JUnit, EasyMock (Java)
  • HttpUnit, bash/Python scripts/cron (general API pounding)
  • unittest (Python)
  • RSpec (Ruby)

No comments: