At Zillow, I've done a lot of work on the design and development of the test infrastructure we use for full-stack tests. It's always fun to watch your tool become popular, but even more interesting are the discussions around test suite design that come with it.

Many discussions later, I have a good idea of what I want in a test suite. Here's what I think about:

Tests are a question of cost #

At the end of the day, tests have a cost. Each and every test has a value / cost ratio. Things that increase the value of a test include:

  • consistency: given the same inputs, the test gives the same results, every time.
  • speed: the faster the test is, the faster the feedback. The faster the feedback, the faster one can take action, and the more often we can execute the tests to get feedback.

In contrast, the things that increase the cost of a test include:

  • maintenance time: maintenance takes time, and development time is expensive. This is probably the biggest cost to consider.
  • cpu / memory to execute the test: although arguably cheap in this world of cloud providers, cpu and memory are real concerns, and tests that use a lot of these resources are expensive.
  • the time to execute the test: time is a huge cost, especially as the technology world we live in demands more changes, more quickly. Depending on how fast you ship, tests that take too long will be prohibitively expensive, and thus not used.

When I look at the value of a test, I look at these factors. In practice, I've found that the most important metric of them all is maintenance time: tests that require little to no maintenance survive refactors, rewrites, and pretty much anything that could happen to code besides deprecation.

On the other hand, the more the test requires maintenance, the more likely it'll suffer one of two outcomes:

  • the test is thrown out because it takes too much time to maintain, despite the value.
  • the test is not given the time it needs, and continues to fall into disarray until it is ignored.

Basically: low maintenance tests last forever, high maintenance tests probably won't.

Designing cheap tests #

So how do we make tests that require little to no maintenance? From what I've observed, there are two types of maintenance:

  • functional maintenance, which modifies the test to reflect changes in the code itself (e.g. for a web page, the login form fields are modified)
  • operational maintenance, which requires keeping a service dependency in a good state to test (e.g. for an office application with cloud sync, keeping the cloud syncing service up)

Functional maintenance is unavoidable: as code changes, one must ensure that any tests that validate that code are kept up to date. In addition, for most tests, functional maintenance is relatively cheap in time: except in the cases of extreme redesigns or refactorings, the changes tend to be small in nature.
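
To make that concrete, here's a minimal sketch of what functional maintenance looks like, assuming a hypothetical Selenium-driven login page and an assumed `browser` WebDriver fixture: if the login form's fields are renamed, the selectors below change with them, and that's the whole of the edit.

```python
from selenium.webdriver.common.by import By


def test_login(browser):  # "browser" is an assumed WebDriver fixture
    browser.get("https://example.com/login")
    # If the login form fields are renamed, these selectors are the only
    # thing that needs updating -- a small, functional change.
    browser.find_element(By.NAME, "username").send_keys("test-user")
    browser.find_element(By.NAME, "password").send_keys("hunter2")
    browser.find_element(By.CSS_SELECTOR, "button[type='submit']").click()
    assert "Welcome" in browser.page_source
```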

Operational maintenance costs can vary wildly, and they can become very expensive. Tests with multiple dependencies can turn into a game of juggling an environment where all of those dependencies are functional. It becomes even harder if a small team maintains that environment: executing the tests consistently requires a production-quality environment, and that's more difficult the more services there are to maintain.

However, unlike functional maintenance, operational maintenance is, for the most part, avoidable. By taking advantage of heavy mocking, it's possible to remove dependencies like databases and APIs. The Google Testing Blog has a good article about this.
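
As a rough sketch of what that can look like in Python (the names here are assumptions: a hypothetical `listings` module whose `get_listing()` fetches a record from an external API via `requests`), `unittest.mock` can stand in for the network entirely, removing the operational dependency on that API being up:

```python
from unittest import mock

import listings  # hypothetical module under test


def test_get_listing_parses_response():
    # Fake the HTTP response so the test has no operational dependency
    # on the real API being reachable or healthy.
    fake_response = mock.Mock()
    fake_response.json.return_value = {"id": 42, "price": 350000}

    with mock.patch("listings.requests.get", return_value=fake_response):
        listing = listings.get_listing(42)

    assert listing["price"] == 350000
```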

Summary: tests with fewer operational dependencies are cheaper to maintain.

What kind of test distribution: the testing pyramid #

When testing software, there are multiple levels at which one could author tests:

  • at the "unit" level, typically written in the same language and validating a single function or behaviour
  • at the integration level, typically written in the same language, and validating the communication between your code and an external application
  • at the end-to-end level, not necessarily written in the same language, and validating a complete workflow that a user would perform

Although all are important and should be included in a test suite, not all tests are created equal. Going back to the idea that the tests with the least maintenance will last the longest, we should be trying to have as many of those as possible.

Unit tests are the cheapest. They:

  • have no dependencies (or else they would at least be considered an integration test),
  • run quickly (no waiting for network, or other delay from communication)

If we could capture all behaviour of our application with just unit tests, that would be perfect. Unfortunately, many things can go wrong when composing multiple pieces of these units together, so some level of integration and end-to-end tests will be needed. But the larger tests should be fewer in number, since they are harder to maintain.
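
For a sense of what sits at the cheap end, here's a minimal sketch of a dependency-free unit test around a hypothetical price-formatting function: nothing to mock, nothing to deploy, and it runs in microseconds.

```python
def format_price(cents: int) -> str:
    """Hypothetical unit under test: pure input in, output out."""
    return "${:,.2f}".format(cents / 100)


def test_format_price():
    # No network, database, or environment needed -- just the function.
    assert format_price(123456) == "$1,234.56"
    assert format_price(0) == "$0.00"
```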

A good model to visualize a good distribution is the "testing pyramid", as explained by Martin Fowler and Google:

[Image: the testing pyramid]

The more expensive tests are fewer in number, while the cheaper tests are much more common.

How many tests should be in a suite #

Adequate test coverage varies wildly between applications: medical software that monitors heart rate should probably have a lot more coverage than a non-critical social media website. The only common rule of thumb I've found is: add the absolute minimum number of tests to achieve your desired confidence in quality.

Testing is important, but at the end of the day, it's not a user-facing feature. On the other hand, quality is. Adding more tests does increase quality, but it comes at the cost of development and maintenance time that could go toward other features that help your application provide value. A properly sized test suite sits right at the line of too little testing, and hovers around it. This gives developers as much time as possible for features, while ensuring that an important feature (quality) is not neglected.

Summary #

  • the best tests are the cheapest tests: low maintenance, fast execution, and low CPU/RAM usage.
  • the cheapest tests have the fewest number of dependencies on other applications, like DBs or APIs
  • try to keep test coverage as low-level as possible: cheap tests are worth 10x expensive ones.
  • expensive tests validate the whole infrastructure, so they're almost always necessary: refer to the test pyramid for a rough sketch of a good distribution.
  • never add more or less coverage than you need: more coverage results in more maintenance that detracts from development time, and less coverage means an application whose quality is not up to the desired standards.
  • how much coverage do I need? It depends on how critical the application is, and how critical it is that it continues to work. A payment path needs high quality, so it should have high coverage. The alignment of a button on a dialog three pages deep probably needs less quality assurance.

How do you design your test suite?