The CID Pattern: a strategy to keep your web service code clean

The Problem

Long-term maintenance of a web application will, at some point,
require changes. Code grows with the functionality it serves, and
an increase in functionality is inevitable.

It is impossible to foresee what sort of changes are required, but there are
changes that are common and are commonly expensive:

  • changing the back-end datastore of one or more pieces of data
  • adding additional interfaces for a consumer to request or modify data

It is possible to prevent some of these changes with foresight,
but it is unlikely you will prevent all of them. As such, we can try to
encapsulate these changes and limit their impact on the rest of the code base.

Thus, every time I start on a new project, I practice CID (Consumer-Internal-Datasource).

CID Explained

CID is an acronym for the three layers of abstraction that should be
built out from the beginning of an application. The layers are described as:

  • The consumer level: the interface that your consumers interact with
  • The internal level: the interface that application developers interact with most of the time
  • The datasource level: the interface that handles communication with the database and other APIs

Let’s go into each of these in detail.

Consumer: the user facing side

The consumer level handles translating and verifying the consumer-facing format
into something that makes more sense internally. In the beginning, this
level could be razor thin, as the consumer format probably matches the
internal format completely. However, other responsibilities that might
live at this layer are:

  • schema validation
  • converting to whatever format the consumer desires, such as json
  • speaking whatever transport protocol is desired, such as HTTP or a Kafka stream

As the application grows, the internal format might change, or a new
API version may need to be introduced, with its own schema. At that
point, it makes sense to split the consumer schema from the internal
schema, ending up with something like:

class PetV1:
    def to_internal(self):
        """Convert a PetV1 to the internal representation."""

    def from_internal(self, pet):
        """Build a PetV1 from the internal representation, in case you need to return pets back as V1."""


class PetV2:
    def to_internal(self):
        """Convert a PetV2 to the internal representation."""

    def from_internal(self, pet):
        """Build a PetV2 from the internal representation, in case you need to return pets back as V2."""


class PetInt:
    """The internal representation, used within the internal level."""

Datasource: translates the internal format to the datastore

Some of the worst refactorings I’ve encountered are the ones involving
switching datastores. It’s a linear problem: as the database
interactions increase, so do the lines of code needed to perform those
interactions, and each line must be modified when switching or altering
the way the datastore is called.

It’s also difficult to get a read on where the most expensive queries
lie. When your application has free-form queries all over the code, it
requires someone to look at each call, interpret the cost, and ensure
performance is acceptable for the new source.

If any layer should be abstracted, it’s the datastore. Abstracting the
datastore behind a client object makes multiple refactors simpler:

  • adding an index and modifying queries to hit that index
  • switching datasources
  • putting the database behind another web service
  • adding timeouts and circuit breakers
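
As a minimal sketch (using the PetPostgres and PetInt names from the examples in this post; the fetch and save methods and the DB-API connection are hypothetical, not part of any framework), the datasource level can be a plain class that is the only code speaking SQL:

class PetPostgres:
    """Hypothetical datasource client: the only place that knows about SQL."""

    def __init__(self, connection):
        self._conn = connection  # any DB-API 2.0 connection

    def fetch(self, pet_id):
        cursor = self._conn.cursor()
        cursor.execute("SELECT id, name FROM pets WHERE id = %s", (pet_id,))
        row = cursor.fetchone()
        # assuming PetInt accepts these fields
        return PetInt(id=row[0], name=row[1])

    def save(self, pet):
        cursor = self._conn.cursor()
        cursor.execute(
            "INSERT INTO pets (id, name) VALUES (%s, %s)", (pet.id, pet.name)
        )
        self._conn.commit()

Switching to MongoDB would then mean writing a PetMongoDB with the same fetch/save interface; the internal level never changes.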

Internal: the functional developer side

The consumer and datastore layers abstract away any refactoring that
only affects the way the user interacts with the application, or the
way data is stored. That leaves the final layer to focus on just the
behavior.

The internal layer stitches together the consumer and datastore layers, and
performs whatever other transformations or logic need to be
performed. By abstracting out any modification to the schema that has
to be done at the consumer or datastore level (including keeping multiple
representations for the API), you’re afforded a layer that deals exclusively
with application behavior.

An Example of a CID application

A theoretical organization for a CID application is:

root:
  consumers:
    - HTTPPetV1
    - HTTPPetV2
    - SQSPetV1
  internal:
    # only a single internal representation is needed.
    - Pet
  datasource:
    # showcasing a migration from Postgres to MongoDB
    - PetPostgres
    - PetMongoDB

Examples where CID helps

So I’ve spent a long time discussing the layers and their
responsibilities. If we go through all of this trouble, where does
this actually help?

Adding a new API version

  • add a new API schema
  • convert to internal representation

Modifying the underlying database

  • modify the datasource client.

Complex Internal Representations

If you need to keep some details in a Postgres database, and store
other values within memcache for common queries, this can be
encapsulated in the datasource layer.
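
A rough sketch of that encapsulation (hypothetical names, assuming a memcache client and a Postgres-backed client like the one sketched earlier already exist):

class PetCachedStore:
    """Hypothetical datasource that consults memcache before Postgres."""

    def __init__(self, cache, postgres_store):
        self._cache = cache
        self._postgres = postgres_store

    def fetch(self, pet_id):
        pet = self._cache.get("pet:%s" % pet_id)
        if pet is None:
            pet = self._postgres.fetch(pet_id)
            self._cache.set("pet:%s" % pet_id, pet)
        return pet

The internal level keeps calling fetch and never learns that two backends are involved.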

All too often the internal representations attempt to deal with this
type of complexity themselves, which makes it much harder to understand the
application code.

Maintaining Multiple API versions

Without clearly separating how an object is structured internally from
how consumers consume it, the details of the consumer leak into the
internal representation.

For example, while attempting to support two API versions, someone writes
some branched code to get the data they want. This pattern continues
through multiple parts of the code dealing with that data, until it
becomes hard to get a complete understanding of what in V1 is
consumed, and what in V2 is consumed.

Final Thoughts

David Wheeler is often quoted as saying:

All problems in computer science can be solved by another level of indirection.

Indirection is handy because it encapsulates: you do not need a
complete understanding of the implementation to move forward.

At the same time, too much indirection makes it difficult to
understand the complete effect of a change.

Balance is key, and using CID helps guide indirection where
it could help the most.

KeyError in self._handlers: a journey deep into Tornado’s internals

If you’ve worked with tornado, you may have encountered a traceback of
a somewhat bewildering error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/tornado/ioloop.py", line 832, in start
    fd_obj, handler_func = self._handlers[fd]
KeyError: 16

A few other people have been confused as well. After some digging, and a combination
of learning about the event loop, fork, and epoll, the answer finally came into focus.

TLDR

If you’re looking for the solution: don’t call or start IOLoops before
an os.fork. Forking happens in web servers like gunicorn, as well as in
tornado.multiprocess, so be aware of that caveat as well.

But why does this happen?

As I mentioned previously, this is a combination of behaviour across
the system, Python, and tornado. Let’s start by
learning more about that error specifically.

The code the traceback refers to occurs in the IOLoop:

# tornado/ioloop.py
self._events.update(event_pairs)
while self._events:
    fd, events = self._events.popitem()
    try:
        fd_obj, handler_func = self._handlers[fd]
        handler_func(fd_obj, events)

What are these variables? You can read the IOLoop code yourself, but effectively:

  • _handlers is a mapping of file descriptors to the callbacks that should be called once an async event occurs.
  • _events is a mapping of file descriptors to events that have occurred and still need to be handled.

What is an FD?

The handlers and events are both keyed off of file descriptors. In a
few words, a file descriptor is a handle to some open file. In
Unix, a pattern has propagated where a lot of resources (devices,
cgroups, active/inactive state) are referenced via file descriptors:
they became a lingua franca for low-level resources, because a lot of
tooling knows how to work with file descriptors, and writing to and
reading from a file is simple.

They’re useful for tornado because sockets also have file descriptors
representing them. So the tornado ioloop can wait for an event
affecting a socket, then pass that socket to a handler when a socket
event fires (e.g. some new data came into the socket buffer).
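
For example (a quick illustration, not tornado code), a socket exposes the file descriptor backing it:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(sock.fileno())  # a small integer, e.g. 3: the key used for _handlers and _events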

What modifies the events and handlers?

A KeyError in _handlers means there’s a key in _events that is not in
_handlers: some code is adding events to the ioloop without
registering a handler for them at the same time. So how does that
happen in the code?

A good starting point is looking at where _handlers and _events are
modified in the code. In all of the tornado code, there are only a
couple of places:

# tornado/ioloop.py
def add_handler(self, fd, handler, events):
    fd, obj = self.split_fd(fd)
    self._handlers[fd] = (obj, stack_context.wrap(handler))
    self._impl.register(fd, events | self.ERROR)

# tornado/ioloop.py
def remove_handler(self, fd):
    fd, obj = self.split_fd(fd)
    self._handlers.pop(fd, None)
    self._events.pop(fd, None)
    try:
        self._impl.unregister(fd)
    except Exception:
        gen_log.debug("Error deleting fd from IOLoop", exc_info=True)

Looking at these pieces, the code is pretty solid:

  • handlers are added only in add_handler, where they are also registered with _impl.register
  • handlers are only removed in remove_handler, where they are removed from _events, _handlers, and _impl
  • events are added to _events from the results of _impl.poll()

So removing a handler always ensures that _events no longer contains its
fd, and unregisters it from this _impl thing too.

But what is _impl? Could _impl be adding fds for events that don’t have handlers?

impl: polling objects

It turns out _impl is chosen based on the OS. There is a little bit of
indirection here, but the IOLoop class in tornado extends a configurable object,
which selects the implementation class via the method configurable_default:

# tornado/ioloop.py
@classmethod
def configurable_default(cls):
    if hasattr(select, "epoll"):
        from tornado.platform.epoll import EPollIOLoop
        return EPollIOLoop
    if hasattr(select, "kqueue"):
        # Python 2.6+ on BSD or Mac
        from tornado.platform.kqueue import KQueueIOLoop
        return KQueueIOLoop
    from tornado.platform.select import SelectIOLoop
    return SelectIOLoop

And each of these loop implementations passes its own polling object into the impl argument:

class EPollIOLoop(PollIOLoop):
    def initialize(self, **kwargs):
        super(EPollIOLoop, self).initialize(impl=select.epoll(), **kwargs)

Looking at select.epoll, it follows the interface of a polling object: a
class in the Python standard library that has the ability to poll for
changes to file descriptors. If something happens to a file descriptor
(e.g. a socket receiving data), the polling object will return
the file descriptor that was triggered.

Different architectures have different polling objects
implemented. The available ones in tornado by default are:

  • epoll (Linux)
  • kqueue (OS X / BSD)
  • select (Windows and other platforms)

In our case, this was happening on Linux, so we’ll look at epoll.

epoll

So what is epoll? It’s documented in the Python standard library, but
it’s a wrapper around the epoll Linux system calls.

The ioloop effectively does the following:

  • wait for epoll to return a file descriptor that has an event
  • execute the handler (which will presumably register another handler if another step is required, or not if it’s complete)
  • repeat.

epoll has two different modes, but the one tornado uses is
edge-triggered: it only triggers when a change occurs, versus whenever a
specific level holds. In other words, it will only trigger when new
data becomes available: if the user decides to do nothing with the data,
epoll will not trigger again.

epoll works by registering file descriptors for the epoll object to
listen to; you can also stop listening to file descriptors at any time.
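
As a small illustration of that interface (plain select.epoll outside of tornado; Linux only):

import select
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

epoll = select.epoll()
epoll.register(server.fileno(), select.EPOLLIN)  # start listening to this fd
events = epoll.poll(0.1)                         # -> [(fd, event_mask), ...]
epoll.unregister(server.fileno())                # stop listening
epoll.close()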

So epoll works great for an event loop. But is it possible to somehow
register file descriptors to the epoll/impl object without using the
method above?

epoll and os.fork

It isn’t possible to register things outside of the _impl
object. But os.fork can cause some weird behaviour here. See, the way
that one interfaces with epoll is via file descriptors: you have an
fd for the epoll object itself, and you use Linux system calls to work
with it.

As mentioned previously, file descriptors are a common way to reference
some object when using Linux kernel system calls.

Another common system call is fork. The
documentation of fork specifies that fork is equivalent to:

  • copying the memory of the current process to a new space
  • spawning a new process that uses the new copy.

This is fine for most objects in memory, but what about file
descriptors, which reference some object outside of the memory space
of the current process?

In the case of file descriptors, the file descriptor is also cloned to
the new fork. In other words, both the parent and the child process
will have a reference to the same file descriptor.

So, what does this mean for epoll, which is just another file
descriptor under the hood? Well, you can probably guess.

It gets shared.

How the bug works

So this is the crux of the issue. When an os.fork occurs, the parent
and the child share the SAME epoll. For an IOLoop that was created
by the parent process, the child process uses the same epoll as well!

So, that allows a condition like this:

  1. parent creates an IOLoop loop_1, with an epoll object epoll_1
  2. parent calls os.fork; the child’s copy of loop_1 shares the same epoll_1
  3. parent starts its ioloop, and waits on epoll_1.poll()
  4. child adds a handler for fd_2 to epoll_1
  5. parent gets back fd_2 from the poll, but doesn’t have a handler for it, and raises the KeyError.

So this will happen, sooner or later, any time a new ioloop is not created for the child process.

Here’s a repro script. I couldn’t figure out a good way to kill this
gracefully, so be warned this will need to be killed externally.

import logging
import select
import socket
import os
import time
import tornado.ioloop
import tornado.httpclient
import tornado.web

serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
serversocket.bind(('127.0.0.1', 8080))
serversocket.listen(1)

logging.basicConfig()

loop = tornado.ioloop.IOLoop.current()

if os.fork():
    handler = lambda *args, **kwargs: None
    loop.add_handler(serversocket.fileno(), handler, select.EPOLLIN)
    time.sleep(0.1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(('127.0.0.1', 8080))
    client.send(b"foo")
else:
    loop.start()

How about gunicorn or tornado.multiprocess?

So how do you avoid this in gunicorn or tornado.multiprocess, which use
an os.fork? The best practice is to not create or start the ioloop until AFTER
the fork: calling IOLoop.instance() or IOLoop.current() before the fork creates an
ioloop whose epoll object will be shared by any child process’s ioloop,
unless it is explicitly cleared.
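
A minimal sketch of the safe ordering, assuming nothing has touched IOLoop.instance() or IOLoop.current() before the fork:

import os
import tornado.ioloop

pid = os.fork()
if pid == 0:
    # child: creating the loop here gives it its own epoll file descriptor
    loop = tornado.ioloop.IOLoop.current()
    loop.start()
else:
    # parent: likewise, only create its loop after all forking is done
    loop = tornado.ioloop.IOLoop.current()
    loop.start()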

Gunicorn, for example, calls fork as it spawns a worker:

# gunicorn/arbiter.py
def spawn_worker(self):
    self.worker_age += 1
    worker = self.worker_class(self.worker_age, self.pid, self.LISTENERS,
                               self.app, self.timeout / 2.0,
                               self.cfg, self.log)
    self.cfg.pre_fork(self, worker)
    pid = os.fork()
    if pid != 0:
        self.WORKERS[pid] = worker
        return pid

Summary

Tornado is an awesome framework, but it’s not simple. However, thanks
to its well-documented pieces, it’s possible to diagnose even complex
issues like this, and do a bit of learning along the way.

Also, os.fork is not a complete guarantee that you’ll get a unique
instance of every object you use. Beware file descriptors.

Introducing transmute-core: quickly create documented, input validating APIs for any web framework

A majority of my career has been spent building web services in
Python. Specifically, internal ones that have minimal or no UIs, and
speak REST (or at least are REST-ish).

With each new service, I found myself re-implementing work to
make user-friendly REST APIs:

  • validation of incoming data, and descriptive errors when a field does not
    match the type or is otherwise invalid.
  • documenting said schema, providing UIs or wiki pages allowing users to
    understand what the API provides.
  • handling serialization to and from multiple content types (json, yaml)

This is maddening work to do over and over again, and details are
often missed: sometimes yaml is not supported for a particular API, or
there is a specific field that is not validated. Someone will ask about
an API that you changed, only to find that you forgot to document a new
parameter. It’s hard to scale API maintenance when it depends on
remembering minute boilerplate.

This was further exacerbated by using different web frameworks for
different projects. Every framework provides its own REST plugin or
library, and often there’s a lack of functional parity, or declaring
an API is done completely differently, requiring learning multiple
approaches.

So with all this monumental pain in mind, what if I told you that you could get an API that:

  • validates incoming data types
  • supports multiple content types
  • has a fully documented UI

Just by writing a vanilla Python function? And what if I told you
this can work for YOUR Python framework of choice with around 100 statements
of Python code?

Well, that’s what the transmute framework is.

How it works

transmute-core is a library that provides tools to quickly implement
REST APIs. It’s designed to be consumed indirectly, through a thin
layer that adapts it to the style of the individual framework.

HTTP Endpoints

Here is an example of a GET endpoint in flask:

import flask_transmute

# flask-like decorator.
@flask_transmute.route(app, paths='/multiply')
# tell transmute what the types are, which enables validation
@flask_transmute.annotate({"left": int, "right": int, "return": int})
# the function is a vanilla Python function
def multiply(left, right):
    return left * right

And one in aiohttp, the web framework that uses Python 3’s asyncio:

import aiohttp_transmute

@aiohttp_transmute.describe(paths='/multiply')
# tell transmute what the types are, which enables validation.
# Python 3.5+ supports type annotations natively.
#
# request is provided by aiohttp.
def multiply(request, left: int, right: int) -> int:
    return left * right

aiohttp_transmute.route(app, multiply)

Both do the following:

  • generate a valid route in the target framework
  • detect the content type (yaml or json) and parse the body
  • verify that input parameters match the types specified, returning a 400 status
    code and details if not
  • write back yaml or json, depending on the content type

Note that we don’t have to deal with content type serialization,
reading from request objects, or returning a valid response object:
that’s all handled by transmute. This keeps the functions cleaner in
general: each looks similar to any other Python function.

Complex Schemas via Schematics (or any validation framework)

Primitive types in the parameters are fine, but more complex types are
often desired.

Schema declaration and validation already has multiple solutions,
so transmute defers this to other libraries. By default, transmute uses
schematics:

from schematics.models import Model
from schematics.types import StringType, IntType

class Card(Model):
    name = StringType()
    price = IntType()


# passing in a schematics model as the type enables
# validation and creation of the object when converted
# to an API.
@annotate({"card": Card})
def submit_card(card):
    db.save_card(card)
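
As a rough illustration of what schematics itself does here (independent of transmute, using the Card model above; the exact exception type depends on the schematics version):

try:
    card = Card({"name": "Black Lotus", "price": "not a number"})
    card.validate()   # invalid data raises a schematics validation error
except Exception as error:
    # transmute surfaces errors like this as a 400 response with details
    print(error)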

Of course, some may prefer other solutions, like marshmallow. In that
case, transmute-core provides a TransmuteContext that lets users customize and
plug in their own implementation of transmute’s serializers:

from transmute_core import TransmuteContext, default_context

context = TransmuteContext(serializers=MySerializer())

route(app, fn, context=context)

# alternatively, you could modify the default context directly
# (be careful about where this code is called: it needs
# to happen before any routes are constructed)
default_context.serializers = MySerializer()

Documentation via Swagger

Swagger / OpenAPI allows one to define a REST API using json. Transmute generates
swagger json files based on the transmute routes added to an app, and transmute-core provides the static CSS and JavaScript
files required to render a nice documentation interface for it:

from flask_transmute import add_swagger

# reads all the transmute routes that have been added, extracts their
# swagger definitions, and generates a swagger json and an HTML page that renders it.
add_swagger(app, "/swagger.json", "/swagger")

This also means clients can be auto-generated as well: swagger has a
large number of open source projects dedicated to parsing and
generating swagger clients. However, I haven’t explored this too
deeply.

Lightweight Framework Implementations

Earlier in this post, it was mentioned that there should be a wrapper
around transmute-core for your framework, as the style of how to add
routes and how to extract values from requests may vary.

A goal of transmute was to make the framework-specific code as thin as
possible: this allows more re-use and common behavior across the
frameworks, enabling developers across frameworks to improve
functionality for everyone.

Two reference implementations exist, and they are very thin. As of this writing, they are at:

  • flask-transmute: 166 lines of code, 80 statements
  • aiohttp-transmute: 218 lines of code, 103 statements (a little bloated to support legacy APIs)

A one-page example for flask integration is also provided, to
illustrate what is required to create a new one. That’s 200 LOC with
comments, a little more than 100 without.

http://transmute-core.readthedocs.io/en/latest/creating_a_framework.html

Impressions

Frameworks are always a means to an end: they’re about reducing the
effort between what you want to build and actually building it.

I love great, well designed APIs. And dealing with the minutiae of
some detail I missed in boilerplate content type handling or object
serialization was draining the enjoyment out of authoring them. Since
I’ve started using transmute for all of my projects, it’s let me focus
on what I care about most: actually writing the functional code, and
designing the great interfaces that let people use them. For the most part,
it feels like just writing another function in Python.

The auto-documentation is freeing for both sides: as an author, I can
keep my documentation in line with my implementation, because my
implementation is the source. Consumers are immediately
provided with a simple UI where they can rapidly iterate on the API
calls they would like to make.

It’s also great knowing I can use transmute in the next framework,
whatever that may be: I can take all the work and behavior that’s
embedded in transmute, with a module or two’s worth of code.

Conclusion

Give it a shot! Issues and PRs are welcome, and I’d love to see someone
apply transmute to another framework.

Global logging with flask

As of December 2016, Flask has a built-in
logger that it instantiates for you. Unfortunately, this misses the
errors and other log messages in other libraries that may also be
valuable.

It would be nice to have a single logger, one that captures BOTH
library AND app logs. For those that want a global logger, this may
take a few concepts to get right. You have to:

  1. undo flask’s logging
  2. set up your own logging
  3. set log levels, as the default may not suit you.

Combined, this ends up looking like:

import logging
import sys
from flask import Flask, current_app

LOG = logging.getLogger("my_log")
LOG2 = logging.getLogger(__name__ + ".toheunateh")
app = Flask(__name__)


@app.route("/")
def route():
    current_app.logger.info("flask logger: foo")
    LOG.info("log: foo")
    LOG2.info("log2: foo")
    return "hello!"


# create your own custom handler and formatter.
# you can also use logging.basicConfig() to get
# the python default.
out_hdlr = logging.StreamHandler(sys.stdout)
fmt = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
out_hdlr.setFormatter(fmt)
out_hdlr.setLevel(logging.INFO)
# append to the global logger.
logging.getLogger().addHandler(out_hdlr)
logging.getLogger().setLevel(logging.INFO)
# removing flask's own handlers and
# re-enabling propagation ensures that
# the root handler gets the messages.
app.logger.handlers = []
app.logger.propagate = True
app.run()

And you get the right messages. Voila!

Hierarchical Naming

One of the most interesting artifacts of most programming languages using English conventions is variable naming. Today I contend that:

English Grammar is a Terrible Programming Default

Consider how you would specify that a room is for guests in English,
or a car is designed to be sporty. In both cases, the specifier comes
before the object or category:

  • Sports Car
  • Guest Room
  • Persian Cat

Since programming languages are primarily based on English, it’s a natural default to name your variables in a similar order:

  • PersianCat
  • TabbyCat
  • SiameseCat

To further qualify your classes, one prepends additional information:

  • RedTabbyCat
  • BlueTabbyCat
  • BlackTabbyCat

And the pattern continues: as more qualifiers are added, more is prepended to the name.

This reads well, if our main goal were to make software read as close
to English as possible. However, software has goals that are more
important than grammatical correctness: organization and searchability.

Naming should have qualifiers last

Consider instead appending qualifying variables to the end, as with a namespace:

  • CatPersian
  • CatTabby
  • CatSiamese
  • CatTabbyRed
  • CatTabbyBlue
  • CatTabbyBlack

It’s still legible to an English speaker: it’s clear the adjectives are inverted. It also provides a couple of other advantages:

Sortability

If you sorted all of the class names, the groupings would happen naturally:

  • CatTabbyBlack
  • CatTabbyBlue
  • CatTabbyRed
  • PimentoLoaf
  • Truck

In contrast to the previous convention, where sorting scatters related classes:

  • BlackTabbyCat
  • BlueTabbyCat
  • PimentoLoaf
  • RedTabbyCat
  • Truck

Clear correlation while scanning

If you’re trying to look through a table of values quickly, the
reverse-adjective ordering shows a clear organization, even when unsorted.

  • CatTabbyBlue
  • PimentoLoaf
  • CatPersian
  • Truck
  • CatTabbyRed

In contrast to:

  • BlueTabbyCat
  • PimentoLoaf
  • PersianCat
  • Truck
  • RedTabbyCat

Conclusion

Our variable naming convention wasn’t deliberate: it was an artifact
of the language it was modeled on. Let’s adopt conventions that
come from a logical foundation, like a more search-friendly ordering of class qualifiers.

Test Classes Don’t Work

Test Classes don’t work as a test structure.

It’s worth clarifying what I mean by the test class. I’m
speaking specifically about the following structure of a test:

  • having a test class, that contains the setup and teardown method for test fixtures
  • putting multiple tests in that class
  • having the execution of a test look something like:
    * run setup
    * execute test
    * run teardown

More or less, something like:

class TestMyStuff:

     def setUp(self):
         self.fixture_one = create_fixture()
         self.fixture_two = create_another_fixture()

     def tearDown(self):
         teardown_fixture(self.fixture_one)
         teardown_fixture(self.fixture_two)

     def test_my_stuff(self):
         result = something(self.fixture_one)
         assert result.is_ok

This pattern is prevalent across testing frameworks, since many of them
follow the xUnit pattern of test design.

Why Test Classes are the Norm

Removing the setup and teardown of your test fixtures from the test body
keeps things clean. When looking at code,
you only want to look at context that’s relevant to you; otherwise it’s harder
to identify what should be focused on:

def test_my_stuff():
    fixture = create_fixture()

    try:
        result = something(fixture)
        assert result.is_ok
    finally:
        teardown_fixture(fixture)

So, it makes sense to have setup and teardown methods. A lot of the
time, you’ll have common sets of test fixtures, and you want to share
them without explicitly specifying them every time. Most languages
provide object-oriented programming, which allows state that is
accessible by all methods. Classes are a good vessel to give a test
access to a set of test fixtures.

When You Have a Hammer…

The thing about object oriented programming is, it’s almost always a
single inheritance model, and multiple inheritance gets ugly
quickly. It’s not very easy to compose test classes together. In the
context of test classes, why would you ever want to do that?

Test fixtures. Tests depend on a variety of objects, and you don’t
want to duplicate the setup of the same test fixtures across
multiple classes. Even when you factor it out, it gets messy quickly:

class TestA():
    def setUp(self):
        self.fixture_a = create_fixture_a()
        self.fixture_b = create_fixture_b()

    def tearDown(self):
        teardown_fixture(self.fixture_a)
        teardown_fixture(self.fixture_b)

    def test_my_thing(self):
        ...


class TestB():
    def setUp(self):
        self.fixture_b = create_fixture_b()

    def tearDown(self):
        teardown_fixture(self.fixture_b)

    def test_my_other_thing(self):
        ...

class TestC():
    def setUp(self):
        self.fixture_b = create_fixture_b()
        self.fixture_c = create_fixture_c()

    def tearDown(self):
        teardown_fixture(self.fixture_b)
        teardown_fixture(self.fixture_c)

    def test_my_other_other_thing(self):
        ...

At this rate, a test class per test would become necessary, each with
the same code to set up and tear down the exact same fixtures.

To avoid this, there needs to be a test system that:

  • has factories for test fixtures
  • requires as little code as possible to choose the necessary fixtures, and to
    clean them up

A Better Solution: Dependency Injection

In a more general sense, a test fixture is a dependency of a
test. If a system existed that handled the creation and teardown of
dependencies, it would be possible to keep only the genuinely unique logic
in the test body.

Effectively, this is the exact description of a dependency injection
framework: specify the dependencies necessary, and the framework handles the
rest.

For Python as an example, py.test has this capability. I declare a common fixture
somewhere, and can consume it implicitly in any test function:

# example adapted from the py.test fixture page.
import smtplib

import pytest

@pytest.fixture
def smtp(request):
    server = smtplib.SMTP("merlinux.eu")
    # addfinalizer can be used to hook into the fixture cleanup process
    request.addfinalizer(server.close)
    return server

def test_ehlo(smtp):
    response, msg = smtp.ehlo()
    assert response == 250
    assert 0  # for demo purposes

With pytest, you can even use fixtures while generating other fixtures!
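
For instance (a small sketch using pytest’s built-in tmpdir fixture, and the same addfinalizer style as above):

import sqlite3

import pytest

@pytest.fixture
def db_path(tmpdir):
    # tmpdir is itself a built-in pytest fixture
    return str(tmpdir.join("test.db"))

@pytest.fixture
def db_connection(request, db_path):
    conn = sqlite3.connect(db_path)
    request.addfinalizer(conn.close)  # teardown, just like the smtp example
    return conn

def test_insert(db_connection):
    db_connection.execute("CREATE TABLE pets (name TEXT)")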

It’s a beautiful concept, and a cleaner example of how test fixtures
could be handled. No more awkward test class container to handle creation
and teardown of fixtures.

As always, thoughts and comments are appreciated.

How I Design Test Suites

At Zillow, I’ve done a lot of work on the design and development of
the test infrastructure we use for full-stack tests. It’s always fun
to watch your tool become popular, but even more interesting are the
discussions around test suite design that come with it.

Many discussions later, I have a good idea of what I want in a test suite.
Here’s what I think about:

Tests are a question of cost

At the end of the day, tests have a cost. Each and every test has a
value / cost ratio. Things that increase the value of a test include:

  • consistency: given the same inputs, give the same results, every time.
  • speed: the faster the test is, the faster the feedback. The faster
    the feedback, the faster one can take action, and the more often we
    can execute the tests to get feedback.

In contrast, the things that increase the cost of a test include:

  • maintenance time: maintenance takes time, and development time is expensive.
    This is probably the biggest cost to consider.
  • cpu / memory to execute the test: although arguably cheap in this world
    of cloud providers, cpu and memory are real concerns, and tests that use
    a lot of these resources are expensive.
  • the time to execute the test: time is a huge cost, especially as the
    technology world we live in demands for more changes, more
    quickly. Depending on how fast you ship, tests that take too long will
    be prohibitively expensive, and thus not used.

When I look at the value of a test, I look at these factors. In
practice, I’ve found that the most important metric of them all is
maintenance time: tests that have little to no maintenance survive
refactors, rewrites, and pretty much anything that could happen to
code besides deprecation.

On the other hand, the more the test requires maintenance, the more likely
it’ll suffer one of two outcomes:

  • the test is thrown out because it takes too much time to maintain,
    despite the value.
  • the test is not given the time it needs, and continues to fall into
    disarray until it is ignored.

Basically: low maintenance tests last forever, high maintenance tests probably won’t.

Designing cheap tests

So how do we make tests that require little to no maintenance? From what I’ve observed, there are two types of maintenance:

  • functional maintenance, which modifies the test to reflect changes in the code itself
    • e.g. for a web page, the login form fields are modified
  • operational maintenance, which requires keeping a service dependency in a good state to test.
    • e.g. for an office application with cloud sync, keeping the cloud syncing service up.

Functional maintenance is unavoidable: as code changes, one must
ensure that any tests that validate that code are kept up to date. In
addition, for most tests, functional maintenance is relatively cheap
in time: except in the cases of extreme redesigns or refactorings, the
changes tend to be small in nature.

Operational maintenance costs can vary wildly, and it can become very
expensive. Tests that have multiple dependencies can become a game of
juggling an environment where all of those are functional. It becomes
even harder if there’s a small team maintaining this environment:
executing the tests consistently requires a production-quality
environment, and that’s more difficult the more services there are to
maintain.

However, unlike functional maintenance, operational maintenance is, for
the most part, avoidable. By taking advantage of heavy mocking, it’s
possible to remove dependencies like databases and APIs. The Google
Testing Blog has a good article about this.

Summary: tests with fewer operational dependencies are cheaper to maintain.

What kind of test distribution: the testing pyramid

When testing software, there are multiple levels at which one could author tests:

  • at the “unit” level, typically written in the same language and validating a single function or behaviour
  • at the integration level, typically written in the same language, and validating the communication between your code and an external application
  • at the end-to-end level, not necessarily written in the same language, and validating a complete workflow that a user would be performing.

Although all of these are important and should be included in a test suite,
not every test is created equal. Going back to the idea that tests
with the least maintenance will last the longest, we should be trying
to have as many of those as possible.

Unit tests are the cheapest. They:

  • have no dependencies (or else they would at least be considered an integration test),
  • run quickly (no waiting for network, or other delay from communication)

If we could capture all behaviour of our application with just unit
tests, that would be perfect. Unfortunately, many things can go wrong
when composing multiple pieces of these units together, so some level
of integration and end-to-end tests will be needed. But the larger
tests should be fewer in number, since they are harder to maintain.

A good model for visualizing a healthy distribution is the “testing pyramid”, as explained
by Martin Fowler and Google.

The more expensive tests are fewer in number, while the cheaper tests
are much more common.

How many tests should be in a suite

Adequate test coverage varies wildly between applications: medical
software that monitors heart rate should probably have a lot more
coverage than a non-critical social media website. The only common
rule of thumb I’ve found is: add the absolute minimum number of tests
to achieve your desired confidence in quality.

Testing is important, but at the end of the day, it’s not a
user-facing feature. On the other hand, quality is. Adding additional
tests does increase quality, but it comes at the cost of development
and maintenance time that could go toward other features that help your
application provide value. A properly sized testing suite sits right at
the line of too little testing, and hovers around it. This gives developers
as much time as possible for features, while ensuring that an
important feature (quality) is not neglected.

Summary

  • the best tests are the cheapest tests: low maintenance, fast execution, and low CPU/RAM usage
  • the cheapest tests have the fewest dependencies on other applications, like DBs or APIs
  • try to keep test coverage at as low a level as possible: cheap tests are worth 10x expensive ones
  • expensive tests validate the whole infrastructure, so they’re almost
    always necessary: refer to the test pyramid for a rough sketch of a good distribution
  • never add more or less coverage than you need: more coverage results
    in more maintenance that detracts from development time, and less coverage means an application
    whose quality is not up to the desired standard
  • how much coverage do I need? It depends on how critical the application
    is, and how critical it is that it continues to work. A payment path needs high
    quality, so it should have high coverage. The alignment of a button on
    a dialog three pages deep probably needs less quality assurance.

How do you design your test suite?

Book Report: Refactoring by Martin Fowler

Refactoring is a book covering the basic tenets of refactoring as
laid out by Martin Fowler: a very smart person with some very good
ideas about code in general.

First, the interesting thing about the definition of refactoring (as
defined by this book) is that it doesn’t encompass all code
cleanup. It explicitly defines refactoring as a disciplined practice
that involves:

  • a rigorous test suite to ensure code behaves as desired beforehand.
  • a set of steps that ensures that, at every step, the code works as before.

There are a lot of gems in this book. ‘Refactoring’ not only covers the
basic tenets of refactoring, but also provides a great set of
guidelines for writing code that is very easy for future
maintainers to understand.

The Indicators for Refactoring

After showing a great example of a step-by-step refactoring of code
that excellently preserves functionality, the next chapter describes
several code smells that indicate the need for a refactor:

  • duplicate code: a common red flag for anyone familiar with the
    age-old adage DRY (Don’t Repeat Yourself)
  • long methods: definitely a good sign for a refactor. I can’t recall
    how many methods I’ve read where I’ve barely been able to keep mental track
    of what’s really going on.
  • strong coupling: definitely not an easy one to catch when you’re
    hacking away hardcore at something. Sometimes it takes a real objective look at
    your code to find that the two classes or methods you’ve been working with
    should really be one, or perhaps organized separately.

Aside from this, the book explicitly describes several situations
which indicate the need to consider refactoring. That said (and Martin
also admits this), it’s not even close to outlining every single
situation where refactoring is necessary. After all, programming,
despite requiring a very logical and objective mind, can be a very
subjective practice.

The Actual Refactorings

After going over the smells, the next chapters finally describe the
actual refactorings themselves. The description of each refactoring
is very rigorous, covering motivation, explicit steps, and
examples. It’s a very good reference to cover all of your bases, and
like any book that describes patterns, a good one to keep
around when tackling particularly difficult refactoring tasks.

A lot of the refactors were ones I was already familiar with, but
there were some interesting cases I didn’t really think a lot about, that
‘Refactoring’ helped me assess more deeply:

Replace Temp with Query

The summary of this description is to replace temporary variables with
a method that generates the state desired:

def shift_left(digits, value):
    multiplier = 2 ** digits
    return value * multiplier

After:

def shift_left(digits, value):
    return value * _power_of_two(digits)

def _power_of_two(digits):
    return 2 ** digits

This is a trivial example, and not necessarily representative of a
real refactoring. However, using a ‘query method’ to generate state
helps prevent several bad patterns from emerging:

  • the local variable being modified to mean something different from its initial intention
  • the variable being misused somewhere else

It’s a good example of a refactoring that helps ensure the variable is
actually temporary, and is not misused.

Introduce Explaining Variable

At the end of the day, good code is 90% about making it easier for
others to read. Code that works is great, but code that cannot be
understood or maintained is not going to last when it is
encountered a second time.

Explaining variables really help here. This is the idea of making
ambiguous code clearer by assigning intermediate results to named variables
that express the intent a lot better:

def interest(amount, percentage, period):
    return amount * (1.414 ** (percentage / period))

After:

def interest(amount, percentage, period):
    e_constant = 1.414
    return amount * (e_constant ** (percentage / period))

Having very descriptive variables can make understanding the code a
lot easier.

Remove Assignment to Parameters

This one basically says to avoid mutating input parameters:

def multiply(x, y):
    x *= y
    return x

After:

def multiply(x, y):
    result = x * y
    return result

This is nice because it makes it easier to reason about the input parameters
later: mutating values that have a clear intent can lead to
misuse of those variables later (because you assume no one changed them,
or that they still describe the value they should). The extra local
variable could be slightly inefficient, but compilers can optimize these
inefficiencies away anyway, so why make it more confusing to a potential
reader?

Duplicate Observed Data

This is basically pushing for a decoupling of data stored on both a
client (interface) and a publisher. A lot of the time,
the client will store data that’s almost identical to an object
that already exists and has all the information stored neatly. Reducing the
duplication of data is always a good thing.
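
A rough sketch of the idea (simplified, with hypothetical names): let the view observe the domain object instead of keeping its own copy of the data.

class Interval:
    """Owns the data; interested views register themselves as observers."""
    def __init__(self):
        self.start, self.end = 0, 0
        self._observers = []

    def register(self, observer):
        self._observers.append(observer)

    def set_end(self, value):
        self.end = value
        for observer in self._observers:
            observer.notify(self)


class IntervalWidget:
    """Renders from the domain object rather than duplicating its fields."""
    def __init__(self, interval):
        interval.register(self)
        self.text = ""

    def notify(self, interval):
        self.text = "%d-%d" % (interval.start, interval.end)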

Separate Query from Modifier

There are a lot of methods that not only perform formatting or retrieve
data, but mutate data as well. This refactoring suggests
separating them:

def retrieve_name(log_object):
    log_object.access_count += 1
    return [str(x) for x in log_object.names]

After:

def increment_access_count(log_object):
    log_object.access_count += 1

def retrieve_name(log_object):
    return [str(x) for x in log_object.names]

# callers now invoke each piece explicitly:
increment_access_count(log_object)
names = retrieve_name(log_object)

I can’t count the number of times I’ve wanted just one specific part
of what a function performs. Refactorings such as this one
really give you modular pieces that can be stitched together as necessary.

The General Refactoring Principles

The book scatters some great gems about what a good refactoring
looks like, and it’s very similar to what is commonly known to be good
code:

  • mostly self-documenting: code that is so legible that you
    barely even need comments to understand what it’s doing: intelligible
    variable and function names, reading more like plain English than code.
  • modular: each function is split into small, singularly functional units.
  • taking advantage of the principles and idioms of the language at
    hand: ‘Refactoring’ was written with object-oriented languages in
    mind, so it advocates strong utilization of OOP. Utilize your
    programming language’s strengths.

Any step that takes your code in that direction (whilst preserving
functionality) is a good example of a refactoring.

How to Allocate Time to Refactor

‘Refactoring’ also stresses an appropriate time to refactor code:
constantly. Martin Fowler argues refactoring should occur during the
development process, and that time should be added to estimates to give
space for refactoring. I’ve never been given explicit amounts of time
to refactor code, and most of the time, you won’t be. The best thing to do
is to push yourself to refactor whenever it’s appropriate. The book also
warns against going overboard: only refactor what you need to. It’s a very
agile approach to the idea of refactoring.

Conclusion

Ultimately, ‘Refactoring’ doesn’t blow my mind and introduce me to
some life-changing concept. That said, it definitely changed my
mindset about refactoring. Refactoring should:

  • be done as you go
  • move the code toward being easily comprehensible
  • move the code toward being easily extendable
  • have a strong set of testing around it to preserve functionality

As I was about to tackle a fairly large refactoring, it was a great
read for organizing my thoughts about my methodologies, practices, and
goals.

I don’t recommend reading every word, but reading the chapters that explain
the philosophy and glancing over the refactoring patterns was more than
worth the time spent.

The Dangers of Patching

If you’ve ever used Mock (or the built-in unittest.mock in Python 3),
you’ll know how powerful a tool it can be for making unit tests of
state-modifying functions sane. Mocks in Python are effectively a probe
that you can send into a deep, dark function:

import mock

def test_write_hello_world():
    my_filehandle = mock.Mock()
    write_hello_world_to_handle(my_filehandle)
    my_filehandle.write.assert_called_with("hello world")

You can send in a fake object, have it experience what it’s like to be
a real object, and then ask it questions about what it was like.

The above example doesn’t really test a lot, but for more complex
cases, it can be a lifesaver: you know exactly what was called and
what wasn’t, and if your object modifies some real world state that
you don’t want to (such as a database), it prevents you
from performing dangerous operations.

Another well-known feature of the mock module is patch: a function that
gives you the ability to replace any object in Python (in any module)
with a mocked object. An example usage looks like this:

import mock

def test_linux():
    with mock.patch('platform.system') as system:
        system.return_value = 'Linux'
        import platform
        assert platform.system() == 'Linux'

Patch is powerful: it actually lets you replace modules, functions, and
values, even if they’re not imported in the current context!

But just because a tool is powerful, doesn’t mean you should use
it. In reality, patch should be a last resort: you should only use it
if there’s no other way to test your code.

But why? Patch basically makes mock even more flexible: you can
literally mock anything you know exists. Still, there are a couple of glaring issues:

It’s not foolproof

Let’s say I have a couple files like this:

# mock_test.py

from mymodule import is_my_os
try:
    from unittest import mock  # py3
except ImportError:
    import mock  # py2

with mock.patch('platform.system', return_value="my os"):
    assert is_my_os()

# mymodule.py
from platform import system

def is_my_os():
    return system() == "my os"

Now patch is patching the platform.system function, so this should pass. Let’s try it:

$ python mock_test.py
Traceback (most recent call last):
  File "./bin/python", line 42, in <module>
    exec(compile(f.read(), __file__, "exec"))
  File "/Users/tsutsumi/sandbox/mock_test.py", line 11, in <module>
    assert is_my_os()
AssertionError

That’s not what we expected! So what happened here?

Internally, every Python module has its own scope. Every import,
method declaration, variable declaration, and expression modifies
that scope in some way. When you import anything, you are actually
adding a reference to that object into the importing module’s scope. So by
the time we actually patch ‘platform.system’, mymodule already holds its
own reference to the original ‘system’ function:

$ python
>>> import platform
>>> from platform import system
>>> import mock
>>> with mock.patch('platform.system') as mock_system:
...     print(mock_system)
...     print(system)
...     print(platform.system)
...
<MagicMock name='system' id='4307612752'>
<function system at 0x100bf9c80>
<MagicMock name='system' id='4307612752'>
>>>

So even if you do patch a method, you won’t necessarily patch all the
uses of that method, depending on how they’re imported. This
means your patching must directly match how the object you want to
mock is imported into the code under test.

For example, we can fix the mock_test.py file above by changing the patch:

# mock_test.py

from mymodule import is_my_os
try:
    from unittest import mock  # py3
except ImportError:
    import mock  # py2

with mock.patch('mymodule.system', return_value="my os"):
    assert is_my_os()

So in order to use patch effectively, you have to be aware of the exact
semantics by which a method is both imported and invoked. And this
leads to the ultimate problem with patch:

Really tightly coupling tests with implementation

Patching in general, regardless of the implementation, tightly couples
your test code and your regular code beyond the typical bounds of unit
testing. Once you get patching involved, you have to be
conscious not only of the effect of your code, but also of its
implementation. Modifying the internals of the method also
requires modifying the test code. And if your unit tests change, the
functionality they verify has effectively changed too: you can no longer
be confident the behaviour is identical just because the same tests pass,
because modifying your code required you to change your test code.

Ultimately however, we don’t live in an ideal world. Times will come
when you have to test code that is hard to refactor into a method that
works with only mocks or actual objects. But with code you control,
it’s almost completely avoidable.

So how do we avoid patching?

Patching is the result of coupled complex state, relying on multiple
global variables. We can remedy this by doing the exact opposite:

  • decouple complex state
  • don’t rely on global variables

Let’s take a look at some practices to help with this:

Don’t use global variables

For example, let’s look at an object that creates a persistent db
connection based on configuration parameters:

db_connection = db_connect(DB_URL)

class MyObject:

    def __init__(self, name):
        self.name = name

    def save(self):
        db_connection.write(self.to_dict())

    def to_dict(self):
        return { 'name': self.name }

To test this object’s save method, you would have to either patch the
db_connection object, or replace DB_URL to point at a test
database. Either way is an extra step away from testing what you really
want from the save method: that db.write is called, and is passed the
dictionary representation of the object.

You can accomplish this without patch by passing in objects as you
need them: passing them in explicitly makes them really easy to mock:

class MyObject:

    def __init__(self, name):
        self.name = name

    def save(self, db):
        db.write(self.to_dict())

    def to_dict(self):
        return { 'name': self.name }


def test_myobject_save():
    import mock
    my_object = MyObject("foo")
    db = mock.Mock()
    my_object.save(db)
    db.write.assert_called_with({
        'name': 'foo'
    })

Decouple complex state

Complex state coupling occurs when you attempt to hide from the user a lot
of the difficulty of creating objects. Using the database example above:

class MyObject:

    def __init__(self, db_url, name):
        self._db = db_connect(db_url)
        self.name = name

    def save(self):
        self._db.write(self.to_dict())

    def to_dict(self):
        return { 'name': self.name }

Now the only way to actually test this save method (aside from a full
stack test) is to patch the db_connect function. It wouldn’t help to
assign the db attribute afterward (my_object._db = Mock()), because
by then the object has already been instantiated: a real db
connection already exists, creating overhead you won’t use.

Instead of trying to hide the complex state from the user of your
class, let them actually choose the db object to pass in:

class MyObject:

    def __init__(self, db, name):
        self._db = db
        self.name = name

    def save(self):
        self._db.write(self.to_dict())

    def to_dict(self):
        return { 'name': self.name }


def test_myobject_save():
    import mock
    db = mock.Mock()
    my_object = MyObject(db, "foo")
    my_object.save()
    db.write.assert_called_with({
        'name': 'foo'
    })

This not only allows us to test operations on complex objects, but
also makes the class more flexible (e.g. compatible with more
db objects than just the one that db_connect returns).

Final thoughts

Once again, patch exists for a reason. It’s almost like a magic wand
that allows you to test otherwise untestable code. But this magic wand
makes your life harder the more you use it.

So all in all: beware the dangers of patching.

Getting Dropbox Status’s into Conky + Dzen2

I’m an avid xmonad user, and I’ve recently switched over to conky +
dzen as my status bar. A recent issue I had was getting Dropbox
status information into my conky.

I did some hacking, and I love the way it turned out.

This is a pretty generic approach to adding anything into conky +
dzen. Here are the steps I took:

1. Write some scripts to produce the text you want

Conky has methods to run arbitrary scripts and echo their
output. This abstraction makes it easy to get the text you want.

I started writing a couple shell scripts that get me the info I need:

Note: I used the Dropbox command line tool to get this info. You'll
need that installed. On Arch, it's the 'dropbox-cli' package.
#!/usr/bin/env bash
# Dropbox-down
# echoes the Dropbox download speed

status=`dropbox status | grep Downloading`
SYNC_REGEX="([0-9,]+) KB/sec"

[[ $status =~ $SYNC_REGEX ]]
download_speed="${BASH_REMATCH[1]}"
if [[ $download_speed != "" ]] ; then
  echo "$download_speed KB/sec"
fi
#!/usr/bin/env bash
# Dropbox-up
# echoes the Dropbox upload speed

status=`dropbox status | grep Uploading`
SYNC_REGEX="([0-9,]+) KB/sec"

[[ $status =~ $SYNC_REGEX ]]
upload_speed="${BASH_REMATCH[1]}"
if [[ $upload_speed != "" ]] ; then
  echo "$upload_speed KB/sec"
fi
#!/usr/bin/env bash
# Dropbox-files
# lists a single filename if only a single file is being synced
# otherwise, echoes the number of files being synced

status=`dropbox status | grep Syncing`
SYNC_REGEX="([0-9,]+) files remaining"
FILENAME_REGEX='"(.*)"'

[[ $status =~ $SYNC_REGEX ]]
files_remaining="${BASH_REMATCH[1]}"
if [[ $files_remaining == "" ]]; then

    [[ $status =~ $FILENAME_REGEX ]]
    filename="${BASH_REMATCH[1]}"
    echo "$filename"

else
    echo "$files_remaining files"
fi

3. Add them to your conky script

Now that we have our shell scripts and our icons, we can execute them in the conky
script. I got the arrow icons from a nice icon set;
if you're lazy, you can also grab them from my rc files.

Once you have all your assets, add in the relevant pieces into your conky:

out_to_console yes
out_to_x no
update_interval 1

lua_load $HOME/.xmonad/conky_scripts/conky_lua_scripts.lua

# note: Dropbox needed dropbox-cli on arch

TEXT
# ---- START DROPBOX STUFF ---
^fg(\#007ee5) ^i($HOME/.xmonad/icons/Dropbox.xbm) \
# ---- description of files changing ---
^fg(\#FFFF00) ${execi 6 $HOME/.xmonad/conky_scripts/Dropbox-files} ^fg()\
# ---- download speed info ---
^fg(\#8888FF) ^i($HOME/.xmonad/icons/net_down_03.xbm) ${execi 6 $HOME/.xmonad/conky_scripts/Dropbox-down} ^fg() / \
# ---- upload speed info ---
^fg(\#AA0000) ^i($HOME/.xmonad/icons/net_up_03.xbm) ${execi 6 $HOME/.xmonad/conky_scripts/Dropbox-up} ^fg() \

Notes:

  • I changed the colors with ^fg(#COLOR_HASH)
  • to split the conky TEXT across multiple lines, I use the '\' line-continuation delimiter

And there you go! You have a nice, clean Dropbox activity bar.