Date Tags python

If you've worked with tornado, you may have encountered a traceback of a somewhat bewildering error:

Traceback (most recent call last):
    File "/usr/local/lib/python2.7/site-packages/tornado/ioloop.py", line 832, in start
fd_obj, handler_func = self._handlers[fd]
KeyError: 16

A few other people have been confused as well. After some digging and a combination of learning about the event loop, fork, and epoll, the answer finally entered into focus.

TLDR

If you're looking for the solution, don't call or start IOLoops before an os.fork. This happens in web servers like gunicorn, as well as tornado.multiprocess, so be aware of that caveat as well.

But why does this happen?

As I mentioned previously, this is a combination of behaviour all across the system, python and tornado stack. Let's start with learning more about that error specifically.

The code the traceback is referring occurs in the the IOLoop:

# tornado/ioloop.py
self._events.update(event_pairs)
while self._events:
    fd, events = self._events.popitem()
    try:
        fd_obj, handler_func = self._handlers[fd]
        handler_func(fd_obj, events)

What are these variables? you can read the IOLoop code yourself, but effectively:

  • _handlers is a list of the callbacks that should be called once an async event is complete.
  • _events is a list of events that have occurred, that need to be handled.

What is an FD?

The handlers and events are both keyed off of file descriptors. In a few words, file descriptors represent a handle to some open file. In unix, a pattern has propagated where a lot of resources (devices, cgroups, active/inactive state) are referenced via file descriptors: it became a lingua franca for low level resources because a lot of tooling knows how to work with file descriptors, and writing and reading to a file is simple.

They're useful for tornado because sockets also have a file descriptor represent them. So the tornado ioloop could wait for an event affecting a socket, then pass that socket to a handler when a socket event is fired (e.g. some new data came into the socket buffer).

What modifies the events and handlers?

A KeyError handlers means there's a key in events that is not in the handlers: some code is causing events to be added to the ioloop, and aren't registering a handler for it at the same time. So how does that happen in the code?

A good starting point is looking where _handlers and _events are modified in the code. In all of the tornado code, there's only a couple places:

# tornado/ioloop.py
def add_handler(self, fd, handler, events):
    fd, obj = self.split_fd(fd)
    self._handlers[fd] = (obj, stack_context.wrap(handler))
    self._impl.register(fd, events | self.ERROR)
# tornado/ioloop.py
def remove_handler(self, fd):
    fd, obj = self.split_fd(fd)
    self._handlers.pop(fd, None)
    self._events.pop(fd, None)
    try:
        self._impl.unregister(fd)
    except Exception:
        gen_log.debug("Error deleting fd from IOLoop", exc_info=True)

Looking at these pieces, the code is pretty solid:

  • handlers are added only in add_handler, and they are added to a _impl.register
  • handlers are only removed in remove_handler, where they are removed in _events, _handlers and _impl.
  • events are added to _events in _impl.poll()

So the removing of handlers always make sure that events no longer has it anymore, and it removes it from this impl thing too.

But what is impl? Could impl be adding fd's for events that don't have handlers?

impl: polling objects

It turns out _impl is chosen based on the OS. There is a little bit of indirection here, but the IOLoop class in tornado extends a configurable object, which selects the class based on the method configurable_default:

# tornado/ioloop.py
@classmethod
def configurable_default(cls):
    if hasattr(select, "epoll"):
        from tornado.platform.epoll import EPollIOLoop
        return EPollIOLoop
    if hasattr(select, "kqueue"):
        # Python 2.6+ on BSD or Mac
        from tornado.platform.kqueue import KQueueIOLoop
        return KQueueIOLoop
    from tornado.platform.select import SelectIOLoop
    return SelectIOLoop

And each of these loop implementations pass it's own argument into the impl argument:

class EPollIOLoop(PollIOLoop):
    def initialize(self, **kwargs):
        super(EPollIOLoop, self).initialize(impl=select.epoll(), **kwargs)

Looking at select.epoll, it follows the interface of a polling object: a class in the Python standard library that has the ability to poll for changes to file descriptors. If something happens to a file descriptor (e.g. a socket recieving data), the polling object, it will return back the file descriptor that was triggered.

Different architectures have different polling objects implemented. The avaialable ones in tornado by default are:

  • epoll (Linux)
  • kqueue (OSX / BSD)
  • select Windows use

In our case, this was happening on Linux, so we'll look at epoll.

epoll

So what is epoll? It's documented in the Python standard library, but it's a wrapper around the epoll Linux system calls.

The ioloop code actually looks like:

  • wait for epoll to return a file descriptor that has an event
  • execute the handler (which will presumably register another handler if another step is required, or not if it's complete)
  • repeat.

epoll has two different configurations, but the one tornado uses is edge-polling: it only triggers when a CHANGE occurs, vs when a specific level is hit. In other words, it will only trigger when new data is available: if the user decides to do nothing with the data, epoll will not trigger again.

epoll works by registering file descriptors for the epoll object to listen to. You can also stop listening to file descriptors as well.

So epoll works great for an event loop. But is it possible to somehow register file descriptors to the epoll/impl object without using the method above?

epoll and os.fork

It isn't possible to register things outside of the impl object. But, os.fork can cause some weird behaviour here. See, the way that one interfaces with epoll is using file descriptors: you have an fd to the epoll object, and you can use Linux system calls to work with that:

As mentioned previously, file descriptors is a common way to reference some object when using Linux kernel system calls.

Another common system call is fork. The documentation of fork specifies that fork is equivalent to:

  • copying the memory of the current process to a new space
  • spawning a new process that uses the new copy.

This is fine for most objects in memory, but how about file descriptors, which reference some object outside of the memory space of the current process.

In the case of file descriptors, the file descriptor is also cloned to the new fork. In other words, both the parent and the child process will have a reference to the same file descriptor.

So, what does this mean for epoll, which is just another file descriptor under the hood? Well, you can probably guess.

It gets shared.

How the bug works

So this is the crux of the issue. When an os.fork occurs, the parent and the child share the SAME epoll. So for an IOLoop that is created by the parent object, the child process uses the same epoll as well!

So, that allows a condition like this:

  1. parent creates an IOLoop loop_1, with an epoll epoll_1
  2. parent calls os.fork, creating loop_2, which shares the same epoll_2
  3. parent starts ioloop, waits for epoll_1.poll()
  4. child adds a handler for fd_2 to epoll_1
  5. parent gets back fd_2, but doesn't have a handler for it, and raises the KeyError.

So this will pretty much happen at some point anytime a new ioloop is not created for a child process.

Here's a repro script. I couldn't figure out a good way to kill this gracefully, so be warned this will need to be killed externally.

import logging
import select
import socket
import os
import time
import tornado.ioloop
import tornado.httpclient
import tornado.web

serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
serversocket.bind(('127.0.0.1', 8080))
serversocket.listen(1)

logging.basicConfig()

loop = tornado.ioloop.IOLoop.current()

if os.fork():
    handler = lambda *args, **kwargs: None
    loop.add_handler(serversocket.fileno(), handler, select.EPOLLIN)
    time.sleep(0.1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(('127.0.0.1', 8080))
    client.send(b"foo")
else:
    loop.start()

How about gunicorn or tornado.multiprocess?

So how to avoid this in gunicorn or tornado.multiprocess, which uses an os.fork? The best practice is to not start the ioloop until AFTER the fork: calling ioloop.Instance() or current() will create an ioloop whose ioloop will be shared by any child ioloop, without explicitly clearing it.

Gunicorn calls a fork as it's spawning a worker:

# gunicorn/arbiter.py
def spawn_worker(self):
    self.worker_age += 1
    worker = self.worker_class(self.worker_age, self.pid, self.LISTENERS,
                               self.app, self.timeout / 2.0,
                               self.cfg, self.log)
    self.cfg.pre_fork(self, worker)
    pid = os.fork()
    if pid != 0:
        self.WORKERS[pid] = worker
        return pid

Summary

Tornado is an awesome framework, but it's not simple. However, thanks to well documented pieces, it's possible to diagnose even complex issues like this, and do a bit of learning along the way.

Also, os.fork is not a complete guarantee that you'll get a unique instance of every object you use. Beware file descriptors.


Comments

comments powered by Disqus

About Yusuke Tsutsumi
Software Engineer at Zillow. I focus on tools and services for developer productivity, including build and testing.

My other interests include programming language design, game development, and learning languages (the non-programming ones).