Data As Code

I followed through the exercises of Seven Languages in Seven Weeks a
while back, and there was a really interesting concept introduced by
clojure (which really extends it’s idea from lisp): that code is data
and data is code. The idea that a programming language’s syntax is
flexible enough where a description of the data is actually code itself.

A good example of where this sort of binding works well is with
configuration and data files: These files are almost always authored
in a intermediary markup format that is then parsed and interpreted by
a programming language of choice. For a language where code is data,
the data or configuration file is just another file with code, and it
just has to be loaded and parsed to be understood by a program.

I didn’t consider the strength in such an idea at first. In fact, I
dismissed it as a nice-to-have, a concept whose absence in a language
was a mild detriment, but not one that would really hamper a
developers ability to do what they need to.

But the more I thought about it, the more my opinion changed. With
every new config file I faced, my opinion changed all the more.

For an example, let’s take the approach of data-as-code with a
language where the design wasn’t designed that way. Java has a lot of strengths, but
It has it’s weaknesses. Let’s say I want to express a menu. In code directly, that looks like:

FileMenu menu = new FileMenu(
  new Tab[] {
    new Tab("File",
      new Command[] {
        new Command("New"),
        new Command("Open"),
        new Command("Save"),
        new Command("Save As...")

Now in lisp(-ish):

    (tab "File" (
          (command "New")
          (command "Open")
          (command "Save")
          (command "Save as...")))))

It’s so clean! It actually looks like a config file at this point: a
data format such as json or yaml probably wouldn’t make this any more
readable. Even Python, which is a bit more dynamic that Java, doesn’t
look as clean:

menu = FileMenu(
    tab("File", (
      command("Save as...")

There’s something about those parentheses. So for these languages
where the code-data bind is not as strong, separate data is
represented as a completely separate format, either something abstract
like a map or a separate data file. However, this creates a separation
of implementation and representation. This adds any of the additional
functional overheads:

  • a deserializer, turning data -> in programming language representation
  • a validator, ensuring that the data is valid
  • a serializer, turning programming language representation -> data

When your data is already represented as code, you remove these
additional layers of abstraction: your interpreter/compiler will
figure out validity for you! Not only that, but your code
representation is just as clean, so it’s just like interacting with
your data directly. In fact, that’s precisely what you’re doing.

The insane thing is, as your code becomes more powerful, your data
does as well. Parsing the code directly allows you to add complex
configuration with ease. For example, adding a conditional statement
in the filemenu in lisp above is:

    (tab "File" (
          (command "New")
          (command "Open")
          (command "Save")
          (if (isOSX)
            (command "Save to icloud")
          (command "Save as...")))))

Now imagine how you would implement a check like that in an xml or
json file. I was able to take advantage of a new method (isOSX)
immediately, as well as built-in methods like an if statement.
Because data is code, your representation is just as powerful as your
language. Now that is power at the touch of your fingertips.

Lisp and it’s cousins are definitely weird to get used to, but the true
power of the direct binding of code and data that is built in is a
concept worth really exploring and understanding. I’m just diving in
and already having a blast.

Author: toumorokoshi

Love to code, love emacs!