Ingy 2.ö

Tuesday, May 15, 2007

YAML and JSON

YAML was invented in April of 2001. Well, it started then... It's still being invented at some levels. I happened to be one of the three primary guys who spent several years and countless hours of email inventing it.


JSON showed up on the scene around 2005 (or thereabouts). JSON didn't need much inventing because it is a subset of JavaScript. JSON and YAML have a lot in common, and also some distinct differences, but I can't seem to find any web page that talks much about them. So I'll do it here.

Technically, YAML and JSON aren't even related. And just for the record, neither are YAML and XML. Wikipedia pretty much gets it right in the first-sentence definition of each:
  • XML is a ... markup language.
  • YAML is a ... data serialization format.
  • JSON is a ... data interchange format.
Apart from the fact that YAML (and, begrudgingly, XML) is also a useful data interchange format, there is little crossover in these definitions. YAML and JSON are not markup. XML and JSON are not intended for serious serialization.

NOTE: I'm not talking about XML from here on. This post is about YAML and JSON.

But YAML and JSON share a common spirit. Both are programming-language agnostic. Both are centered on hashes, arrays and scalars (or whatever you call these if you aren't a Perl guy like me). This makes them really attractive in Perl, Python, PHP, Ruby and (of course) JavaScript, because these modern languages also use hashes, arrays and scalars as their primary data model.
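A minimal sketch of that shared data model, using Python's standard json module: a structure built only from hashes (dicts), arrays (lists) and scalars turns into text and back without losing anything. The record's contents are made up purely for illustration.

```python
import json

# A nested structure made only of hashes (dict), arrays (list) and scalars.
record = {
    "name": "example",
    "tags": ["a", "b"],
    "size": 3,
    "ratio": 0.5,
    "active": True,
    "parent": None,  # JSON null
}

# Data -> text -> data.
text = json.dumps(record, sort_keys=True)
restored = json.loads(text)

print(text)
print(restored == record)  # True: plain hashes/arrays/scalars survive the round trip
```

The same record could be written in YAML's indented style or, because JSON is (nearly) a subset of YAML, exactly as the JSON text above.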

One day, early in JSON's existence, someone noticed that JSON is a pure subset of YAML. Well, this wasn't completely true, but it was close enough that both camps adjusted their specs to make it true. The whole thing is freaky, because YAML showed up years before. My only explanation is that YAML was so ambitious: we tried to make it look like everything, and we got really lucky with JSON. Anyway, it's all history at this point...

So what are the differences, then? In a nutshell, JSON handles all the common cases for turning data into text and back into data. YAML handles all the cases, so it can be a serious data serialization language. This makes JSON almost trivial to implement, and YAML almost impossible. So even though YAML came first, JSON caught on like wildfire. Since JSON is a subset, it really doesn't have anything YAML doesn't. Except a cool logo. YAML has no logo!
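To make "almost trivial to implement" concrete, here is a rough sketch of a recursive-descent JSON parser in Python. It is a simplified illustration, not a conforming implementation. It does not combine UTF-16 surrogate pairs from \u escapes, it does not reject raw control characters inside strings, and it accepts a bare scalar at the top level (as later JSON RFCs allow). A YAML parser would additionally have to handle indentation, several scalar styles, tags, anchors and aliases, and multi-document streams.

```python
import re

# JSON number grammar: optional minus, integer part, optional fraction and exponent.
NUMBER = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")
# Two-character escapes allowed inside JSON strings.
ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b",
           "f": "\f", "n": "\n", "r": "\r", "t": "\t"}
LITERALS = {"true": True, "false": False, "null": None}


def parse(text):
    """Parse one JSON text into dict/list/str/int/float/bool/None values."""

    def ws(i):
        # Skip the four JSON whitespace characters.
        while i < len(text) and text[i] in " \t\n\r":
            i += 1
        return i

    def value(i):
        # Return (parsed value, index just past it).
        i = ws(i)
        ch = text[i:i + 1]
        if ch == "{":
            return collection(i + 1, "}", is_object=True)
        if ch == "[":
            return collection(i + 1, "]", is_object=False)
        if ch == '"':
            return string(i + 1)
        for word, val in LITERALS.items():
            if text.startswith(word, i):
                return val, i + len(word)
        m = NUMBER.match(text, i)
        if not m:
            raise ValueError(f"unexpected input at {i}")
        num = m.group()
        return (float(num) if any(c in num for c in ".eE") else int(num)), m.end()

    def collection(i, close, is_object):
        # Parse the members of an object or array after its opening delimiter.
        result = {} if is_object else []
        i = ws(i)
        if text[i:i + 1] == close:  # empty {} or []
            return result, i + 1
        while True:
            if is_object:
                i = ws(i)
                if text[i:i + 1] != '"':  # JSON keys are always strings
                    raise ValueError(f"expected string key at {i}")
                key, i = string(i + 1)
                i = ws(i)
                if text[i:i + 1] != ":":
                    raise ValueError(f"expected ':' at {i}")
                item, i = value(i + 1)
                result[key] = item
            else:
                item, i = value(i)
                result.append(item)
            i = ws(i)
            ch = text[i:i + 1]
            if ch == close:
                return result, i + 1
            if ch != ",":
                raise ValueError(f"expected ',' or {close!r} at {i}")
            i += 1

    def string(i):
        # Parse string contents after the opening quote.
        out = []
        while i < len(text):
            ch = text[i]
            if ch == '"':
                return "".join(out), i + 1
            if ch == "\\":
                esc = text[i + 1:i + 2]
                if esc == "u":  # \uXXXX; surrogate pairs are not combined
                    hexdigits = text[i + 2:i + 6]
                    if not re.fullmatch(r"[0-9a-fA-F]{4}", hexdigits):
                        raise ValueError(f"bad \\u escape at {i}")
                    out.append(chr(int(hexdigits, 16)))
                    i += 6
                    continue
                if esc not in ESCAPES:
                    raise ValueError(f"bad escape at {i}")
                out.append(ESCAPES[esc])
                i += 2
                continue
            out.append(ch)
            i += 1
        raise ValueError("unterminated string")

    result, end = value(0)
    if ws(end) != len(text):
        raise ValueError("extra data after the top-level value")
    return result
```

For example, parse('[1, "two", {"three": null}]') returns [1, 'two', {'three': None}].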

Here's a list of what YAML does have:

  • YAML has two ways to show collections (hashes and arrays): (Python-like) indentation and (JSON-like) braces and brackets.
  • YAML has many scalar quoting styles: unquoted (plain), double quoted, single quoted, literal block, and folded block. JSON uses double quotes for all strings.
  • YAML has data typing. It uses a taguri-based system for explicit typing, and also supports implicit typing. JSON supports only the String, Number, Boolean and null scalar types.
  • YAML supports multiple references to identical nodes, including circular references. JSON does not.
  • In YAML, a hash (mapping) key can be any node (including another hash, an array or an aliased node reference). In JSON, a key is always a String. It should be noted that YAML goes beyond the capabilities of most programming languages in this regard. Only Ruby has full object key support, AFAIK.
  • YAML allows a "stream" to consist of multiple "documents", or top-level nodes, and each of these can be any node. In JSON you can only have one top-level value, and it must be a hash or an array (not just a scalar).
That's about it, but that's a ton, especially in the details.
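Several items on that list show up as concrete failures when you push data through a JSON library. The sketch below uses Python's standard json module only as a convenient JSON implementation. The error messages are Python's own and aren't part of JSON. YAML, by contrast, can write a shared list once with an anchor (&name) and refer back to it with an alias (*name), and it separates the documents of one stream with "---".

```python
import json

# 1. Shared references: both keys point at the *same* list object...
shared = ["a", "b"]
doc = {"first": shared, "second": shared}
restored = json.loads(json.dumps(doc))
# ...but after a JSON round trip they are two separate (equal) lists.
print(restored["first"] == restored["second"],
      restored["first"] is restored["second"])  # True False

# 2. Circular references cannot be encoded at all.
loop = []
loop.append(loop)
try:
    json.dumps(loop)
except ValueError as err:
    print("circular:", err)

# 3. Keys: an int key is coerced to a string; a tuple key is rejected.
print(json.dumps({1: "one"}))  # {"1": "one"}
try:
    json.dumps({("key",): "val"})
except TypeError as err:
    print("tuple key:", err)

# 4. One top-level value per text: two concatenated documents are an error.
try:
    json.loads('{"a": 1}\n{"b": 2}')
except json.JSONDecodeError as err:
    print("stream:", err.msg)  # Extra data
```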

I think all of these things make YAML a great serialization format, and avoiding them makes JSON a great data exchange format.


4 Comments:

  • Not quite sure what you mean by "full object key support" WRT dict/hash keys. Python allows for anything that can be hashed uniquely to be used as a key in a dictionary, including objects.

    By Blogger urbanape, At May 16, 2007 6:50 AM  

  • Nice article, but I still don't quite understand the difference between data serialization and data interchange. Is it simply a difference in performance and complexity?

    By Blogger Keith A., At May 18, 2007 6:18 AM  

  • Urbanape, my understanding was that Python only allows "immutable" objects as keys. So tuples, yes. Lists and dicts, no. Am I correct? If so, *that* is what I meant by *full* object support.

    Keith, it is true that JSON serializes data. It just doesn't have the capability to serialize *any* data structure. If you serialize, say, an object of some given class, it stands to reason that the class's *type* needs to be part of the serialization. JSON only does plain hashes, arrays, strings, numbers, booleans and null. Not FooBar objects. You could make your own serialization scheme using plain JSON (or XML). It's just that in YAML, types and references are first-class citizens.

    By Blogger ingydotnet, At May 18, 2007 1:53 PM  
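The "make your own serialization scheme using plain JSON" idea can be sketched with the default and object_hook parameters of Python's json.dumps and json.loads. The FooBar class and the "__type__" key are hypothetical conventions made up for this illustration, not any standard. In YAML, a tag such as !FooBar carries the type natively.

```python
import json


class FooBar:
    """Hypothetical class standing in for 'an object of some given class'."""

    def __init__(self, name, size):
        self.name = name
        self.size = size


def encode(obj):
    # json.dumps calls this for objects it cannot serialize natively.
    if isinstance(obj, FooBar):
        # Store the type name next to the plain data (hypothetical "__type__" key).
        return {"__type__": "FooBar", "name": obj.name, "size": obj.size}
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def decode(d):
    # json.loads calls this for every decoded JSON object (dict).
    if d.get("__type__") == "FooBar":
        return FooBar(d["name"], d["size"])
    return d


text = json.dumps([FooBar("widget", 3), {"plain": True}], default=encode)
print(text)
# [{"__type__": "FooBar", "name": "widget", "size": 3}, {"plain": true}]
items = json.loads(text, object_hook=decode)
print(type(items[0]).__name__, items[0].name, items[1])
# FooBar widget {'plain': True}
```

Every program that reads this data has to know the "__type__" convention, which is exactly the gap Ingy describes. Nothing in JSON itself says the key is special.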

  • ingydotnet, correct:

    >>> foo = dict()
    >>> # a list as a key fails:
    >>> foo[['key',]] = 'val'
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: list objects are unhashable
    >>> # a tuple as a key succeeds
    >>> foo[('key',)] = 'val'
    >>> str(foo)
    "{('key',): 'val'}"

    thank god for the immutability of strings.

    By Blogger velotron, At June 10, 2007 1:17 PM  
