Ingy 2.ö

Tuesday, May 15, 2007

YAML and JSON







YAML was invented in April of 2001. Well, it started then... It's still being invented at some levels. I happened to be one of the three primary guys who spent several years and countless email hours inventing it.


JSON showed up on the scene around 2005(??). JSON didn't need so much inventing because it is a subset of JavaScript. JSON and YAML have a lot in common and also distinct differences, but I can't seem to find any web page that talks much about it. So I'll do it here.

Technically, YAML and JSON aren't even related. And just for the record, neither are YAML and XML. Wikipedia pretty much gets it right in the first-sentence definition for each of these:
  • XML is a ... markup language.
  • YAML is a ... data serialization format.
  • JSON is a ... data interchange format.
Except that "YAML (and begrudgingly XML) is also a useful data interchange format", there is really little crossover in definition. YAML and JSON are not markup. XML and JSON are not intended for serious serialization.

NOTE: I'm not talking about XML from here on. This post is about YAML and JSON.

But YAML and JSON have a common spirit. They both are programming language agnostic. They both are centered around hashes, arrays and scalars (or whatever you call these if you aren't a Perl guy like me). This makes them really attractive in Perl, Python, PHP, Ruby and (of course) JavaScript because these modern languages are also based on hashes, arrays and scalars as their primary data model.

One day, early on in JSON's existence, someone noticed that JSON is a pure subset of YAML. Well this wasn't completely true, but enough so that both camps adjusted their specs to make it true. The whole concept is freaky because YAML showed up years before. My only rationale is that YAML was so ambitious. We tried to make it look like everything. And we got really lucky with JSON. Anyway it's all history at this point...

So what are the differences then? In a nutshell JSON handles all the common cases for turning data into text and back into data. YAML handles all the cases so it can be a serious data serialization language. This makes JSON almost trivial to implement, and YAML almost impossible. So even though YAML was first, JSON caught on like wildfire. Since JSON is a subset, it really doesn't have anything YAML doesn't. Except a cool logo. YAML has no logo!

Here's a list of what YAML does have:

  • YAML has two ways to show collections (hashes and arrays): (Python-like) indentation and (JSON-like) braces.
  • YAML has many scalar quoting styles: unquoted, double quoted, single quoted, literal block, and folded block. JSON uses double quotes for all strings.
  • YAML has data typing. It uses a taguri based system for explicit typing, and also supports implicit typing. JSON supports only String, Number, Boolean and null scalar types.
  • YAML supports multiple references to identical nodes, including circular references. JSON does not.
  • In YAML, a hash (mapping) key can be any node (including another hash, array or aliased node reference). In JSON a key is always a String. It should be noted that YAML goes beyond the capabilities of most programming languages in this regard. Only Ruby has full object key support, afaik.
  • YAML allows a "stream" to consist of multiple "documents", or top-level nodes. These can be any node of course. In JSON you can only have one top level object, and it must be a hash or an array (not just a scalar).
That's about it, but that's a ton. Especially in the details.
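
To make a few of those bullets concrete, here is a rough sketch from the JSON side, using only Python's standard json module (Python is my choice for the illustration; nothing in this post depends on it). It pokes at shared references, circular references, non-string keys, and multiple top-level values. The variable names are made up for the example.

```python
import json

# Shared reference: the same list object appears twice in the structure.
shared = ["a", "b"]
data = {"first": shared, "second": shared}
round_tripped = json.loads(json.dumps(data))
# After a round trip the two values are equal, but they are separate copies.
shared_identity_survives = round_tripped["first"] is round_tripped["second"]

# Circular reference: a list that contains itself cannot be dumped.
loop = []
loop.append(loop)
try:
    json.dumps(loop)
    circular_error = None
except ValueError as exc:
    circular_error = type(exc).__name__

# Non-string key: a tuple is a fine dict key in Python, but not in JSON.
try:
    json.dumps({(1, 2): "point"})
    key_error = None
except TypeError as exc:
    key_error = type(exc).__name__

# Multiple top-level values in one text: json.loads accepts exactly one.
try:
    json.loads('{"a": 1} {"b": 2}')
    stream_error = None
except json.JSONDecodeError as exc:
    stream_error = type(exc).__name__

print(shared_identity_survives, circular_error, key_error, stream_error)
```

A YAML dumper would typically handle the first two cases with anchors and aliases, and the last one with document separators. I'm leaving the YAML side out here because it needs a third-party library.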

I think all of these things make YAML a great serialization format, and avoiding all of them makes JSON a great data exchange format.
