July 18th, 2010

On Facebook's Thrift semantics, code generation, and OCaml

Note: The obligatory TL;DR section is at the very end of this text.

The ASN.1 rant

After co-founding Echo, I had to put asn1c's development on hold for sheer lack of time. (If you don't know what my asn1c is, think of it as the most evolved open source ASN.1 compiler.) Despite suspending development, I've been tracking the evolution of ASN.1, as well as the emergence of some newer technologies competing with what ASN.1 has to offer. I am referring to Facebook's Thrift, Google's Protocol Buffers, Cisco's Etch, and the like. Yet to this day I have had no opportunity to actually use any of those in production.

I have my own score to settle with the ASN.1 world. The standard is laden with design-by-committee complexity, no doubt evolved to “address the real world demands”. Its mind-numbing standardese and semantics effectively prohibit any newcomer from ever entering this field and producing a decent new compiler. So we're stuck with something like 2.5 living ASN.1 compilers covering some 3 mainstream languages (C++, C#, Java). The commercial products are often cost-prohibitive: they're squarely aimed at the rich telecom market. Where would a small Ruby or Python startup go? (While I am at it, Erlang has a decent free compiler, you know.)

Yet there's another side to this complexity. Many things you struggle with or “invent” for the purpose of better data serialization have already been invented in the ASN.1 world. Things like broiled-to-perfection TLV-based encodings (BER/DER/CER), bitwise Packed Encoding Rules (competing with gzip'ing your serialized binary blob), Information Object Classes (think of SNMP MIB macros on steroids), or Encoding Control Notation have a lot to offer and a lot to teach.

But I should stop kicking that dead horse. Let's try Thrift for a change.

You can download the patch here: http://lionet.info/patches/thrift-trunk-962854.patch

4. Obligatory TL;DR section

The Thrift specification underspecifies several important aspects of the description language semantics. The Thrift target language code generators are inconsistent in the way they treat certain parts of the specification. I made an attempt to make the OCaml generator produce somewhat safer and more compliant code, and am sharing a patch with you.
https://issues.apache.org/jira/browse/THRIFT-827
https://issues.apache.org/jira/browse/THRIFT-860

P.S. The above patches have since been merged into Thrift.