Friday, April 22, 2016

Both REST and JSON suck - really!!

Alan Kay once said,
"The Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free? The Web, in comparison, is a joke. The Web was done by amateurs."
Programmers experience the resulting pain of the poor implementation of the Web on a daily basis. Two glaring examples are REST and JSON.

You might wonder, Wait - REST was created as a better alternative to SOAP, and JSON as a better alternative to XML - so aren't they really better?

Well, better, yes, but that's a pretty low bar. Let's not dredge back up WSDL and SOAP - let's please leave those in the trash can of horrors where they belong. REST and JSON are sufficiently terrible that we don't need to go back to things that were even more terrible.

So why is REST terrible? It all started with the notion that inter-system messages need to be human readable. Really, I think it started with firewalls: in the late '90s programmers wanted to make remote requests across firewalls, and existing protocols had trouble doing that, so programmers turned to HTTP, which was designed for human-readable content. They wanted to send inter-system messages, so they grabbed XML, a convenient data format that could easily be pumped over HTTP. Then OASIS and the W3C got into the mix, and soon we had WSDL and a raft of other standards - all of which repeat the mistakes of HTTP: no type safety, and no scoping of the standard, so you have no way to figure out what you don't know that you need to know - e.g., which HTTP headers are appropriate for the data you are sending, given that header types are defined in an ever-growing list of ever-updated RFCs. And there is no "header compiler" - no way to validate your headers or content body format without actually running the code.

HTTP is, frankly, a mess.

REST tried to do away with the horrors of WSDL by defining a simpler approach. After all, all we are trying to do is send a friggin' message. REST says, Just put the message in an HTTP payload - forget all the WSDL definition. The client will parse the payload and know what to do.

The problem is, clients now have to parse the message. Message parsing is something that should be done behind the scenes - it should be automatic. Client and server endpoint programs should be able to work with an API that enables them to send a data structure to another machine, or receive a data structure - in the language in which they are working. Application programmers should not have to parse messages.
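To make that concrete, here is a minimal sketch of the kind of API I mean, using Go's standard net/rpc package (gob encoding under the hood). The "Arith" service, its Multiply method, and the server address are made up for illustration - the point is that the caller works with typed Go values and never sees or parses the wire format.

    package main

    import (
        "fmt"
        "log"
        "net/rpc"
    )

    // Args is the request structure the client sends; marshaling is automatic.
    type Args struct {
        A, B int
    }

    func main() {
        // Assumes a server at this address has registered an "Arith" service.
        client, err := rpc.Dial("tcp", "rpc.example.com:1234")
        if err != nil {
            log.Fatal("dial:", err)
        }
        defer client.Close()

        var product int
        // Send a data structure, receive a data structure - no hand parsing.
        if err := client.Call("Arith.Multiply", &Args{A: 7, B: 8}, &product); err != nil {
            log.Fatal("call:", err)
        }
        fmt.Println("7*8 =", product)
    }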

Languages like Go make JSON and XML parsing easier because parsing support is built into the standard library, but it is still a lot of work - and a lot of code. E.g., in Go, decoding a JSON stream into a generic value does give you a data structure - but it is not the data structure you want: it is a map of "interface{}" values. You have to programmatically convert that map into your desired strongly typed object. It is all quite clunky.
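Here is a small sketch of that clunky path, with a made-up Order type and payload: decode the JSON into a generic value, get back a map of interface{}, and then convert it by hand with a type assertion for every field.

    package main

    import (
        "encoding/json"
        "fmt"
        "log"
    )

    type Order struct {
        ID       string
        Quantity int
    }

    func main() {
        payload := []byte(`{"id": "A-17", "quantity": 3}`)

        // Step 1: parse into an untyped value - you get map[string]interface{}.
        var raw interface{}
        if err := json.Unmarshal(payload, &raw); err != nil {
            log.Fatal(err)
        }

        // Step 2: convert by hand, with a type assertion at every field.
        m := raw.(map[string]interface{})
        order := Order{
            ID:       m["id"].(string),
            Quantity: int(m["quantity"].(float64)), // JSON numbers arrive as float64
        }
        fmt.Printf("%+v\n", order)
    }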

JSON was created as a better alternative to XML, which is very hard to read. However, JSON suffers from the fact that it is still a message syntax - that is, one writes an actual message in JSON, rather than defining a message schema. Thus, there is no compiler - and therefore no way to check a JSON message until you actually run your code and send it. Actually, that is not entirely true anymore - someone realized this problem and invented JSON Schema. But then, if one has defined a schema, why code JSON messages by hand at all? Why not generate the code that does the message marshaling and unmarshaling?
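As a sketch of the "define the schema, don't hand-write messages" idea, here is the same made-up Order example using plain encoding/json struct tags as a stand-in for a real schema language: the struct is the schema, the compiler checks every use of it, and the marshaling code comes for free.

    package main

    import (
        "encoding/json"
        "fmt"
        "log"
    )

    // Order doubles as the message schema; the compiler checks every use of it.
    type Order struct {
        ID       string `json:"id"`
        Quantity int    `json:"quantity"`
    }

    func main() {
        // Marshal: no JSON text is written by hand anywhere in the program.
        out, err := json.Marshal(Order{ID: "A-17", Quantity: 3})
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(string(out)) // {"id":"A-17","quantity":3}

        // Unmarshal straight into the typed struct - no interface{} gymnastics.
        var in Order
        if err := json.Unmarshal(out, &in); err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%+v\n", in)
    }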

Ironically, Google - the creator of Go - has come up with Protocol Buffers as an alternative to REST and JSON. And guess what? Messages are not human readable, and the programmer only defines the message schema - all the parsing code is automatically generated. Hmmm - that's what CORBA did. Why did Google do this? Answer: it turns out that message processing efficiency matters when you scale. Imagine that REST/JSON messages require X CPU cycles to marshal and unmarshal and Y amount of bandwidth, and that the same application using Protocol Buffers requires X/100 CPU cycles and Y/100 bandwidth. If X and Y are Internet-scale, that translates to real dollars - like needing ten machines instead of a thousand. Google has switched to Go for the same reason: natively compiled code runs faster than scripted code - a lot faster - and that translates to fewer compute resources.
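For comparison, here is roughly what the Protocol Buffers workflow looks like. The .proto schema in the comment, the generated package path, and the field names are all assumptions for illustration; protoc generates the Go type and all of the marshaling code from the schema, and the wire format is compact binary rather than human-readable text.

    package main

    // The assumed schema, compiled by protoc into the orderpb package:
    //
    //   // order.proto
    //   syntax = "proto3";
    //   message Order {
    //     string id       = 1;
    //     int32  quantity = 2;
    //   }

    import (
        "fmt"
        "log"

        "github.com/golang/protobuf/proto"
        orderpb "example.com/orderpb" // hypothetical package produced by protoc
    )

    func main() {
        // Marshal to a compact binary message - not human readable, by design.
        out, err := proto.Marshal(&orderpb.Order{Id: "A-17", Quantity: 3})
        if err != nil {
            log.Fatal(err)
        }

        // Unmarshal straight back into the generated, strongly typed struct.
        var in orderpb.Order
        if err := proto.Unmarshal(out, &in); err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%d wire bytes -> %+v\n", len(out), &in)
    }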

So we are back to the future. We have come full circle. What a circuitous detour. So much wasted effort.