I felt the need to actually write a bit of code this weekend, but wanted a distraction from actual work. I decided to try writing a C/C++ interface into S3 (with the ultimate goal of making a filesystem driver, but that's more than a weekend). This also gave me a chance to take a good look at REST.
After hacking on this for about 10 hours, I don't get it. Why is this an appealing "protocol"? (I put protocol in quotes because REST is all about not having a protocol, piggybacking entirely on top of HTTP.)
The major disasters I encountered:
1. What would have been about 10 lines of code interfacing into the Sun RPC library exploded into a mess of thousands of lines of code getting libcurl (an HTTP library) talking to libxml (an XML parsing library).
2. Application errors and transport errors are completely indistinguishable, because HTTP status codes are used for both. I spent 2+ hours debugging my "list buckets" implementation, trying to figure out why S3 was sending a 302 code back to me ("Location moved? This isn't in their list of errors..."). Breaking out Ethereal and comparing its output against the S3 sample client's finally revealed that S3's documentation is not entirely up to date, and that the HTTP server I was talking to wasn't the actual S3 application server.
3. HTTP has a massive forest of possible states (chunked encoding, keep-alive connections, continuation responses, etc.), but only a subset of these should be applicable to a given call. With REST, you need to deal with all of them if you want a robust client.
4. There are 3,527 different ways of encoding the same data, and you have to do it properly for this day of the week. For example, all requests into AWS have to be signed with your secret key; the request to be signed includes a subset of the HTTP headers and the date. In this case, the date must be in the form, "Tue, 05 Sep 2006 10:42:33 GMT". You duplicate this in the Date: HTTP header. Unless you have an X-Amz-Date header, in which case you use that. Unless you have an expiration date, in which case you use an integer representing the number of seconds since the epoch, and you put that in the HTTP request line. When S3 returns a date to you, it's another beast entirely: "2006-09-05T10:42:33".
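To make the date mess in item 4 concrete, here is a minimal sketch in C that renders one timestamp in the three forms described above. The function names are made up for illustration and are not part of any S3 library, and it only covers formatting, not the signing itself. It uses plain gmtime(), which is not thread-safe, to stay within standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helpers (names invented for this sketch). Each writes a
 * NUL-terminated string into buf and returns its length, or 0 on failure. */

/* Form 1: the style used in the string to sign and in the Date: header,
 * e.g. "Tue, 05 Sep 2006 10:42:33 GMT". The %a and %b names come out in
 * English as long as the program stays in the default "C" locale. */
size_t format_http_date(time_t t, char *buf, size_t len) {
    assert(buf != NULL);
    const struct tm *tm = gmtime(&t);   /* not thread-safe */
    if (tm == NULL) return 0;
    return strftime(buf, len, "%a, %d %b %Y %H:%M:%S GMT", tm);
}

/* Form 2: an expiration time, written as integer seconds since the epoch.
 * Assumes time_t counts seconds since 1970, as on POSIX systems. */
size_t format_epoch_seconds(time_t t, char *buf, size_t len) {
    assert(buf != NULL);
    int n = snprintf(buf, len, "%lld", (long long)t);
    return (n > 0 && (size_t)n < len) ? (size_t)n : 0;
}

/* Form 3: the style S3 sends back, e.g. "2006-09-05T10:42:33". */
size_t format_iso_date(time_t t, char *buf, size_t len) {
    assert(buf != NULL);
    const struct tm *tm = gmtime(&t);   /* not thread-safe */
    if (tm == NULL) return 0;
    return strftime(buf, len, "%Y-%m-%dT%H:%M:%S", tm);
}
```

Passing (time_t)1157452953 to all three gives "Tue, 05 Sep 2006 10:42:33 GMT", "1157452953", and "2006-09-05T10:42:33": the same instant, three spellings, and the caller has to know which one each part of the request expects.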
It seems like REST came about because people didn't like any of the existing protocols (ok), but realized that designing a protocol is hard (yes) so they just threw it out the window entirely (uh oh) and just started hacking on code without any regard for the consequences (cries).