Haskell, del.icio.us, and JSON

Paul R. Brown @ 2008-01-26

I'd like to add both a sidebar with my bookmarks and some per-entry chrome for posts bookmarked on del.icio.us, but I don't want to use client-side JavaScript to do it. The alternative is to pull, cache, and manage the data on the server side. As a prototype, I whipped up a simple Haskell program that uses the del.icio.us JSON APIs (for posts and for URLs), and writing it involved a couple of surprising detours.

Some Haskell

First up, some Haskell. After going shopping on Hackage, I installed Network.HTTP, Thomas DuBuisson's pureMD5 package, and the JSON package from Masahiro Sakai and Jun Mukai (cabalized version is here). Like all code that builds on a decent set of libraries, the Haskell code to hit del.icio.us is straightforward; full source is here, so I'll just post some fragments below to give a flavor of the code.

Create a structure to hold the data:

data DeliciousBookmark = DeliciousBookmark { bookmark_url :: String
                                           , description :: String
                                           , tags :: [String] }
                         deriving ( Show, Eq, Ord )

Build the request:

bookmarks_fragment :: String
bookmarks_fragment = "http://del.icio.us/feeds/json/"

request_for_bookmarks :: String -> Request
request_for_bookmarks user = Request ( fromJust . parseURI $
                                       bookmarks_fragment ++ user ++ "?raw" )
                             GET [] ""

Send it:

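fetch_bookmarks :: String -> IO [DeliciousBookmark]
fetch_bookmarks user = do { res <- simpleHTTP . request_for_bookmarks $ user
                          ; case res of
                              Right (Response (2,0,0) _ _ body) ->
                                  return $ process_bookmarks_body body
                              -- Assumed fallback (not part of the original fragment):
                              -- treat any other outcome as "no bookmarks".
                              _ -> return [] }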

And then parse and walk through the response body:

parse_crufty_json :: String -> J.Value
parse_crufty_json = parse_json . unescape . utf8_decode
    where
      parse_json = \s -> case (parse J.json "" s) of
                           Left err -> error . show $ err
                           Right v -> v

process_bookmarks_body :: String -> [DeliciousBookmark]
process_bookmarks_body body =
    case parse_crufty_json body of
      J.Array a ->
          map (process_bookmark . uno) a

process_bookmark :: M.Map String J.Value -> DeliciousBookmark
process_bookmark m =
    DeliciousBookmark { bookmark_url = uns $ M.findWithDefault blank "u" m
                      , description = uns $ M.findWithDefault blank "d" m 
                      , tags = map uns $ una $ M.findWithDefault empty_array "t" m }

blank = J.String ""
empty_array = J.Array []
uno (J.Object o) = o
una (J.Array a) = a
uns (J.String s) = s

And that's all there is to it, except that — as might be expected from the parse_crufty_json function — there were a few things that didn't work on the first pass.

Bytes and Characters

The first wrinkle I ran into with the simple del.icio.us client occurred in process_bookmarks_body. The Haskell String that comes from the HTTP response structure is just a straight conversion of the response body from bytes to character ordinals: each byte becomes one Char. This is all well and good if the body is encoded in ISO-8859-1, but it's fraught with peril otherwise. The del.icio.us service sends back UTF-8 (and ignores an Accept-Charset header instead of returning either a correctly encoded response or a 406 response code), so any non-ASCII characters will cause problems. In this case, what should be Solutoire.com \8250 Plotr is coming through as Solutoire.com \226\128\186 Plotr. Writing a decoder is no big deal and an opportunity to play a quick round of golf.
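For flavor, a decoder along these lines might look like the following. This is a minimal sketch rather than the code from the full source: the name utf8_decode_sketch is made up for illustration, it assumes every Char in the input is a byte ordinal (0 to 255), and it passes malformed sequences through unchanged instead of rejecting them. It also does not reject overlong encodings.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- Hypothetical stand-in for a UTF-8 decoder: turns a String of byte
-- ordinals into a String of Unicode code points.
utf8_decode_sketch :: String -> String
utf8_decode_sketch [] = []
utf8_decode_sketch (c:cs)
  | b < 0x80              = c : utf8_decode_sketch cs    -- plain ASCII
  | b >= 0xC0 && b < 0xE0 = multi 1 (b .&. 0x1F)         -- 2-byte sequence
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F)         -- 3-byte sequence
  | b >= 0xF0 && b < 0xF8 = multi 3 (b .&. 0x07)         -- 4-byte sequence
  | otherwise             = c : utf8_decode_sketch cs    -- stray byte: pass through
  where
    b = ord c
    -- Consume n continuation bytes and fold their low six bits into the
    -- bits taken from the lead byte.
    multi n lead =
      let (conts, rest) = splitAt n cs
          cp            = foldl step lead conts
          ok            = length conts == n && all isCont conts && cp <= 0x10FFFF
      in if ok then chr cp : utf8_decode_sketch rest
               else c : utf8_decode_sketch cs            -- malformed: pass through
    -- Continuation bytes have the bit pattern 10xxxxxx.
    isCont x   = ord x .&. 0xC0 == 0x80
    step acc x = (acc `shiftL` 6) .|. (ord x .&. 0x3F)
```

Applied to the title above, the three bytes \226\128\186 collapse into the single character \8250, and plain ASCII input comes back untouched.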

In terms of making HTTP in Haskell better, there was apparently a Google SoC project proposed to integrate cURL via FFI and Haskell's ByteString API, but it doesn't look like anything's come of it.

RFC-compliant JSON versus Works For Me in JavaScript

The second wrinkle with the simple del.icio.us client is more pernicious. After I resolved the string encoding issues, I started getting errors of the form:

parse error at (line 1, column 1552):
unexpected "'"
expecting "\"", "\\", "/", "b", "f", "n", "r", "t" or "u"

And sure enough, on inspection, there's an escaped apostrophe lurking in the JSON. This probably wouldn't bother a client who simply evaluated the JSON as literal JavaScript (which seems to be the intent of the API), but it's not legal JSON and the parser correctly signals an error.

The JSON grammar (per RFC 4627) permits a few escapes, and apostrophe is not among them. To wit:

         string = quotation-mark *char quotation-mark

         char = unescaped /
                escape (
                    %x22 /          ; "    quotation mark  U+0022
                    %x5C /          ; \    reverse solidus U+005C
                    %x2F /          ; /    solidus         U+002F
                    %x62 /          ; b    backspace       U+0008
                    %x66 /          ; f    form feed       U+000C
                    %x6E /          ; n    line feed       U+000A
                    %x72 /          ; r    carriage return U+000D
                    %x74 /          ; t    tab             U+0009
                    %x75 4HEXDIG )  ; uXXXX                U+XXXX

         escape = %x5C              ; \

         quotation-mark = %x22      ; "

         unescaped = %x20-21 / %x23-5B / %x5D-10FFFF

Apostrophe is U+0027.

As with the UTF-8 issues, it's a quick job to implement a filter to scan for escaped apostrophes and unescape them, but it would be nice if what is advertised as JSON was actually JSON.
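Such a filter can be sketched in a few lines. This is an illustration under assumptions, not the unescape function from the full source; the name unescape_apostrophes is made up here. The one subtlety is that backslashes have to be consumed in pairs, so that an escaped backslash followed by an ordinary apostrophe is left alone.

```haskell
-- Hypothetical filter: replace the non-JSON escape \' with a bare
-- apostrophe and leave every other escape (including \\) untouched.
unescape_apostrophes :: String -> String
-- Backslash followed by an apostrophe: drop the backslash.
unescape_apostrophes ('\\' : '\'' : rest) = '\'' : unescape_apostrophes rest
-- Any other escape: copy both characters so the second is never
-- mistaken for the start of a new escape.
unescape_apostrophes ('\\' : c : rest)    = '\\' : c : unescape_apostrophes rest
-- Everything else is copied as-is.
unescape_apostrophes (c : rest)           = c : unescape_apostrophes rest
unescape_apostrophes []                   = []
```

Because the result contains only escapes that were already legal, it can be handed straight to a strict JSON parser.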

