More Haskell and Personal Publishing Platform Ramblings
Herein, part two of n, where I think about some basic design decisions and write some actual code.
GET to Function Currying
First up, some thoughts about how to map a request to a response.
A weblog is a list of entries grouped and displayed in different formats according to parameters supplied by client applications, so serving pages is limited to figuring out what format to display (single-entry HTML, multi-entry HTML, Atom) and which entries to include (include/exclude by tag, include/exclude by category, date range, maximum number of entries). Ideally, I'd like a reader to be able to, e.g., exclude only the posts I make about my kid or include only posts about BPEL and Java.
The simple design is to turn the combination of a URL and parameters into a list of functions that are applied to the list of all entries, loosely:
(format url) (filter (filters url) entries)
And that's the whole program, up to details. In mildly abused Haskell notation:
```haskell
format  :: URL -> [Entry] -> String
filters :: URL -> [Entry -> Bool]
```
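Real Haskell's filter takes a single predicate, not a list of them, so the one-liner above is loose in exactly the advertised way. A minimal sketch of closing that gap, assuming a hypothetical Entry record and treating a URL as a plain String: fold the list of predicates into one with all, so an entry survives only if every filter accepts it. The real format and filters don't exist yet, so render takes them as arguments.

```haskell
-- Hypothetical stand-ins for the weblog's real types.
type URL = String

data Entry = Entry
  { title :: String
  , tags  :: [String]
  } deriving (Show, Eq)

-- Combine a list of predicates: keep an entry only if every predicate accepts it.
-- An empty list keeps everything, since 'all' over [] is True.
applyFilters :: [Entry -> Bool] -> [Entry] -> [Entry]
applyFilters ps = filter (\e -> all ($ e) ps)

-- The "whole program" from above, with format and filters passed in.
render :: (URL -> [Entry] -> String)   -- format
       -> (URL -> [Entry -> Bool])      -- filters
       -> URL -> [Entry] -> String
render format filters url entries = format url (applyFilters (filters url) entries)
```

Loaded into ghci, applyFilters [("haskell" `elem`) . tags] [Entry "a" ["haskell"], Entry "b" ["kid"]] evaluates to [Entry {title = "a", tags = ["haskell"]}].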
In Haskell, the notation f :: A -> B is very much as it is in mathematics, with f being a function with domain (things of type) A and codomain (things of type) B. Unraveling expressions with multiple arrows is accomplished by adding parentheses from the right. That is,
f :: A -> B -> C = f :: A -> (B -> C)
is essentially a function from A to functions from B to C, as in "f is a function that maps (things of type) A to a function that maps (things of type) B to (things of type) C". So the function format above maps a URL to a function that maps a list of Entry to a String; informally, it takes a URL and a list of Entry and returns a String. This is one of the things I like about Haskell: a function looks like a function.
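The right-associating arrow is what makes partial application fall out for free. A small sketch with made-up functions (add3 and friends are illustrative only), loadable into ghci: supplying fewer arguments yields a function of the rest, and the Prelude's curry and uncurry convert between the curried form and a function on pairs.

```haskell
-- A three-argument function; its type reads as Int -> (Int -> (Int -> Int)).
add3 :: Int -> Int -> Int -> Int
add3 x y z = x + y + z

-- Supplying one argument leaves a function of the remaining two.
addTen :: Int -> Int -> Int
addTen = add3 10

-- Supplying two leaves a function of the last one.
addTenAndOne :: Int -> Int
addTenAndOne = add3 10 1

-- uncurry turns a curried two-argument function into one taking a pair...
addPair :: (Int, Int) -> Int
addPair = uncurry addTen

-- ...and curry goes back the other way.
addCurried :: Int -> Int -> Int
addCurried = curry addPair
```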
Haskell supports currying of functions, which is only appropriate, as the language was named for Haskell Curry. The filters can be implemented by partially applying functions that take two arguments: supply only the first argument and get back a function of the second. For example, consider the functions:
```haskell
include_by_tag :: String -> Entry -> Bool
include_by_tag s e = s `elem` (tags e)

exclude_by_tag :: String -> Entry -> Bool
exclude_by_tag s e = s `notElem` (tags e)
```
(Here tags is a function that returns the list of tags applied to an entry.) Besides showing off that Haskell lets you flip back and forth between prefix notation (foo 1 2) and infix notation (1 `foo` 2), these definitions capture the idea: map a query-string atom like tag=foo to the function (include_by_tag "foo"), and an atom like tag=-foo to the function (exclude_by_tag "foo").
Playing with the concept in ghci is straightforward. For example, drop the following two lines of Haskell into a text file called curryplus.hs:
```haskell
plus :: Integer -> Integer -> Integer
plus x y = x + y
```
Then fire up ghci:
```
$ ghci
GHC Interactive, version 6.6, for Haskell 98.
http://www.haskell.org/ghc/
Type :? for help.

Loading package base ... linking ... done.
Prelude> :load curryplus.hs
[1 of 1] Compiling Main             ( curryplus.hs, interpreted )
Ok, modules loaded: Main.
*Main> :t plus
plus :: Integer -> Integer -> Integer
*Main> :t (plus 1)
(plus 1) :: Integer -> Integer
*Main> let f = (plus 1)
*Main> f 2
3
```
(The :t command in ghci reports the type of the expression passed to it.)
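Returning to the query-string idea, here is a minimal sketch of turning a raw query string into a list of filters. It assumes a hypothetical Entry record with a tags field, and the helper names (atomToFilter, splitOn, queryFilters) are made up; real code would also need URL decoding and the other parameters (category, date range, maximum count).

```haskell
-- Hypothetical entry type; the real one will carry more fields.
data Entry = Entry { title :: String, tags :: [String] } deriving (Show, Eq)

include_by_tag :: String -> Entry -> Bool
include_by_tag s e = s `elem` tags e

exclude_by_tag :: String -> Entry -> Bool
exclude_by_tag s e = s `notElem` tags e

-- Map one query-string atom to a filter:
--   "tag=foo"  -> include_by_tag "foo"
--   "tag=-foo" -> exclude_by_tag "foo"
-- Anything else (including an empty tag) yields Nothing and is ignored.
atomToFilter :: String -> Maybe (Entry -> Bool)
atomToFilter atom = case break (== '=') atom of
  ("tag", '=' : '-' : t) -> if null t then Nothing else Just (exclude_by_tag t)
  ("tag", '=' : t) | not (null t) -> Just (include_by_tag t)
  _ -> Nothing

-- Split a raw query string such as "tag=haskell&tag=-kid" on a separator.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- Keep only the atoms that parsed into filters.
queryFilters :: String -> [Entry -> Bool]
queryFilters q = [f | Just f <- map atomToFilter (splitOn '&' q)]
```

Because each filter is just a partially applied function, an entry passes a query when all ($ e) (queryFilters q) is True.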
Now, where does the list of entries come from?
Storing Entries — Database or Filesystem or...?
Databases are lovely places to store data where reads and writes may overlap, and the filesystem is a good place to store information that either doesn't fit or isn't needed in memory. For a weblog, read/write contention should be light (frequent reads, infrequent writes), with writes limited to posts and comments, and optimistic concurrency is entirely acceptable. (It's of no consequence if someone gets slightly stale content.) However, the total amount of content in my weblog, counting from 2002, is in the hundreds of kilobytes, so there is no reason not to hold the whole thing in memory.
Haskell (and specifically GHC) has a couple of shiny objects that I'm tempted by. The shiniest one is STM, or Software Transactional Memory, and just as the three-line quicksort is one of the teasers for Haskell, a four-line implementation of AtomicInteger.getAndIncrement() is the teaser for STM:
```haskell
get_and_increment :: TVar Integer -> IO Integer
get_and_increment i = atomically ( do j <- readTVar i
                                      writeTVar i (j+1)
                                      return j )
```
The whitespace in the above definition is critically important, as it tells Haskell that the lines are all part of the do block. I'll come back to the left arrow (<-) and the do notation below.
Experimenting with this in ghci isn't much more complicated than the plus example above. Put the four lines above in a text file called get_and_inc.hs, after the import statement import Control.Concurrent.STM.
Then fire up ghci with the -package stm flag so that it loads the STM package:
```
$ ghci -package stm
GHC Interactive, version 6.6, for Haskell 98.
http://www.haskell.org/ghc/
Type :? for help.

Loading package base ... linking ... done.
Loading package stm-2.0 ... linking ... done.
Prelude> :load get_and_inc.hs
[1 of 1] Compiling Main             ( get_and_inc.hs, interpreted )
Ok, modules loaded: Main.
*Main> x <- atomically ( newTVar 1 )
*Main> :t x
x :: TVar Integer
*Main> get_and_increment x
1
*Main> get_and_increment x
2
*Main> get_and_increment x
3
```
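The session above only makes sequential calls, where atomicity can't be seen. A sketch that hammers one counter from several threads makes the point: if every increment is atomic, none is lost. The hammer helper is a hypothetical name; forkIO is from base's Control.Concurrent, and the file repeats get_and_increment so it stands alone.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forM_, replicateM_, when)

-- Same function as above, repeated so this file is self-contained.
get_and_increment :: TVar Integer -> IO Integer
get_and_increment i = atomically $ do
  j <- readTVar i
  writeTVar i (j + 1)
  return j

-- Start nThreads threads that each call get_and_increment perThread times,
-- wait for all of them, and return the final counter value.
hammer :: Int -> Int -> IO Integer
hammer nThreads perThread = do
  counter <- atomically (newTVar 0)
  done    <- atomically (newTVar (0 :: Int))
  forM_ [1 .. nThreads] $ \_ -> forkIO $ do
    replicateM_ perThread (get_and_increment counter)
    -- Report completion by bumping the 'done' count.
    atomically $ do
      n <- readTVar done
      writeTVar done (n + 1)
  -- Block until every worker has reported in; 'retry' suspends the
  -- transaction until 'done' changes instead of busy-waiting.
  atomically $ do
    n <- readTVar done
    when (n < nThreads) retry
  atomically (readTVar counter)
```

With atomic increments, hammer 8 1000 should return 8000; a plain read-then-write on a shared mutable variable, without the transaction, could lose updates.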
The original paper on STM in Haskell came out of Microsoft Research in 2005. The follow-on paper Lock Free Data Structures using STM in Haskell is a good read, wherein the authors construct two implementations of ArrayBlockingQueue in Haskell, one using locks and one using STM, and then compare their performance. (Spoiler: the larger the number of processors, the better STM performs relative to locks.)
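The paper's code isn't reproduced here, but the trick that keeps an STM blocking queue short is retry: a transaction that finds the queue full (or empty) simply gives up and is re-run once something it read changes. A minimal, unoptimized sketch under that idea, with made-up names (BoundedQueue, putQ, takeQ, tryTakeQ); it is neither the paper's implementation nor ArrayBlockingQueue's API, and the list append and length are O(n), which is fine for illustration only.

```haskell
import Control.Concurrent.STM

-- A bounded FIFO: capacity plus the queued items, oldest first.
data BoundedQueue a = BoundedQueue Int (TVar [a])

newBoundedQueue :: Int -> STM (BoundedQueue a)
newBoundedQueue cap = do
  v <- newTVar []
  return (BoundedQueue cap v)

-- Block (via retry) while the queue is full, then append.
putQ :: BoundedQueue a -> a -> STM ()
putQ (BoundedQueue cap v) x = do
  xs <- readTVar v
  if length xs >= cap
    then retry
    else writeTVar v (xs ++ [x])

-- Block (via retry) while the queue is empty, then remove the oldest item.
takeQ :: BoundedQueue a -> STM a
takeQ (BoundedQueue _ v) = do
  xs <- readTVar v
  case xs of
    []       -> retry
    (y : ys) -> do
      writeTVar v ys
      return y

-- Non-blocking variant: orElse falls back to Nothing when takeQ would retry.
tryTakeQ :: BoundedQueue a -> STM (Maybe a)
tryTakeQ q = fmap Just (takeQ q) `orElse` return Nothing
```

Wrapping putQ and takeQ in atomically gives the blocking behaviour; orElse composes alternatives without any lock bookkeeping.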
As promised, a quick word about do. Haskell is a purely functional language, which means that it is side-effect-free. On the one hand, it's great to have referential transparency, since it lets the compiler or runtime do things like replace common pieces of a complicated expression with values; on the other hand, something imperatively trivial like console output is a side effect. The elegant workaround is to allow a function to return an action as a value, where the action is to be performed by an external observer at some point. The construct is called a monad, and <- and do are some of the notation that comes with it. (Defining the nature of the action and the observer is the art of constructing a monad.) The do chains up actions, and the <- captures a value created by an action. Yet another good paper from Microsoft Research has a thorough tutorial on monads in Haskell.
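Desugaring a do block by hand makes the "chaining" concrete. The sketch below writes get_and_increment both ways (the primed name is just for illustration). The translation follows the usual rules: a line x <- a followed by the rest becomes a >>= \x -> rest, and a plain line a followed by the rest becomes a >> rest.

```haskell
import Control.Concurrent.STM

-- get_and_increment with do-notation, as above.
get_and_increment :: TVar Integer -> IO Integer
get_and_increment i = atomically $ do
  j <- readTVar i       -- "<-" binds the value the action produced
  writeTVar i (j + 1)   -- each line is another action in the chain
  return j

-- The same function with the do-notation desugared by hand.
-- The lambda extends to the end, so j is in scope for both later actions.
get_and_increment' :: TVar Integer -> IO Integer
get_and_increment' i = atomically $
  readTVar i >>= \j ->
  writeTVar i (j + 1) >>
  return j
```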
This brings me to the design for storing and accessing the entries:
- Use a single multi-threaded FastCGI handler to serve all requests.
- Maintain an in-memory copy of all content (entries and approved comments), with access and updates managed via STM.
- Store entry and approved comment content as separate files on disk, to be loaded at start-up time and written when an entry is posted or comment approved.
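The storage points in this design can be sketched under loud assumptions: a hypothetical Entry record, a single TVar holding every entry, and a placeholder on-disk format (one file per entry, written with show). It only illustrates the split between the in-memory STM update and the file write; it is not real storage code, comments are omitted, and FastCGI is left out entirely.

```haskell
import Control.Concurrent.STM

-- Hypothetical entry type; the real one needs dates, categories, body, etc.
data Entry = Entry { entryId :: String, title :: String, tags :: [String] }
  deriving (Show, Eq)

-- All content lives in memory behind a single TVar.
newtype Store = Store (TVar [Entry])

-- At start-up the entries would be loaded from disk; here they are passed in.
newStore :: [Entry] -> IO Store
newStore es = fmap Store (atomically (newTVar es))

-- Request handlers read a consistent snapshot of all entries.
snapshot :: Store -> IO [Entry]
snapshot (Store v) = atomically (readTVar v)

-- Add an entry in memory. Kept as an STM action so callers can compose it
-- with other updates in one transaction.
addEntry :: Store -> Entry -> STM ()
addEntry (Store v) e = do
  es <- readTVar v
  writeTVar v (e : es)

-- Posting: commit the in-memory update, then write the file. File I/O
-- cannot run inside 'atomically' (transactions may be re-executed), so it
-- happens after the commit. File naming and format are placeholders.
postEntry :: FilePath -> Store -> Entry -> IO ()
postEntry dir store e = do
  atomically (addEntry store e)
  writeFile (dir ++ "/" ++ entryId e ++ ".entry") (show e)
```

Since writes are rare and slightly stale reads are acceptable, a crash between the commit and the file write would lose at most the post in flight.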
Data structures and a lo-fi approach to outputting Atom are up next.