Wiring Haskell into a FastCGI Web Server

Paul R. Brown @ 2007-10-02

Herein, part six of my hobby project to rewrite my personal publishing software in Haskell. In part five (and its addendum), I roughed out a persistence and concurrency model for the back-end. The next two pieces are rendering content (which will be done programmatically using the Text.XHtml.Strict module; that's a separate post) and integrating with a web server via FastCGI. This post covers FastCGI integration for Lighttpd and Apache2 in the form of smoke-testing a simple FastCGI handler.

Units of Concurrency

For the concurrency model that I plan to use in the actual application, a single OS process is critically important, as multiple processes wouldn't be aware of who was doing what within the other processes. Multiple active threads within that one process are fine. Most web-based systems use a single process as a concurrency pinch point, but that process is usually the database as opposed to the web application.

Haskell in the form of GHC provides two flavors of concurrency, which I'll refer to as forkIO and forkOS (after the functions forkIO and forkOS, respectively). The forkIO flavor uses Haskell's internally managed, lightweight threads, and the forkOS flavor uses threads from the underlying operating system. (For some perspective on what an OS thread really means, take a look at my post on SMP Erlang on Mac OS.) The FastCGI binding library provides a mechanism to use forkIO, forkOS, or some other mechanism to assign a worker thread to a request, and I want to compare the two fork flavors for performance and stability.
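
Both functions have the type IO () -> IO ThreadId, so switching between them is trivial. As a minimal sketch (independent of the FastCGI code), here is each in turn:

import Control.Concurrent

main :: IO ()
main = do
  done <- newEmptyMVar
  -- lightweight thread scheduled by the GHC runtime
  _ <- forkIO (putStrLn "hello from forkIO" >> putMVar done ())
  takeMVar done
  -- bound thread backed by a dedicated OS thread (needs -threaded at link time)
  _ <- forkOS (putStrLn "hello from forkOS" >> putMVar done ())
  takeMVar done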

It's worth reading the fine print in the Control.Concurrent documentation. For the present application, every thread does make foreign calls as part of handling a FastCGI request, and every request is likely to complete in less than the Haskell scheduler's default granularity of 20ms. I'm less interested in performance and more interested in looking for leaks, deadlocks, or other bad behavior.

Building the Right Network.FastCGI

Both the 1.0 and 3000.0.0 versions of the Network.FastCGI package appear in the Hackage directory, but the darcs head version (3001.0.0 as of this posting) is the one with the multi-threaded bindings exposed. (For the uninitiated, darcs is a distributed source code management system implemented in Haskell. The darcs codebase is in literate Haskell, so it's an interesting read for that if nothing else.) Darcs is available from the usual package managers; I use MacPorts on the Mac.

Get the latest Network.FastCGI from the darcs repository:

darcs get http://darcs.haskell.org/fastcgi/

The fastcgi.cabal file documents the dependencies, but a GHC 6.6.1 install is sufficient. Then build and install it the usual Cabal way:

cd fastcgi
runghc Setup.hs configure
runghc Setup.hs build
sudo runghc Setup.hs install

One more step is needed to register the package with the compiler:

sudo runghc Setup.hs register

And then to make sure that it worked:

$ ghc-pkg list 
/opt/local/lib/ghc-6.6.1/package.conf:
    Cabal-1.1.6.2, GLUT-2.1.1, HGL-3.1.1, HUnit-1.1.1, OpenGL-2.2.1,
    QuickCheck-1.0.1, X11-1.2.1, base-2.1.1, cgi-3001.1.1,
    fastcgi-3000.0.0, fastcgi-3001.0.0, fgl-5.4.1, filepath-1.0,
    (ghc-6.6.1), haskell-src-1.0.1, haskell98-1.0, html-1.0.1,
    mtl-1.0.1, network-2.0.1, parsec-2.0, readline-1.0,
    regex-base-0.72, regex-compat-0.71, regex-posix-0.71, rts-1.0,
    stm-2.0, template-haskell-2.1, time-1.1.1, unix-2.1, xhtml-3000.0.2

It takes a bit more doing to get it going with the GHC tip (to-be 6.8) because the Data.ByteString modules have been promoted into core GHC packages and rearranged a bit, but no meaningful code changes beyond some of the import statements and the fastcgi.cabal file are required. (I've sent a patch to the maintainer.)

A Simple FastCGI Handler for Process/Thread Information

The following short Haskell program (test_IO.hs) sends back a plain text response with process and thread information:

import Control.Concurrent
import System.Posix.Process (getProcessID)

import Network.FastCGI

test :: CGI CGIResult
test = do setHeader "Content-type" "text/plain"
          pid <- liftIO getProcessID      -- OS process ID of this handler
          threadId <- liftIO myThreadId   -- Haskell thread serving this request
          let tid = concat $ drop 1 $ words $ show threadId  -- "ThreadId 4" -> "4"
          output $ unlines [ "Process ID: " ++ show pid,
                             "Thread ID:  " ++ tid]

main :: IO ()
main = runFastCGIConcurrent' forkIO 10 test  -- spawn workers with forkIO; 10 is the concurrency limit

(This is an adaptation of the printinput.hs example that uses the multi-threaded API.) To build it:

$ ghc -threaded -package fastcgi --make -o test_IO.fcgi test_IO.hs
[1 of 1] Compiling Main             ( test_IO.hs, test_IO.o )
Linking test_IO.fcgi ...

The equivalent program (test_OS.hs) with forkOS in place of forkIO does the job for OS threads.
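
Concretely, the only line that changes is the last one:

main :: IO ()
main = runFastCGIConcurrent' forkOS 10 test

and the build command is the same apart from the file names:

$ ghc -threaded -package fastcgi --make -o test_OS.fcgi test_OS.hs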

I can use these two FastCGI handlers with different combinations of compiler version, web server, and FastCGI module and see how things behave under some simulated load. The only gotcha with this approach is that some HTTP benchmarking tools use response byte counts as an assertion of a correct response, and they will complain as the thread ID goes from one digit to two to three, etc. My current favorite is Jef Poskanzer's simple http_load with a tiny tweak to show the response code if a byte count comes out wrong. Using a different tool, e.g., ab or httperf, produces similar results.

The Web Servers: Apache2 and Lighttpd

There are probably other alternatives that I'm overlooking, but I'm going to try the two web servers that I'm familiar with, Lighttpd 1.4.15 and Apache 2.2.4, both on Mac OS X.

Configuring Lighttpd

A Lighttpd configuration file fragment for a FastCGI handler with a single process would be:

fastcgi.server = ( ".fcgi" =>
                   ( "localhost" =>
                     (
                       "socket" => "/tmp/test.sock",
                       "bin-path" => "/path/to/test_OS.fcgi",
                       "min-procs" => 1,
                       "max-procs" => 1
                     )
                   )
                 )

See the Lighttpd FastCGI documentation for the full rundown on parameters.

Also, at least as of Lighttpd 1.4.15, which is the version that MacPorts installed for me, the following configuration change is necessary to avoid a bug:

server.event-handler = "poll"

(The default value is freebsd-kqueue; see the Lighttpd performance documentation.)
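
Putting the pieces together, a minimal lighttpd.conf for this smoke test might look like the following; the port (chosen to match the curl examples below) and the document root are assumptions:

server.modules       = ( "mod_fastcgi" )
server.port          = 8181
server.document-root = "/path/to/docroot"
server.event-handler = "poll"

# plus the fastcgi.server block from above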

After copying the file into place, we can spin up Lighttpd and hit the URL:

$ lighttpd -f lighttpd.conf
$ curl http://localhost:8181/test.fcgi
Process ID: 21139
Thread ID:  4
$ curl http://localhost:8181/test.fcgi
Process ID: 21139
Thread ID:  5

The thread ID changes and the process ID doesn't, so things are good. For a bigger kick:

$ echo 'http://127.0.0.1:8181/test_OS.fcgi' > /tmp/lighttpd_OS
$ http_load -parallel 20 -fetches 1000 /tmp/lighttpd_OS 2>&1 | grep -v 8181
1000 fetches, 20 max parallel, 33908 bytes, in 0.375528 seconds
33.908 mean bytes/connection
2662.92 fetches/sec, 90294.2 bytes/sec
msecs/connect: 0.19518 mean, 1.036 max, 0.09 min
msecs/first-response: 7.26042 mean, 25.558 max, 4.31 min
996 bad byte counts
HTTP response codes:
  code 200 -- 1000

The 996 bad byte count errors are expected, since the responses for thread IDs 10 through 1005 have a different number of bytes than those for thread IDs 6, 7, 8, and 9. In any case, so far, so good:

$ curl http://localhost:8181/test_OS.fcgi
Process ID: 21139
Thread ID:  1006

Configuring Apache2 with mod_fastcgi

The single-process configuration file fragment for Apache2 with mod_fastcgi is:

LoadModule fastcgi_module modules/mod_fastcgi.so

FastCgiConfig -maxClassProcesses 1 -processSlack 1

<Location /fastcgi>
        SetHandler fastcgi-script
        Options ExecCGI
        allow from all
</Location>
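
The fragment above assumes that URLs under /fastcgi map onto a directory containing the compiled handlers; one way to arrange that (the filesystem path here is a placeholder) is a plain Alias:

Alias /fastcgi "/path/to/fastcgi-handlers"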

This configuration passes the basic smoke test with no issues. Under load, the forkIO version burns about half as much CPU as the forkOS version and the same amount of memory. Both versions use three OS threads most of the time, and as expected from the comments above about the way that Haskell handles scheduling, the forkOS version never uses more than four OS threads no matter how hard the server is hit.

Configuring Apache2 with mod_fcgid

The configuration fragment for Apache2 with mod_fcgid is:

LoadModule fcgid_module modules/mod_fcgid.so

MaxProcessCount 1

<Location /fcgid>
  SetHandler fcgid-script
  Options ExecCGI
  allow from all
</Location>

With the same smoke testing approach as above (with a redirect to silence the byte count complaints):

$ echo 'http://127.0.0.1:7007/fcgid/test_OS.fcgi' > /tmp/fcgid_OS
$ curl http://127.0.0.1:7007/fcgid/test_OS.fcgi
Process ID: 16854
Thread ID:  4
$ http_load -parallel 20 -fetches 1000 /tmp/fcgid_OS 2>&1 | grep -v fcgid
1000 fetches, 20 max parallel, 34704 bytes, in 1.2484 seconds
34.704 mean bytes/connection
801.028 fetches/sec, 27798.9 bytes/sec
msecs/connect: 0.294162 mean, 2.339 max, 0.051 min
msecs/first-response: 12.8977 mean, 1009.92 max, 2.758 min
986 bad byte counts
HTTP response codes: 
  code 200 -- 998
  code 500 -- 2
$ curl http://127.0.0.1:7007/fcgid/test_OS.fcgi
Process ID: 16869
Thread ID:  7

Fail, since 16854 /= 16869; given mod_fcgid's stated goal of keeping FastCGI handlers "fresh" by killing them at the first sign of trouble, that's not too surprising.

Aggregated Results and Additional Observations

The results in these tables were generated using http_load. For the "6000/min" test:

$ http_load -rate 100 -seconds 60 url_file 2>&1 | grep -v port

For the "60000/min" test:

$ http_load -rate 1000 -seconds 60 url_file 2>&1 | grep -v port

For the fixed-rate tests, the number of nines is determined by the proportion of 200 responses out of the total number of responses (the others being 500s and 503s); for example, 999 successes out of 1000 responses counts as three 9's. For the requests-per-second mark:

$ http_load -parallel 20 -seconds 10 url_file 2>&1 | grep -v port

First, with the current GHC version:

GHC 6.6.1, 4-core G5

Server            FastCGI Support   forkIO                      forkOS
Lighttpd 1.4.15   built-in          JUST OK                     JUST OK
                                    6000/min - all good         6000/min - all good
                                    60000/min - incomplete      60000/min - incomplete
                                    max ~3000 req/sec           max ~2200 req/sec
Apache 2.2.4      mod_fastcgi       GOOD                        BEST
                                    6000/min - all good         6000/min - all good
                                    60000/min - three 9's       60000/min - four 9's
                                    max ~2700 req/sec           max ~2100 req/sec
Apache 2.2.4      mod_fcgid         FAIL                        FAIL
                                    Process not stable.         Process not stable.

None of these really cause the machine to break a sweat: the web server does most of the work, and the FastCGI handler never consumes more than 60% of a core or more than a couple of megabytes of resident memory. An overnight run showed the mod_fastcgi and forkOS combination performing flawlessly under moderate load for over 10^8 requests.

With the latest GHC release candidate used to compile both the FastCGI package and the handlers:

GHC 6.9.20070918 (darcs tip), 4-core G5

Server            FastCGI Support   forkIO                      forkOS
Lighttpd 1.4.15   built-in          GOOD                        GOOD
                                    6000/min - all good         6000/min - all good
                                    60000/min - three 9's       60000/min - three 9's
                                    ~3300 req/sec               ~2200 req/sec
Apache 2.2.4      mod_fastcgi       JUST OK                     JUST OK
                                    6000/min - three 9's        6000/min - three 9's
                                    60000/min - three 9's       60000/min - four 9's
                                    ~2500 req/sec               ~1900 req/sec
Apache 2.2.4      mod_fcgid         FAIL                        FAIL
                                    Process not stable.         Process not stable.

Looks like GHC 6.6.1 and Apache2/mod_fastcgi is the winning combination.

Addendum

I got GHC 6.6.1 installed and configured the forkIO and forkOS handlers on the User Mode Linux server where I have this blog hosted, and it looks like forkIO is the winner there, with process stability and around 100 requests per second of sustained throughput. With the forkOS variant, the process IDs do tick up with each hit, but that's a property of the kernel's threading model, where one process corresponds to one thread, rather than a result of a restarted FastCGI handler.

 
