Thursday, November 22, 2012

Perl non-CGI: The Missing Overview

After living in mod_php and Perl CGI for far too long, it was time to look at reworking our application to fit something else.  Although we had mod_perl installed and doing little more than wasting resources, I didn’t want to bind the app tightly to mod_perl in the way it was already tightly bound to CGI.  That meant surveying the landscape and trying to understand what modern Perl web development actually looks like.

But first, some history!


History

CGI

We start with the “Common Gateway Interface.”  The oldest, slowest, most wasteful possible method that throws away all of Apache’s performance optimization.  You know how it pre-forks a pool of servers so that the client doesn’t have to wait for one to be created before its request can be processed?  CGI requires that another process be forked to handle each request anyway, because the environment a CGI program reads is specific to the request and can’t be set up before the request arrives.

However, it’s about as simple as mod_php to code for.  You set your extension to be handled by mod_cgi, put up executables under the document root with that extension, and the filesystem path maps directly to the visible URL, no routing needed.
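To show how little a CGI script needs, here is a minimal sketch.  The file name hello.cgi and the greeting are made up for illustration; the parts that matter are the header block, the blank line, and the body, all written to standard output.

```perl
#!/usr/bin/perl
# hello.cgi (hypothetical name): the web server runs this file once per
# request and relays whatever it prints back to the client.
use strict;
use warnings;

# Build the whole response as a string: headers, a blank line, then the body.
sub cgi_response {
    my ($env) = @_;
    # The server describes the request through environment variables.
    my $method = $env->{REQUEST_METHOD} // 'GET';
    my $query  = $env->{QUERY_STRING}   // '';
    return "Content-Type: text/plain\r\n\r\n"
         . "Hello from CGI: method=$method query=$query\n";
}

print cgi_response(\%ENV);
```

Everything the script knows about the request arrives through %ENV (and standard input for request bodies), which is why the process can't usefully exist before the request does.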

FastCGI

CGI let the nascent Web connect to “anything,” but serious dynamic processing needed a more serious solution.  Enter FastCGI, commonly abbreviated FCGI.  The idea was to start up an “application server” that would process application requests, linked to the webserver by the FastCGI wire protocol.  The application server could then also be pre-forked or whatever was desired; the wire protocol itself simply defines a few messages suitable for handling multiple concurrent requests over one web-to-fcgi connection, and lets the application decide how it wants to handle those requests.

To ease porting, FCGI follows the CGI model fairly closely in terms of providing “an environment” and streams for input/output/error, but doesn’t provide a clear application framework or any sort of URL/path mapping.  The goal was solely to split the app from the server, with as few restrictions or requirements on the app as possible.

Modern practice with FCGI structures the “user application” as a process manager (e.g.  php-fpm) which handles the web-to-fcgi connection and a pool of workers, each of which does the actual processing of a single request at a time.  The process manager doesn’t deal with URLs at all; it’s simply plumbing to get data from one end to the other.  It’s the workers which deal with understanding the actual request, doing real work, and returning the final data to the process manager.

The processing of URLs under FastCGI, then, ends up as an implicit agreement between the web server and the FCGI workers.  Dancer, for instance, provides a dispatch.fcgi that’s the target of an Apache RewriteRule, with the requested URL passed as PATH_INFO to Dancer itself.  Then, Dancer’s routing mechanism picks up that path info and matches it against the application’s routes.  (For anyone who might have missed it, “routing” is how every framework since Rails refers to turning URLs into a decision on what application code to run.  A route links a specific URL or pattern to a specific chunk of code.)  The web server generates PATH_INFO and passes it to the process manager, which passes it to a Dancer worker.
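Routing itself is a small idea, and a sketch may make it concrete.  This is not Dancer’s API; the route table, the dispatch function, and the example paths are all hypothetical, and it assumes the server has already put the requested path in PATH_INFO.

```perl
use strict;
use warnings;

# Hypothetical route table: each entry pairs a path pattern with a handler.
my @routes = (
    [ qr{\A/\z}            => sub { "home page" } ],
    [ qr{\A/users/(\d+)\z} => sub { my ($id) = @_; "profile for user $id" } ],
);

# Walk the table in order and run the first handler whose pattern matches.
sub dispatch {
    my ($path_info) = @_;
    for my $route (@routes) {
        my ($pattern, $handler) = @$route;
        # In list context a successful match returns its captures
        # (or a single true value when the pattern has no groups).
        if (my @captures = $path_info =~ $pattern) {
            return $handler->(@captures);
        }
    }
    return "404: no route for $path_info";
}

print dispatch('/users/42'), "\n";   # prints "profile for user 42"
```

A real framework layers HTTP methods, named parameters, and response objects on top, but the core is the same lookup from path to code.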

Almost.  Few frameworks for Perl or Python deal directly with FastCGI anymore.

WSGI

The main problem with FastCGI, for an app creator or framework developer, was that the handling of FastCGI had to be custom-coded in each project.  For instance, in Perl, the typical FastCGI handler example has the form while (my $r = $fcgi->accept()) { process($r); } with all the control wrapped up inside the application.

The Python folk saw that this could be inverted, so that the body of the loop would become a processing function that could be given to anything that could build a request.  In other words, the request receiving loop (or lack of one, for any CGI holdouts) could be anywhere, as long as the request processing followed a standard interface.
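The inversion can be sketched in Perl without any real server.  Everything here is hypothetical (handle_request, run_once, run_loop, and the hash-based request format); the point is only that the handler owns no loop and two different runners can drive it.

```perl
use strict;
use warnings;

# The former loop body: takes a request, returns a response, owns no loop.
sub handle_request {
    my ($request) = @_;
    return "You asked for $request->{path}";
}

# Runner 1, CGI style: one process, one request built from the environment.
sub run_once {
    my ($handler, $env) = @_;
    return $handler->({ path => $env->{PATH_INFO} // '/' });
}

# Runner 2, persistent style: the accept loop lives here, not in the app.
# $next_request stands in for a real FastCGI accept call.
sub run_loop {
    my ($handler, $next_request) = @_;
    my @responses;
    while (my $request = $next_request->()) {
        push @responses, $handler->($request);
    }
    return @responses;
}

# Drive the same handler both ways with made-up requests.
my @queue = ({ path => '/a' }, { path => '/b' });
print "$_\n" for run_loop(\&handle_request, sub { shift @queue });
print run_once(\&handle_request, { PATH_INFO => '/c' }), "\n";
```

Swapping runners requires no change to handle_request, which is the property the standard interface is after.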

The Python folk called this interface WSGI, the Web Server Gateway Interface.  They defined it first, so they won the generic name.

WSGI support is practically ubiquitous these days, because server authors can support any WSGI Python framework by implementing a WSGI app runner, and Python frameworks can similarly talk to many web servers with one WSGI backend.  There are even pure-Python WSGI servers which serve HTTP on one side and invoke WSGI apps on the other.

PSGI and Plack

The Perl camp eventually got around to borrowing everyone else’s good ideas, and built both their own standard Perl-side server↔framework interface (PSGI) and a set of modules implementing it for real apps (Plack).  Plack can invoke PSGI-based apps (i.e., apps built on frameworks with a PSGI backend) from FastCGI, mod_perl, and others; there are also pure-Perl servers like Starman which can run PSGI apps natively.
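For reference, a bare PSGI application needs no framework at all.  In this minimal sketch the greeting and the use of PATH_INFO are illustrative choices; the shape of the interface (a code reference that takes an environment hash reference and returns a three-element array reference) is what PSGI specifies.

```perl
use strict;
use warnings;

# A PSGI application is a code reference: it receives the request environment
# as a hash reference and returns [ status, [ headers ], [ body chunks ] ].
my $app = sub {
    my ($env) = @_;
    my $path = $env->{PATH_INFO} // '/';
    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        [ "Hello from PSGI, you asked for $path\n" ],
    ];
};
```

Saved as app.psgi (the file’s last statement evaluates to the code reference), this should be runnable with plackup app.psgi from Plack, or with starman app.psgi, assuming those are installed.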

Perl Web Frameworks

PSGI

We’ll go backwards through time in this section.  Most frameworks can be linked to PSGI, but they strongly prefer that you build your application as a set of modules that call into the framework code to link routes to actions.  Likewise, there’s a single PSGI dispatcher script that pushes all requests through one coderef, typically the framework itself.  (Which is where the route gets matched so the actual action code can be invoked.) This is not so bad for newer development, but it’s exactly the opposite of how CGI applications are structured—with no dispatch script, and many URLs invoking actions with no meta configuration.

Mojolicious::Lite supports single-file apps, but I have the distinct impression that this is not the high-performance way to handle it.  Still, it’s probably not as bad as straight CGI.

CGI::Application

This is apparently one of the big pre-PSGI frameworks, designed to be runnable via CGI while capable of being driven by FastCGI and mod_perl as well, with the usual caveat that you had to write your scripts more neatly and carefully for the persistent options.  It is now also available in a packaging called “Titanium,” which is just a meta-package depending on CGI::Application itself and some other useful pieces and plugins.

I’m actually not entirely clear on whether a single-file CGI::Application approach is viable, or if it also strictly requires the application to be coded as modules.

ModPerl::PerlRun and ModPerl::Registry

These are the venerable mod_perl-specific (naturally) handlers that provide a sort of CGI emulation for code running under mod_perl.  While not technically a framework, they bear the distinction of matching the CGI architecture much better than the “port whole application to MVC modules” approach expected by the alternatives.

For those of you who, like me, don’t find the naming intuitive, PerlRun is more CGI-like than Registry, which means Registry is faster.  Registry breaks more things in the name of optimization.
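The kind of breakage involved can be simulated in plain Perl, without mod_perl.  This is only an imitation under stated assumptions: the Registry-like path compiles a script once and reuses it, the PerlRun-like path recompiles per request and clears the global afterwards, and the package names, function names, and the $hits counter are all made up.

```perl
use strict;
use warnings;

# Stand-in for an old CGI script that leans on a global it never resets.
my $script = q{
    our $hits;
    $hits++;
    return "hit number $hits";
};

# Registry-like: compile once into a subroutine, reuse it for every request.
my $cached = eval "package Demo::Registry; sub { $script }" or die $@;
sub registry_like { return $cached->() }

# PerlRun-like: recompile for every request, then clear the global afterwards.
sub perlrun_like {
    my $result = eval "package Demo::PerlRun; $script";
    die $@ if $@;
    { no warnings 'once'; undef $Demo::PerlRun::hits; }
    return $result;
}

print registry_like(), "\n" for 1 .. 3;   # hit number 1, then 2, then 3
print perlrun_like(),  "\n" for 1 .. 3;   # hit number 1 every time
```

Under plain CGI the script would report hit number 1 on every request, so the PerlRun-like behavior matches what old code expects, while the cached version leaks state from one request into the next.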
