Monday, July 29, 2013

SNS driven deployment (architectural overview)

3 Sep 2015: A note about how we now use php-fpm instead of suEXEC has been published here.  Now, back to the original post...

One of the problems I faced with getting Auto Scaling up and running was, "If I can't just ssh into the one live server and issue a git pull by hand, how do I push code out?"

Normal people would use Capistrano, Paste Deploy, or something similar, extended to query their Auto Scaling group for the hosts to deploy to.  Or they would build on Heroku or Elastic Beanstalk to begin with.  I, however, am clearly not normal.


Conceptually, the goal is simple: have the webservers subscribe to an SNS topic that receives deployment instructions, and have them execute the deployer to carry those instructions out.

Subscribing to SNS is handled by the bootstrapper, which clones the initial copies of the deployer and listener and installs the apache config for the listener.  The endpoint given to SNS uses the instance's (dynamic) hostname, and the listener runs on a port other than 80, because the default vhost on port 80 is already used for some dynamic-hostname trickery.
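The post doesn't show the bootstrapper's code, so here is a minimal Python sketch of just the subscription step. It assumes boto3 is used for the SNS call, and the listener port (8080) and path (/sns-listener) are hypothetical placeholders, not values from the original setup. The metadata request is the plain IMDSv1 form; an instance that enforces IMDSv2 would need to fetch a session token first.

```python
import urllib.request

# EC2 instance metadata endpoint for the public hostname (IMDSv1-style request).
METADATA_URL = "http://169.254.169.254/latest/meta-data/public-hostname"

# Hypothetical values: any port other than 80 works, since the default
# vhost on port 80 is already spoken for.
LISTENER_PORT = 8080
LISTENER_PATH = "/sns-listener"


def public_hostname(timeout=2.0):
    """Ask the instance metadata service for this instance's public hostname."""
    with urllib.request.urlopen(METADATA_URL, timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()


def listener_endpoint(hostname, port=LISTENER_PORT, path=LISTENER_PATH):
    """Build the HTTP endpoint URL that SNS will POST deployment messages to."""
    return f"http://{hostname}:{port}{path}"


def subscribe_instance(sns_client, topic_arn, hostname):
    """Subscribe this instance's listener to the deployment topic.

    sns_client is expected to behave like boto3.client("sns").  SNS answers
    by POSTing a SubscriptionConfirmation to the endpoint, which the listener
    must confirm before any notifications are delivered.
    """
    endpoint = listener_endpoint(hostname)
    resp = sns_client.subscribe(TopicArn=topic_arn, Protocol="http", Endpoint=endpoint)
    return endpoint, resp.get("SubscriptionArn")
```

On a real instance this would be called roughly as `subscribe_instance(boto3.client("sns"), topic_arn, public_hostname())`. Passing the client in keeps the function testable without AWS credentials.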

When a message published to SNS arrives, the listener calls the deployer twice: once to update the deployer, then again (now with potentially updated code) to update the listener.  Only then does the listener load its message-parsing code.  To be clear, the "message" here is the body of the SNS message delivered, not the SNS notification envelope wrapped around it.
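The original listener isn't shown (and runs under apache, so it is likely PHP), but the ordering described above can be sketched in Python. The three hooks (`confirm`, `run_deployer`, `load_parser`) are hypothetical stand-ins supplied by the caller; the `Type`, `SubscribeURL`, and `Message` fields are the ones SNS uses in the JSON body it POSTs to HTTP endpoints. The sketch skips SNS signature verification, which a real listener should do.

```python
import json


def handle_sns_post(body, confirm, run_deployer, load_parser):
    """Dispatch one SNS HTTP POST body (a JSON document).

    Hypothetical hooks injected by the caller:
      confirm(url):       visit SubscribeURL to confirm the subscription
      run_deployer(name): run the deployer to update the named component
      load_parser():      (re)load the message-parsing code, return a callable
    NOTE: no SNS signature verification is done in this sketch.
    """
    envelope = json.loads(body)
    kind = envelope.get("Type")

    if kind == "SubscriptionConfirmation":
        confirm(envelope["SubscribeURL"])
        return "confirmed"

    if kind == "Notification":
        # Update the deployer first, then let the (possibly new) deployer
        # update the listener.
        run_deployer("deployer")
        run_deployer("listener")
        # Only now load the parsing code, so it reflects anything just pulled.
        parse = load_parser()
        # The deployment instruction is the Message field, not the envelope.
        return parse(envelope["Message"])

    return "ignored"
```

Deferring `load_parser()` until after both updates is the point of the design: a change to the message format can ship in the same push that first uses it.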

A typical message reads "deploy www", which tells the deployer to deploy our core site, and off it goes.
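A minimal Python sketch of turning such a message into an action. The target table and the `/srv/www` checkout path are hypothetical, and the actual update is assumed to be a fast-forward `git pull` in that checkout, in keeping with the "git pull by hand" workflow it replaces.

```python
# Hypothetical mapping from deploy targets to checkout directories.
TARGETS = {
    "www": "/srv/www",  # "our core site"
}


def parse_message(message, targets=TARGETS):
    """Turn a message like 'deploy www' into (action, checkout_dir)."""
    words = message.split()
    if len(words) != 2 or words[0] != "deploy":
        raise ValueError(f"unrecognized message: {message!r}")
    target = words[1]
    if target not in targets:
        raise ValueError(f"unknown deploy target: {target!r}")
    return ("deploy", targets[target])


def deploy_command(checkout_dir):
    """Command the deployer might run for a 'deploy' action (an assumption)."""
    return ["git", "-C", checkout_dir, "pull", "--ff-only"]
```

Rejecting unknown targets outright means a malformed or unexpected message does nothing rather than something surprising.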

But wait.  I store apache configuration in git these days, because it's just as much a part of the code as the code.  If that has changed, or if the FastCGI code has changed, then I need to reload apache.  How does the deployer get permission for that?
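One way the deployer could decide whether a reload is needed, sketched in Python: compare the revisions before and after the pull and look for changes under the apache config or FastCGI directories. The directory prefixes below are a hypothetical repository layout, not the author's.

```python
import subprocess

# Hypothetical repository layout: where apache config and FastCGI code live.
RELOAD_PREFIXES = ("conf/apache/", "fcgi/")


def changed_paths(checkout_dir, old_rev, new_rev):
    """List files that changed between two revisions (e.g. before and after a pull)."""
    out = subprocess.run(
        ["git", "-C", checkout_dir, "diff", "--name-only", old_rev, new_rev],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def needs_apache_reload(paths, prefixes=RELOAD_PREFIXES):
    """True if any changed path is apache config or FastCGI code."""
    return any(p.startswith(prefixes) for p in paths)
```

The old and new revisions would come from `git rev-parse HEAD` run before and after the pull.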

There's even an extra trick I haven't told you about: the apache user does not have write access to the document roots, so that pwn1ng the server does not mean a php webshell can be dropped right in.  (This is a long-standing security policy that has caused no end of fights between me and pre-packaged software that thinks it should run with self-writable permissions on everything, forever.)  Because of this, the deployer can't work as the apache user to begin with.
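To audit that policy, a small Python sketch can walk a document root and report directories the apache user could write to. It only checks the classic owner/group/other mode bits (ACLs, root, and read-only mounts are ignored). Looking up the apache account with `pwd.getpwnam("apache")` is an assumption, since the account is named `www-data` on some distributions.

```python
import os
import stat


def writable_by(path_stat, uid, gids):
    """Would a process with this uid and set of group ids be able to write here?

    Mirrors Unix precedence: if the uid owns the file, only the owner bits
    count; otherwise the group bits if any gid matches; otherwise "other".
    """
    mode = path_stat.st_mode
    if path_stat.st_uid == uid:
        return bool(mode & stat.S_IWUSR)
    if path_stat.st_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


def writable_dirs(docroot, uid, gids):
    """Yield directories under docroot that the given user could write to."""
    for dirpath, _dirnames, _filenames in os.walk(docroot):
        if writable_by(os.stat(dirpath), uid, gids):
            yield dirpath
```

Under the policy described above, `list(writable_dirs(docroot, apache_uid, apache_gids))` should come back empty.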

To dance the permission dance, the listener actually runs via suEXEC as the user who owns the code.  This user is also set up in sudo with password-less access to /usr/local/sbin/reconf.sh, which does nothing more than test apache's configuration and, if the test succeeds, reload apache.  And to actually make this work on EC2, this user account also has Defaults !requiretty applied in sudoers.
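reconf.sh itself is a shell script that isn't reproduced here, so the following is a Python rendering of the logic described above, plus the sudoers lines that grant it. The account name `deploy` is hypothetical. The `apachectl configtest` / `apachectl graceful` pair is one reasonable way to "test, then reload"; `service httpd reload` would serve on some systems.

```python
import subprocess

DEPLOY_USER = "deploy"                 # hypothetical name of the code-owning user
RECONF = "/usr/local/sbin/reconf.sh"


def sudoers_fragment(user=DEPLOY_USER, script=RECONF):
    """Sudoers lines: no tty required, no password, and only this one script."""
    return (
        f"Defaults:{user} !requiretty\n"
        f"{user} ALL=(root) NOPASSWD: {script}\n"
    )


def reconf(run=subprocess.run):
    """What reconf.sh does: test apache's config, reload only if the test passes."""
    if run(["apachectl", "configtest"]).returncode != 0:
        return False
    run(["apachectl", "graceful"], check=True)
    return True


def request_reload(run=subprocess.run):
    """Deployer side: 'sudo -n' fails instead of prompting if a password is needed."""
    return run(["sudo", "-n", RECONF]).returncode == 0
```

A fragment like this would typically live under /etc/sudoers.d/ and can be syntax-checked with `visudo -c -f` before use.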

I think that covers the pipeline.  Bootstrap creates the user (a non-system account, since suEXEC refuses to run as system users), installs the deployer and listener code, sets up the listener's suEXEC apache config, installs the reconf.sh script, installs a task to unsubscribe from SNS at instance shutdown, allows reconf.sh to be run through sudo without an allocated terminal, and finally issues the SNS subscription request.  The listener receives and confirms the subscription.  When a deployment message arrives, the deployer and listener are updated, then the deployment message is parsed, possibly triggering sudo to reload apache.
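The unsubscribe-at-shutdown task can be sketched in Python too, again assuming a boto3-style SNS client. Rather than remembering the ARN from subscription time, this version looks up the topic's subscriptions for our endpoint. How the task is hooked into shutdown (init script or similar) is not shown.

```python
def find_subscriptions(sns_client, topic_arn, endpoint):
    """Yield ARNs of this topic's subscriptions whose endpoint is ours.

    list_subscriptions_by_topic is paginated, so follow NextToken.
    """
    kwargs = {"TopicArn": topic_arn}
    while True:
        page = sns_client.list_subscriptions_by_topic(**kwargs)
        for sub in page.get("Subscriptions", []):
            # Skip pending subscriptions, which do not carry a real ARN yet.
            if sub["Endpoint"] == endpoint and sub["SubscriptionArn"].startswith("arn:"):
                yield sub["SubscriptionArn"]
        token = page.get("NextToken")
        if not token:
            return
        kwargs["NextToken"] = token


def unsubscribe_instance(sns_client, topic_arn, endpoint):
    """Shutdown task: remove this instance's listener from the deployment topic."""
    # Collect first, so we are not paging through a list we are modifying.
    arns = list(find_subscriptions(sns_client, topic_arn, endpoint))
    for arn in arns:
        sns_client.unsubscribe(SubscriptionArn=arn)
    return arns
```

Matching on the endpoint also cleans up duplicates if an instance somehow subscribed more than once.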

Quite a few moving parts, but each is simple, obvious, and necessary in isolation.  Rather like any good Rube Goldberg machine.
