Wednesday, December 28, 2016

API Gateway vs. gzip: ERR_CONTENT_DECODING_FAILED

tl;dr: API Gateway may treat binary content as text and mangle it; in our case, the backend returned Content-Type: text/html; charset=UTF-8 responses with Content-Encoding: gzip.  My workaround for Apache 2.4: SetEnvIf x-amzn-apigateway-api-id .+ no-gzip.  (If the request contains that header with any value, don’t gzip the response.)
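In context, the workaround is a one-line addition to the Apache configuration.  The VirtualHost wrapper below is a hypothetical placeholder; only the SetEnvIf line is the actual fix:

```apache
<VirtualHost *:80>
    ServerName example.com

    # API Gateway adds x-amzn-apigateway-api-id to requests it proxies.
    # If that header is present with any value, set the no-gzip note,
    # which tells mod_deflate to skip compressing the response.
    SetEnvIf x-amzn-apigateway-api-id .+ no-gzip
</VirtualHost>
```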

Sunday, December 4, 2016

My Path to Programming

Inspired by Dan Luu’s post about the same, in turn based on Tavish’s.

For me, it really started with a sibling rivalry.  I wanted to try everything my older brother did, and secretly, I wanted to be better at it than he was. We had wars about who could write better BASIC programs.  When my dad bought AssemPro for our Amiga, my brother decided to learn it: therefore, so did I.


Monday, September 12, 2016

Update: rumprun and php7

Months ago, I wrote about not being able to get libmemcached to link into a static PHP build.

I don’t take “no” for an answer from code.

I have not only gotten it to link (with a carefully targeted replacement of the linker in the build scripts), I have also patched up opcache both to configure correctly while cross-compiling and to build statically.

I seriously don’t take “no” for an answer from code.

That’s where I stand.  I’ve been focusing on getting some music done, notably this FF8 remix, so rumprun is on the back burner.  Still, the php7-full branch is what I would consider “usable” now.

If you’re reading this in the future, you should check the wiki for the status.

Friday, September 9, 2016

AWS API Gateway: Returning 404 Errors with OAuth 2.0

As discussed before, we’ve been building out some services using AWS API Gateway.

We have an OAuth 2.0 infrastructure that predates API Gateway, and third parties have had a lot of problems using the APIs behind it.  Almost any mistake involving the Authorization header leads to an unhelpful message from Amazon CloudFront (which technically underpins API Gateway): “not a valid key=value pair”, pointing at the access token in the Authorization header.

As it turns out, one of these error cases is a request that should generate a 404 Not Found response, because the URL doesn’t exist in API Gateway.

There’s a workaround to fake 404 messages: build fake endpoints into the API definition.
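A sketch of what such a fake endpoint can look like in a Swagger 2.0 definition, using API Gateway’s mock integration extension.  The path and message here are hypothetical, and the exact template keys should be checked against the x-amazon-apigateway-integration documentation before relying on them:

```yaml
paths:
  /not-found-fallback:
    get:
      responses:
        "404":
          description: Not Found
      x-amazon-apigateway-integration:
        type: mock
        requestTemplates:
          application/json: '{"statusCode": 404}'
        responses:
          default:
            statusCode: "404"
            responseTemplates:
              application/json: '{"message": "Not Found"}'
```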


Saturday, July 9, 2016

A hat beside the “PHP7 vs HHVM” ring

I spent all my free time, all week, exploring the difference in performance between PHP 5.6.22, PHP 7.0.8, and HHVM 3.14.1.  (HHVM 3.14.2 came out later in the week, but I didn’t want to redo everything.)

In the end, it turned out kind of useless as a “real world” test.  The server doesn’t have L3 cache, and the client is a virtual machine.  I also didn’t run a large number of trials, nor tune the server stacks for optimum performance. In fact, the tunings I tried had no effect, at best.

tl;dr conclusions:

  • HHVM proxygen is almost completely amazing.  If you have the option, it is probably what you want. It just crashes hard when concurrency exceeds its open file limit.
  • nginx+FastCGI hits a resource limit somewhere, and starts failing requests at higher concurrency, around 160.
  • apache+FastCGI does the same… but the limit is higher, between 256 and 384.  The price is that it serves only 86% as many requests per second.
  • Providing more FastCGI workers makes the errors appear sooner, but ramp up more slowly.
  • I’m really disappointed in nginx.  I expected more.  A lot more.

Saturday, July 2, 2016

Nonlocality

Three years ago, I was porting a Perl CGI-based system to FastCGI, one URL at a time, using mod_rewrite to do the dispatching.  (If the handler.pm exists where the FCGI dispatcher will look for it, invoke this request via FastCGI.)  A consequence of this is that the core library code needs to run on both paradigms, since both front-ends use it.  That runs straight into problems because the CGI access-checking functions blithely print and call exit when access is denied.

Instead of updating 60+ scripts to read e.g. $session = get_session() or Site::CGI->go_login; I decided to get hackier.


Wednesday, June 29, 2016

rumprun php7 experiment results

I forked rumprun-packages to build its php7 package with more features by default.  The original says it’s an “unmodified PHP 7.0.3” build, but it uses --disable-all to build with a minimal number of extensions by default, and that’s not really great for showing just what could work on rumprun.

I made a “mega” build which contains everything I can possibly include into it, including packaging up libjpeg-turbo and libpng so that this PHP can use gd.

I have been unable to get the memcached extension to work, though.  libmemcached-1.0 is written in C++ and depends on some exception-handling symbols.  PHP links with $(CC), which doesn’t include any C++ support.

I’d probably be more disappointed in this, except that I don’t know if rumprun PHP is all that useful.  Options are the single-threaded CLI HTTP server (not intended for production), or using ye olde php-cgi SAPI… which is also not a multithreaded option.  It expects to run single-threaded child processes under a FastCGI process manager.  (php-fpm simply integrates a process manager, so that you don’t need to find a separate one and integrate it; it’s also multi-process.)

And rumprun doesn’t have processes.  It’s a unikernel.

Tuesday, June 28, 2016

Scaling

How it feels when we try to procure stuff. Maybe we’re just a personal-sized business?



This is only getting worse, as each AWS hardware generation raises the machine size floor.

Friday, June 17, 2016

My Blogger Pipeline

For the past couple of years, I’ve been uncomfortable with the notion of Google watching my every thought as I type into a draft on the Blogger site. (Since a post whose working title was “VPNs are hard,” apparently.)

As a coder, I believe everything can be solved with the code-hammer, so I wrote 7 KB of PHP. Now, I can write locally in Markdown, then run markup-menu.php.  This is a stupidly simple program that takes a unique filename prefix, finds the matching *.md file, and passes it to the much larger markup.php.

That program takes a file, preprocesses the Markdown a bit for compatibility with Blogger’s “preserve line breaks” setting, and post-processes the resulting HTML to iron out a few more quirks.  That file can be copypasta’d into Blogger.

Once there, it gets Blogger dressing, like tags; then I click Preview.  If nothing’s wrong, it gets posted, as if it sprang fully formed from my head into Blogger.

Tuesday, June 14, 2016

Modern AJAX in jQuery

In the dark ages, I wrote my own wrappers for jQuery.getJSON because it had the function signature:

$.getJSON(url [, data] [, success])

And I wanted to provide an error handler.  So, our corporate library (represented by LIB) has a function like:

LIB.getJSON(url, data, success [, error])

I also didn’t know if I could rely on getting a response body from error events, so where possible, errors are returned as 200 OK {Error:true,ErrorMsg:"boom"}, and LIB.getJSON catches these and invokes the error callback instead of the success one.

(One more sub-optimal design choice: like, almost the whole point of LIB.getJSON is to pass an error handler and let the user know “okay. we are not loading anymore.”  But notice that the error handler is still considered optional for some reason.)
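A minimal sketch of that wrapper.  The name LIB.getJSON, the signature, and the Error/ErrorMsg fields come from the description above; everything else is an assumption, not our real library code:

```javascript
// Hypothetical reconstruction of the legacy LIB.getJSON wrapper.
// getJSON is injected as a parameter so the sketch doesn't depend
// on jQuery being present.
function makeLibGetJSON(getJSON) {
    return function (url, data, success, error) {
        getJSON(url, data, function (resp) {
            // A 200 OK body of {Error:true,...} is really a failure,
            // so route it to the error callback (if one was given).
            if (resp && resp.Error) {
                if (error) error(resp.ErrorMsg);
            } else {
                success(resp);
            }
        });
    };
}
```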

If I were designing this from scratch today, the service would return errors as HTTP error responses, and I’d use the “new” (added in 1.5, after I started jQuery’ing) Deferred methods.

function success (data, txt, xhr) {
    // ...
}
function error (xhr, txt, err) {
    var r;
    try {
        // The body may not be JSON (e.g. a gateway error page),
        // so parse defensively; r stays undefined on failure.
        r = $.parseJSON(xhr.responseText);
    } catch (e) {}
    // ...
}
$.getJSON(url, data)
    .then(success, error);

Result: a more RESTful design, with less glue.

I’d probably still have an “error adapter generator” function which converts all the possibilities for xhr/txt/err down to a single message string, and pass newErrorHandler(showErrorUI) as the error callback into .then().  But the point is, there’s no need to have LIB.getJSON as a whole anymore, to both accept an error callback and filter ‘successfully returned error’ values.
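That adapter generator might look roughly like this.  The name newErrorHandler is from the paragraph above; the message-picking rules are my assumption, not an existing implementation:

```javascript
// Hypothetical error adapter: converts jQuery's (xhr, txt, err)
// failure arguments into a single message string, then hands it
// to whatever UI callback was supplied.
function newErrorHandler(showErrorUI) {
    return function (xhr, txt, err) {
        var msg = null;
        try {
            // Prefer a structured error body if the server sent one.
            var r = JSON.parse(xhr.responseText);
            if (r && r.ErrorMsg) { msg = r.ErrorMsg; }
        } catch (e) {}
        if (!msg) {
            // Fall back to the HTTP-level information.
            msg = err || txt || ("HTTP " + xhr.status);
        }
        showErrorUI(msg);
    };
}
```

Usage would then be a one-liner: $.getJSON(url, data).then(success, newErrorHandler(showErrorUI)).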

Sunday, May 29, 2016

Ten Years of Inheritance (in professional coding)

I wrote in another draft:

And fundamentally, inheritance is the wrong tool for reuse.  Providing lots of small methods to make for improved reuse through inheritance is an equally wrong approach.

I wanted to write about that a little more.

Cases where inheritance has worked out well: when I can arrange for a base class to provide a lot of functionality, and one or maybe two abstract methods overridden in the subclass.  Essentially, this comes down to “using inheritance as a mechanism to implement a strategy for some small, but critical, polymorphic bit.”  Or perhaps, “making a DSL by providing operations in a base class, then implementations in child classes.”

Probably the most accessible example of this is PHP’s FilterIterator, which does little more than wrap a regular Iterator with an accept() method.

Notably, it doesn’t extend anything itself to do this.  It composes another iterator and exposes that through the OuterIterator interface.
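The same shape, sketched in JavaScript rather than PHP.  FilteredList and EvenFilter are invented names; this illustrates the pattern, not FilterIterator’s actual API:

```javascript
// Base class supplies all the machinery; subclasses override one
// small, critical polymorphic hook: accept().
class FilteredList {
    constructor(items) { this.items = items; }
    // The single "strategy" method subclasses must provide.
    accept(item) { throw new Error("accept() must be overridden"); }
    // Everything else lives in the base class.
    toArray() {
        return this.items.filter((item) => this.accept(item));
    }
}

// The subclass contributes only the strategy.
class EvenFilter extends FilteredList {
    accept(item) { return item % 2 === 0; }
}
```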

Sunday, May 22, 2016

Perl Retrospective

Perl was the first language I really did serious work in.  To this day, even though it’s dying, I still like it.  It’s different, which is at once both its major virtue, and its Achilles heel in the modern programming landscape.

Let’s talk about what makes Perl Perlish, starting with the good parts.

(Note that I use a lot of 5.10 features like say and maybe the // operator.  For the most part, assume a use v5.10; line before any code. I may not adhere to use strict; use warnings; in this post, but my production code does.)

Monday, May 16, 2016

My Coding Style

When I write code, I think in “framework” flavor.  How can I make this code modular and extensible?  How can I let other people use it without having to reimplement 80% of some method when they want to change anything about how it works, assuming they couldn’t just change this code?

I mostly build inside of classes and modules for the information hiding aspect of it.  I use Douglas Crockford’s module pattern in my JavaScript to minimize global namespace pollution.  Likewise, nearly all of my variables in Perl are lexical.

I’m also likely to pack small amounts of state and behavior into a tiny class, no matter how trivial it seems on the surface.  I’m the guy who writes a Csv class so that I can call $csv->put($data) and have all the other options (separator, etc.) baked into the class.  You could create a Tsv class with just a few lines to override the separator.  (You probably won’t ever need to. But you could, and that’s what makes me happy.)
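The Csv/Tsv idea, sketched in JavaScript rather than the Perl of the original.  put() and the separator option come from the paragraph above; the rest is assumed, and quoting/escaping is deliberately elided:

```javascript
// Tiny class with the formatting options baked in at construction.
class Csv {
    constructor(opts) {
        opts = opts || {};
        this.separator = opts.separator || ",";
        this.eol = opts.eol || "\n";
    }
    // Real CSV needs quoting/escaping; elided to keep the sketch short.
    put(row) { return row.join(this.separator) + this.eol; }
}

// The "few lines to override the separator" subclass.
class Tsv extends Csv {
    constructor(opts) {
        super(Object.assign({ separator: "\t" }, opts));
    }
}
```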

While I get functional programming and stream programming, I generally consider them sub-optimal in PHP and avoid them where possible.

Why do I do it this way? What’s in my head when I’m programming?

Wednesday, March 23, 2016

Geolocation Fail

I was on Tor the other day, when I tried to access a Blogger blog. I got an infinite chain of redirections, because Google geolocates the IP and then issues a redirect to that country's blogger site.

For instance, with an IP geolocated to the Netherlands, *.blogspot.com and *.blogspot.de will redirect to *.blogspot.nl.  Tor Browser sees that as a new site and runs a new circuit with a different exit node, likely in a different country, causing another redirection.

Generally, the domain and exit node mapping remains fixed.  So, blogspot.com might redirect to blogspot.de, which would redirect to blogspot.nl, which might redirect to blogspot.com.  But the later accesses retain the original exit nodes, and all cause the redirections again.  Most sites work because they don't try to change domains per country.

A few attempts at "New Tor circuit for this site" finally broke the loops by changing the exit node for that single domain, but it's clear Google still puts too much faith in geolocation.

Any blogs that end up being accessed from EU exit nodes also get the cookie warning… generally in a language I can't read, because it's chosen by geolocation.  Even though my headers have "Accept-Language: en-US;q=0.5" and I'm visiting a page whose primary language is English. IK SNAP HET! ("I get it!")

(Speaking of Tor, though. If you want to know whether a site uses CloudFlare, just load it in Tor Browser. You'll generally see a CloudFlare CAPTCHA on their sites. They're so aggressive about putting out high-difficulty puzzles, I generally don't bother to solve anymore.)

Tuesday, February 23, 2016

PGP

For the first time in over fifteen years of awareness about PGP, I met someone who actually wanted to use it.  I got to set trust on a key and see this awesome menu:
Please decide how far you trust this user to correctly verify other users' keys (by looking at passports, checking fingerprints from different sources, etc.)

1 = I don't know or won't say
2 = I do NOT trust
3 = I trust marginally
4 = I trust fully
5 = I trust ultimately

This reveals a lot about the assumptions of PGP and the problem it was trying to solve...

The menu clearly focuses on real-world identities, trying to get users to establish ‘trust’ that people correspond to the cyber-space identities. (Those digital identities are the keys: anyone with the key is indistinguishable from anyone else who has it.) Why else is the focus on “verifying” by looking at passports and fingerprints?

In short, PGP was the first Google+: built by nerds as an identity service for the masses… that failed to become mainstream.

Thursday, February 18, 2016

Unikernels

I learned a new term recently: unikernel.  Wikipedia defines it as a “single address space machine image” or else uses the “library operating system” term, but the gist of it is:

Unikernel: A system where a single application runs in unprotected mode.

Instead of a “normal” operating system managing multiple processes, a unikernel has the OS services arranged as a library, and runs one process with that library in the kernel space of the host hardware.

This used to be fairly impractical, as you’d have to dedicate physical hardware to the unikernel, and it would still need drivers for that hardware. It was a lot of work and not very portable.  But with virtualization, a unikernel can build to the hypervisors’ interfaces, then run in any environment that supports the hypervisor.

Here’s the difference from other approaches once virtualization gets in the mix: a unikernel uses the hypervisor only to provide isolation and virtio devices, then runs one process with no further hardware protections inside the guest.  Networking, filesystems, and anything else normally provided by the OS through the syscall interface are, instead, built as function calls living in the same address space with the main application.

A container uses a host OS kernel in much the same way, to run each container as a host process.  While this ties the container to a specific syscall interface, it also means the container environment can take advantage of the host OS’s drivers, filesystems, and networking.  Containers and hypervisors are fully independent of each other, which also enables containers to run on a host OS that’s running on bare metal.

A traditional guest system, of course, runs a full OS inside the guest and ordinary processes inside the OS.  Like the unikernel, it boots anywhere with the requisite CPU/virtio support, regardless of underlying host OS, but like a container, it continues to provide OS services like networking, filesystems, and process management through the syscall interface.

(Plenty of other ink has been spilled about the security of all this, but the tl;dr is, hypervisors more-or-less secure multi-tenant hardware.  Running in the cloud means being virtualized, even with containers.)

Here’s a rough drawing of two virtual-machine architectures (KVM and Xen), and how they differ from containers and unikernels:



Obviously, unikernels are also a bigger change in application architecture from containers (a single package and its dependencies, minus a kernel) or a full OS, but they’re well-suited for single-language microservices.  Once a language has the “OS library” to provide networking and filesystems (if needed), any app written to that library can be compiled to a unikernel.

The unikernel, essentially, gives up composability in the pursuit of speed. Additional processes can’t be added in, because the unikernel can’t support processes or process isolation without losing its essential difference from an ordinary OS or container.  (Threads are possible, but not processes.)

Perhaps, again, the difference is best described with a picture:


In the unikernel, what are ordinarily “operating system services” and shared library calls are compiled into a single address space as ordinary function calls.  Getting rid of the user/system split is of course their whole point, and shared libraries also disappear because, as one process, there is no “other” process to share with.  All the mechanics involved in doing so become unnecessary.

As noted above, hypervisors have made unikernels much more interesting and feasible lately.  There’s a full list at unikernel.org, particularly their projects page, but the two that look most interesting to me are:

I don’t know when—or even if—I’ll get around to doing anything with them, but those are likely to be the easiest-to-use unikernels, unless someone makes node.js into one.

Sunday, February 14, 2016

API Gateway as an HTTP Service Proxy: Lessons Learned

At work, we’re finishing the implementation of our first API using the Amazon API Gateway for decoupling the API key management, logging, and throttling from the actual backend service. This also marks our first OAuth 2.0 Resource Server, and makes heavier use of Swagger 2.0 for the entire pipeline.  With all these “firsts,” I’d like to share a few notes on our setup.

Tuesday, January 19, 2016

Freezing Layers (static vs dynamic)

Let me warn you right up front: this is a philosophical and rambling post.  I'm trying to pull together a series of observations into a coherent structure, and it's quite possible that I failed at that.  In addition, “PHP” throughout refers to the 5.x and earlier lines, because of their lack of quality in this area; version 7.0 basically addresses the error complaint.  Now without further ado, here is your ancient, rambling post...

I've written some Perl code using FCGI managed by mod_fcgid, in which I constructed my own request loop and application pre-loader.  (And complained about the way mod_fcgid is set up, since it doesn't pre-fork fastcgi children—whenever concurrency is exceeded, starting from zero after a fresh startup, someone has to wait for the world to load just like the bad old CGI days.)

I've written plenty of PHP as well, where you generate a response to each request once the server environment gets around to routing it to you.  The only things that may persist from request to request are allocated on the C level, not by the PHP script directly: most famously, opcode caches and persistent database connections.


Thursday, January 14, 2016

The Common Denominator

On my back-burner at work, following a company-wide deprecation of Perl, is porting our Perl-based API services to PHP.

This may be painful, because by the time I wrote the API code, I was fluently using Perl-isms in Perl code.  It’s possible this won’t be a straightforward port because most of that is missing in PHP.

In comparison, back when I first started working here, I ported a small sub-site from Perl to PHP.  It was written in a style where I mostly needed small, mechanical changes like taking out my.  The only “big” change was creating an equivalent for HTML::Template, but that was pretty small itself.

Maybe you can write Java in any language… but then it’s 10 times easier to port to another language.