Saturday, July 9, 2016

A hat beside the “PHP7 vs HHVM” ring

I spent all my free time, all week, exploring the difference in performance between PHP 5.6.22, PHP 7.0.8, and HHVM 3.14.1.  (HHVM 3.14.2 came out later in the week, but I didn’t want to redo everything.)

In the end, it turned out kind of useless as a “real world” test.  The server doesn’t have L3 cache, and the client is a virtual machine.  I also didn’t run a large number of trials, nor tune the server stacks for optimum performance.  In fact, the tunings I did try had, at best, no effect.

tl;dr conclusions:

  • HHVM proxygen is almost completely amazing.  If you have the option, it is probably what you want. It just crashes hard when concurrency exceeds its open file limit.
  • nginx+FastCGI hits a resource limit somewhere, and starts failing requests at higher concurrency, like 160.
  • apache+FastCGI does the same… but the limit is higher, between 256 and 384.  The price is that it serves only about 86% as many requests per second.
  • Providing more FastCGI workers makes the errors hit sooner, but ramp up more slowly.
  • I’m really disappointed in nginx.  I expected more.  A lot more.



Test Setup

The server is an AMD Phenom II X4 840 (3.2 GHz Propus) desktop with 4 GB of RAM and Debian 8 (jessie) installed.  As noted above, this chip is actually a Deneb core without L3 cache, which I just (re?)learned during this writeup.

The client is an Intel Core i5-3550 (3.3 GHz Ivy Bridge) desktop with 8 GB of RAM, running Windows 10 and VirtualBox 5.0.20, with all possible acceleration features enabled.  This hardware runs a Debian 8 guest with bridged networking and the virtio-net driver, from which the ab tool is run.  (NAT was observed to have an overly broad latency distribution.)

Both of them are wired via Cat5e to gigabit Ethernet ports on an Asus RT-N66U.

Design

I used ab to test the index page of a fresh Drupal 7.44 site with 3 articles.  Response size was about 14,000 bytes (the actual size varies slightly, depending on server headers).  I enabled the Drupal page cache for anonymous users, and the MySQL query cache was also enabled.

This was served using HHVM 3.14.1 in either Proxygen or FastCGI modes; PHP-FPM 7.0.8 compiled with -O2 -pipe as CFLAGS; or Debian’s PHP-FPM 5.6.22 distribution.

HHVM FastCGI was tested behind nginx exclusively.  PHP-FPM 5.6 was tested behind nginx with 8 request-processing children.  PHP-FPM 7.0 was tested with 8, 32, and 256 children behind nginx, and with 32 children behind the Apache event and prefork MPMs.
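
For reference, the worker counts were set through the PHP-FPM pool configuration.  A minimal sketch of the relevant lines, assuming a static process manager (the pool file path is the stock Debian one and may differ for a source build):

    ; /etc/php/7.0/fpm/pool.d/www.conf  (path is an assumption)
    pm = static             ; assuming a static pool; a dynamic pool with matching limits behaves similarly
    pm.max_children = 8     ; raised to 32 and 256 in later rounds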

nginx had 4 worker processes (1/core), 768 worker connections, with multi_accept off by default.  (I would later try 4 processes with 3000 connections, an open files limit of 3072, and multi_accept on, but this failed to improve response, at which point I abandoned further exploration.)
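
In nginx.conf terms, the baseline looked roughly like this (a sketch using the stock directive names, not a verbatim copy of my config):

    worker_processes 4;              # 1 per core
    events {
        worker_connections 768;
        multi_accept off;
    }
    # The later, failed attempt raised worker_connections to 3000, turned
    # multi_accept on, and raised the open file limit to 3072 (presumably
    # via worker_rlimit_nofile).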

Apache with the Event MPM used the Debian default settings of 25 threads/child, 25 min spare threads, 75 max spare threads, limit of 150 workers, and 2 processes (50 threads) created at startup.

Apache with the Prefork MPM likewise used Debian default settings, of 3 min spare servers, 8 max spare servers, limit of 150 server processes, and starting 5 servers at startup.
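
Expressed as Apache 2.4 directives, those event MPM defaults are roughly the following (the usual Debian location is mods-available/mpm_event.conf); the prefork equivalent used StartServers 5, MinSpareServers 3, MaxSpareServers 8, and MaxRequestWorkers 150:

    <IfModule mpm_event_module>
        StartServers             2
        MinSpareThreads         25
        MaxSpareThreads         75
        ThreadsPerChild         25
        MaxRequestWorkers      150
    </IfModule>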

Every concurrency level ran 100,000 requests, taking 54 seconds and up.  Keep-alive did not make a major difference in early testing, so I ignored it in later tests, in favor of broader coverage.

The server was left idle for at least 30 seconds between tests, and each run was pre-warmed with 500 requests.
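
Each trial, then, amounted to something like this (the URL is a stand-in for the test site’s front page; -k was added only for the early keep-alive comparisons):

    ab -n 500 -c 16 http://server/drupal/        # pre-warm the caches
    sleep 30                                     # let the server settle
    ab -n 100000 -c $C http://server/drupal/     # measured run at concurrency $C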

Results round 1: Initial PHP runs

PHP 5, tested at concurrencies of 1, 4, 16, 64, and 256, reached its peak requests/second at 16 (1059.09) and 64 (1057.62).  Worst-case service times for these levels were 34 and 79 ms, respectively.  100% of requests returned successfully up to and including concurrency 64; just 6% of requests were successful at 256.

PHP 7 reached its peak request rate at concurrency 16 (1790.08) and was at 95% of that at 64 (1701.39).  It was able to serve 12% of requests successfully at concurrency 256.  Worst-case service times for these concurrencies were 33 and 72 ms, essentially identical to PHP 5, but at a 69% higher request rate.

I stopped to explore the edge-case behavior at this point.

PHP-FPM 7.0.8, 8 workers, nginx
Concurrency   Successful   Failed   Success %
        135       100000        0      100%
        137        99395      605     99.4%
        138        73877    26123     73.9%
        139        55312    44688     55.3%
        146        24275    75725     24.3%
        256        11584    88416     11.6%

It seems that errors in this setup approach the ceiling asymptotically.

Results round 2: HHVM

With HHVM in Proxygen mode, I wasn’t able to get a single request to fail, except when I set concurrency above the number of file descriptors it could open. In that case, the process aborted, and required external intervention to restart.  This was the only setup to exhibit this behavior; the FastCGI configurations would recover from their error conditions on their own, soon after traffic dropped.

However, with maximum file descriptors raised to 4096 and the server restarted, HHVM was still happy to serve 4000 simultaneous connections (1000/core)… just a bit slowly.  Median service time was 2.17 seconds, with a worst case of 3.34 seconds.
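
For the curious, giving Proxygen a larger descriptor budget is roughly a matter of raising the limit before launch (option names are from the HHVM 3.x documentation; the web root is a placeholder):

    ulimit -n 4096                            # raise the open-file limit for the HHVM process
    hhvm -m server \
        -vServer.Type=proxygen \
        -vServer.Port=80 \
        -vServer.SourceRoot=/var/www/drupal   # placeholder document root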

Proxygen scales linearly all the way up to eleven.

Proxygen steadily served about 1565 requests per second from concurrency 16 through 2000 (the usual steps, then 768, 1024, 1536, and 2000 as I continued to become more amazed), then accelerated to 1821 for concurrency 4000. Discounting the outlier at 4000, this puts Proxygen in the area of 48% faster than PHP 5.

In FastCGI mode, receiving requests from nginx, HHVM only handled 1462.09 requests per second (concurrency 16) or 1451.39 (64).  This still puts it at 37–38% faster than PHP 5.

At 256, the response lengths didn’t match what ab expected, so it counted 99,999 requests as failures despite only 1 non-2xx response.  I can’t report numbers from that run, since it’s clearly garbage.

Results round 3: PHP 7 with more request workers

Quadrupling the PHP FastCGI children to 32 (8/core) allowed nginx to serve 1820.85 requests per second at concurrency 64, with a 209 ms worst-case service time.  It also reduced the number of errors at higher concurrency, with 79% of requests at 256 being successful.  However, failures had still started at 160.

Raising the limit again, to 256, was a resounding failure.  Concurrency 256 served 68% of requests successfully, and testing this configuration was cut short after a trial at concurrency 96 returned a 94% success rate.  256 workers is clearly not a viable configuration for production use.

Results round 4: PHP 7 on Apache

With request workers reverted to 32, it was time to pit Apache against nginx.

Requests per second dropped to the 1500s: 1507.46 at concurrency 16, 1557.17 at 64, and 1575.38 at 256… without errors.  Worst-case service times were 105 ms, 258 ms, and 3.46 s, respectively.  At a concurrency of 500, ab quit running after 5168 requests.  In an apples-to-apples comparison at concurrency 64, the Apache event MPM handles 86% of nginx’s requests per second (nginx is 17% faster), but Apache handles far more concurrency: 256 concurrent clients can be serviced error-free, where nginx has started issuing errors by 160.

In a test of the prefork MPM, requests per second improved, peaking at concurrencies of 64 (1642.60) and 256 (1652.00), but service times were a bit less stable, and the worst case hit 336 ms for 64, and 6.78 seconds for concurrency 256.

Finally, out of curiosity, I changed PHP children back to 256 and tried the tests on Apache’s event MPM, but errors started occurring at concurrency 64. Concurrency 256 was only able to serve 49% of requests successfully.  Like nginx, this is a configuration that wouldn’t work in production.

Bonus round: siege

I did run a preliminary test against a completely new parameter set: a more-populated Drupal database (thanks to the devel module’s content generation function), using siege to visit more nodes and page assets.  The anonymous page cache was disabled in order to stress the non-network side of the stack more, and nginx was re-tuned in the hopes of higher concurrency.

Namely, I raised the open file limit, increased worker connections to 3000, and turned multi_accept on.  PHP was returned to 32 FastCGI children.
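
The siege runs were along these lines (the flags and URL list are my reconstruction, not the exact invocation):

    # urls.txt lists the node and asset URLs generated from the devel content
    siege -b -c 128 -t 2M -f urls.txt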

MySQL was also tuned to use unsafe commits.  Prior tests revealed that the at-least-one fsync() per Drupal page was dreadfully slow.  (Unsafe commits ameliorated that problem, but to avoid invalidating all of my data, I put safe commits back and fixed the actual problem by clearing the Drupal cache.)
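
The usual knob for “unsafe commits” is InnoDB’s flush-per-commit setting; a sketch of the change, assuming that was the setting in play:

    # my.cnf, [mysqld] section: trade commit durability for fewer fsync() calls
    innodb_flush_log_at_trx_commit = 2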

However, this setup still handled concurrency 128 fine, then rolled on its back and failed requests at 160, so I didn’t explore this path in any more depth.  (I don’t know if it’s interesting, but disabling the page cache dropped performance to just 188 requests per second, even with unsafe commit and the query cache.)

Conclusions

Proxygen is awesome.  I bet that’s the mode Facebook is using for Facebook.  That team deserves an award.

We’re running Apache2 prefork at work, because of bugs in the event MPM at some point (the Ubuntu 14.10 days, perhaps?).  It seems like a good choice.  We’re trading a bit of performance and service-time stability for a lack of errors, the latter of which I consider absolutely vital.

nginx turned in a surprisingly poor showing.  Considering the amount of bragging they do and the amount of hype they have, I expected a more decisive victory than this “it’s a bit faster, until it catches fire” deal.

Tuning for more concurrency is hard.  Nothing I did actually helped.

Limitations

  • These are not averaged results from multiple runs.
  • The client is virtualized.
  • The network was not always 100% peaceful.  I don’t have a spare gigabit switch.
  • The server CPU has no L3 cache.
  • These are not heavily tuned servers and kernels.
  • These are not entirely untuned servers.
  • MySQL was running on the same host as Drupal (connecting via localhost).

YMMV.

Software versions

  • HHVM: 3.14.1, from their official repository for Debian 8.
  • PHP 5: 5.6.22, from the Debian repository.
  • PHP 7: 7.0.8, from source at php.net.  Compiled with CFLAGS="-O2 -pipe".
  • nginx: 1.6.2, from the Debian repository.  1 worker/core.
  • Apache: 2.4.10, from the Debian repository.  Event and Prefork MPMs used.
  • Drupal: 7.44, from source at drupal.org.  Anonymous page cache enabled.
  • MySQL: 5.5.43, from the Debian repository.  Query cache enabled.
