Thursday, May 23, 2013

A Look at Failing Polymorphism

First, a brief recap of Steve Yegge's "When Polymorphism Fails" in case it moves again. Let's skip to the part where he builds an example in which polymorphism doesn't help the programmer, using a user-extensible online game as the premise.  A user wants to add an OpinionatedElf monster; how does she do it?
Let's say the OpinionatedElf's sole purpose in life is to proclaim whether it likes other monsters or not. It sits on your shoulder, and whenever you run into, say, an Orc, it screams bloodthirstily: "I hate Orcs!!! Aaaaaargh!!!" (This, incidentally, is how I feel about C++.)

The polymorphic approach to this problem is simple: go through every one of your 150 monsters and add a doesMrOpinionatedElfHateYou() method.

Gosh. That sounds insanely stupid.
Yegge then compares that to an alternative, implemented within the OpinionatedElf (who is the one doing the judgement in the first place):
public boolean doesElfLikeIt ( Monster mon )
{
    if ( mon instanceof Orc ) { return false; }
    if ( mon instanceof Elf ) { return true; }
    ... <repeat 150 times>
}
There's also the viable-in-Ruby approach of opening all the classes and adding the doesElfLikeIt method to each of the 150 monsters. But still, there's a lingering problem with all of the approaches so far:
If someone adds a Gremlin, your elf will be stuck screaming something like "Gosh, what's THAT?" until you can update his code to include a case for gremlins.
I ran into my own instance of this problem recently.

Saturday, April 13, 2013

Tor for VirtualBox Guests

To give a guest only one network card with host-only networking, yet still let it access the Internet, we can let it connect to an HTTP proxy running on the host.  If this proxy is polipo, we can configure it to connect to tor’s SOCKS server as its upstream:
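A minimal sketch of such a setup, written as a shell snippet that generates the polipo configuration. The addresses are assumptions rather than values from this post: 192.168.56.1 is VirtualBox's usual default for the host side of a host-only network, and 127.0.0.1:9050 is tor's usual default SOCKS port. The config path is a placeholder too.

```shell
# Sketch: write a polipo config that listens on the host-only interface
# and hands every request to tor's SOCKS port.
# Assumed values (adjust to your setup): host-only address 192.168.56.1,
# host-only network 192.168.56.0/24, tor SOCKS listener on 127.0.0.1:9050.
conf="${TMPDIR:-/tmp}/polipo-tor.conf"   # placeholder path
cat > "$conf" <<'EOF'
# Listen only where the guest can reach us
proxyAddress = "192.168.56.1"
proxyPort = 8123
allowedClients = 192.168.56.0/24
# Forward upstream through tor
socksParentProxy = "127.0.0.1:9050"
socksProxyType = socks5
EOF
echo "wrote $conf"
# Start the proxy with:  polipo -c "$conf"
```

Inside the guest, software would then be pointed at the proxy, for example with `export http_proxy=http://192.168.56.1:8123/` for tools that honor that variable.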



Then what happens?  Any and all Internet traffic from the guest VM is delivered via Tor.  Since the guest doesn't have Internet access of its own, any software which doesn't cooperate with the proxy cannot communicate.  Although malware on a compromised guest could still exfiltrate data, this setup hides the host's true external IP address from the malware.  (This assumes, dangerously, that there are no security bugs in polipo or VirtualBox that would allow a compromise of the host.)

I said that first, but maybe not very clearly, on twitter.

But… given an appropriate proxy, traffic can be forwarded over any transport.  A proxy could accept data from the guest and transmit it via VPN.  On the other hand, building a VPN client into VirtualBox to offer a VPN network type would let a client connect to a VPN without necessarily allowing other host processes access to it, and without requiring the VPN to be mediated by an additional (dual-homed) guest.

Polipo doesn't have to use tor as a backend, either; it's also perfectly capable of forwarding using ssh's SOCKS proxy.  (This is known as "dynamic tunnel mode" in some clients.)  Compared to the amount of software and configuration needed to set up the average VPN, ssh is just as secure and much easier to get running.

Sending traffic via proxy is an effective way to apply further modifications to the destination stream, without needing the cooperation of software connecting to the proxy.

Friday, April 12, 2013

Random Idea: BBU SSD

Given that RAM doesn't seem to be that expensive anymore (there are plenty of choices for 2GB DDR3 sticks around $20 at NewEgg right now), why not put a bit more onboard an SSD along with a small rechargeable battery?

The extended RAM would be dedicated to a large page cache, which would do its best to hold frequently-written data (to extend the flash life by avoiding writes to it for as long as possible).  A landing zone the size of this cache would be reserved in the NAND, and in the event of sudden power loss, the pages pending in the DRAM would be dumped into the landing zone under battery power.

Presumably, the 400+ MB/s that current SSDs quote for sequential writes includes the overhead of the OS, host interface, and wear-leveling scheme, and so represents a lower bound on the performance of the panic save.  On a drive drawing 2 A and writing 2.00 GB (both also intended to be conservative numbers), 5.12 seconds of power is required, for a consumption of about 2.8 mAh (2 A × 5.12 s = 10.24 A·s, and 1 mAh is 3.6 A·s).

(If the manufacturer quoted their transfer speed in SI and I'm actually writing GiB of data, those numbers change to about 381 MiB/s, yielding 5.37 seconds of transfer time and about 3.0 mAh of charge.  No big deal.)
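As a rough check of that arithmetic, here is a small sketch using awk. The inputs are the post's own figures (2048 MiB to flush, a 400 MB/s write rate, a 2 A draw). The function name panic_save is made up for illustration. Charge in mAh is amp-seconds divided by 3.6.

```shell
# Rough check of the panic-save budget: how long the flush takes and
# how much battery charge it consumes.
panic_save() {
    # $1 = data to flush (MiB), $2 = write rate (MiB/s), $3 = current draw (A)
    awk -v mib="$1" -v rate="$2" -v amps="$3" 'BEGIN {
        secs = mib / rate
        mah  = amps * secs / 3.6      # 1 mAh = 3.6 A*s
        printf "%.2f s, %.2f mAh\n", secs, mah
    }'
}

panic_save 2048 400 2                 # rate read as 400 MiB/s
panic_save 2048 381.4697265625 2      # 400 MB/s (SI) expressed in MiB/s
```

The two calls print `5.12 s, 2.84 mAh` and `5.37 s, 2.98 mAh`, so either reading of the units leaves the battery requirement at a few mAh.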

Wednesday, April 10, 2013

Lost in the Complexity

I was working on a post about VirtualBox’s networking capabilities and how none of the modes provided for what I wanted out-of-the-box.  But the tirade was interrupted by a simple thought: VirtualBox allows up to four virtual network cards per guest.  I could simply configure a guest with two of them—one connected to NAT for reaching the outside world, and the other connected to host-only networking so I could reach it without having to set up port mapping rules.  (Bridge mode is unsuitable because I want the machine to be externally invisible; also, the LAN is DHCP and I want the machine to have a static IP without involving anyone else.)

That turned out to work, by the way.  The machine still has access to the Internet, and nmap run against its static (host-only) IP can see all the open services at their native, unmapped ports.
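For reference, the dual-card setup can also be expressed with VBoxManage instead of the GUI. This is only a sketch: the VM name devbox and the host-only interface vboxnet0 are placeholders, the function name is invented, modifyvm generally expects the guest to be powered off, and the static IP is still configured inside the guest.

```shell
# Sketch: attach NIC 1 to NAT and NIC 2 to a host-only network.
configure_nics() {
    # $1 = VM name, $2 = host-only interface on the host
    # VBOXMANAGE can be overridden, e.g. with "echo" for a dry run.
    "${VBOXMANAGE:-VBoxManage}" modifyvm "$1" \
        --nic1 nat \
        --nic2 hostonly --hostonlyadapter2 "$2"
}

# Dry run: print the arguments instead of invoking VirtualBox.
VBOXMANAGE=echo configure_nics devbox vboxnet0
```

Dropping the `VBOXMANAGE=echo` prefix runs the real command.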

In the moment I realized that a dual-card configuration would work, I was also struck by the amount of time I had spent coming up with a single-card solution to the external access problem… only to have it turn out to be the wrong problem to be solving.  Or, since it wasn’t technically infeasible, a problem made over-complex by the accidental assumption of a single network.

This illuminates one of the main problems of programming: the tension between breadth and depth.  To determine if a plan is technically feasible, one needs to dive deeply into all the details and try to fit the final product together in their mind.  But, the feasibility alone is not a fitness function.  One must avoid getting so lost in the details that this becomes the only approach visible, and actively “back out” to search for hidden assumptions and gratuitous decoupling.

As a younger programmer, I spent some happy hours working on database abstraction layers, and the projects never changed database.  These were all in-house projects for in-house purposes where all available (and foreseeable) DBA knowledge was built on MySQL.  Building systems that “could” be changed to other ANSI compliant systems was both irrelevant and unnecessarily limiting.  I didn’t allow any MySQL specific optimizations, so that all queries could be represented faithfully on any DBMS.

However, the Serendipity weblog system can run on MySQL or Postgres and for them, it isn’t gratuitous.  Their software is externally distributed and not every admin using the software will necessarily be either conversant with or favorable toward MySQL.  Thus, Serendipity’s user base becomes larger if it has support for other engines.  The same decoupling, but no longer gratuitous, and they probably implemented it better than me anyway.

When the VirtualBox Network Quest began, I made the assumption that I wanted one network, and because that assumption was invisible to me, I chased the details down to completion before spotting the alternative.

OTOH, thinking so deeply about it led to a couple of other interesting observations, but they'll have to wait for another post.

Wednesday, March 13, 2013

PHP unpack()

In Perl, unpack is pretty easy to use.  You unpack something with the format string used to pack it, and you get a list of the values that were packed.  I'm not sure of the historical reasoning behind PHP's version of unpack, but they certainly made it as horrible as it could possibly be.

To get Perl-like behavior, the simplest path appears to be:
<?php list($len, $crc) = array_values(unpack("vlen/Vcrc", $header)); ?>
Instead of the simple "vV" it would be in Perl, you give each format code a unique name and separate them with slashes.  You have to provide a name and not an index because PHP interprets a number as the repeat count.  There's nowhere to place an index in the unpack format.  Then, array_values() gives you back the items in the order specified in the unpack string, since PHP associative arrays maintain their ordering.  Finally, the field names must be unique, or else unpack will happily overwrite them.

If you try to use "vV" as the format code, there will only be one value unpacked... named "V".  If you try "v/V", there will be a second value... at index 1, where it overwrote the first value.

If you're unpacking just one value, you might try to write list($x) = unpack(...) but this won't work: unpack inexplicably returns a 1-based array.  PHP will generate a notice that index 0 doesn't exist and assign NULL to $x.

Saturday, March 9, 2013

Theming Drupal 7: Block vs Node vs Field et al

What is the difference among pages, regions, nodes, blocks, and fields?

There's one page that may contain multiple regions.  Regions are containers for nodes and/or blocks.  Multiple nodes can appear within a region if the URL being accessed is a listing view, in which case the nodes will be rendered in teaser form with "Read More" links leading to the full view.  Multiple blocks can also appear in a region.

Blocks seem to correspond to 'widgets' in other systems: chunks of content that can be dropped into the sidebar and remain fairly static on many pages.  For instance, Drupal's search box is a block, and it lands in the sidebar_first region of both default node types.  Blocks do not have fields.

Nodes, on the other hand, correspond to the content that actually gets added to the CMS.  Blog posts, static pages, whatever.  Nodes get URLs assigned to them, either through the URL aliasing features, or the default node/4 style path.  (Or through any SEO friendly URL generation modules you may have installed.)  Nodes have a type and node types have fields.  Fields receive values per-node that are displayed on the node type.

Everything related to a node is rendered inside a single container, corresponding to the page content variable.  Overall, the URL, node with its type and fields, and page content variable are all one interrelated thing when viewing a single node.  Pages and regions are related to the theme as a whole.  Blocks are strongly related to a theme, but also customizable based on node types through the block's Configure link.

Blocks, nodes, and fields are fairly customizable and appear in the admin interface under the "Structure" menu item.  Controlling blocks is done with the Blocks item in the menu (straightforward enough); controlling nodes is done through Content Types.

I should probably note that the paths through the admin interface given are for the default setup where Bartik is the main theme and Seven is the administration overlay theme.  Those paths are not guaranteed for other themes, since themes are PHP and can do nearly anything they want.

If you want "a content area" with several "pieces" to it, the path of least resistance is to construct node.tpl.php with the contents of that content area inside, using fields to display each individual "piece" desired.  Then in your admin interface, establish the fields so they show up when editing the page.

To make that clearer, let's say MyCorp wants a video on their front page with a blurb to the side, a graphical separator, and another couple of paragraphs below.  I could make a mycorp_video content type, and add two fields (field_video_embed and field_video_blurb), then create a node--mycorp-video.tpl.php file with the container divs, central bar, and calls like <?php print render($content['field_video_embed']); ?> inside their respective containers.  (render() only returns the markup, so it has to be printed.)  Then I could leave the couple of paragraphs as "body" content and print that below the separator.  Once the template is ready, the node type can be created in the admin interface, the node actually added, and finally set to be the front page of the site.

Controlling something outside of the content area based on the content (node) type is not possible by default, but can be done with an override in the template.php file for the theme:

function themeName_preprocess_page(&$vars, $hook) {
  if (isset($vars['node'])) {
    /* If the node type is "blog_madness" the template suggestion will be "page--blog-madness.tpl.php". */
    $vars['theme_hook_suggestions'][] = 'page__'. $vars['node']->type;
  }
}

The above code was posted by JamieR at drupal.org/node/1089656.  By default, Drupal's page renderer only knows about specific nodes (like page--node--4.tpl.php) and not content types (aka node types) in general, which is what this override adds.

A second approach is to use the CCK Blocks module to convert fields into blocks.  This allows them to appear on the block layout and be placed in regions in spite of being node-specific.  The blocks are then made visible in region templates with a cck_blocks_field prefix, for instance cck_blocks_field_video_embed for a video_embed field.

The latter approach is actually the one I ended up taking.  I needed to handle several optional areas in various combinations.  Instead of a big list of node types and duplication of the markup for any fields shared between types, I have two basic node types and regions handle sharing the markup and displaying the available fields (or nothing, when no fields are set.)

Monday, February 11, 2013

EC2 utilities vs. $AWS_CREDENTIAL_FILE

Most of the AWS command line tools accept their login credentials from a file named in the AWS_CREDENTIAL_FILE environment variable and formatted like so:

AWSAccessKeyId=AKIAEXAMPLE
AWSSecretKey=Base64FlavoredText/Example


The EC2 tools predate this scheme and still refuse to use it, preferring the credentials to be set directly in the environment.  I decided to over-engineer it and pull the EC2 environment variables from the file:

export AWS_ACCESS_KEY=$(grep '^AWSAccessKeyId' "$AWS_CREDENTIAL_FILE" | cut -d= -f2)
export AWS_SECRET_KEY=$(grep '^AWSSecretKey'   "$AWS_CREDENTIAL_FILE" | cut -d= -f2)


(Those will probably wrap on blogger; in code, they're two lines, each beginning with "export".)  Now I can put the credentials in one place, and they're available to all of the tools.
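The same idea can be wrapped in a small helper. This is a sketch: aws_cred is a made-up name, and the credentials in the demonstration are invented. It uses `cut -f2-` rather than `-f2` purely as a precaution, so that a value which itself contained "=" would not be cut short; the post does not establish whether real keys ever do.

```shell
# Sketch: look up one key in an AWS credential file.
# aws_cred is a hypothetical helper, not part of any AWS tool.
aws_cred() {
    # $1 = key name, $2 = path to the credential file
    # -f2- keeps everything after the first "=".
    grep "^$1=" "$2" | cut -d= -f2-
}

# Try it against a throwaway file holding made-up credentials.
demo_file=$(mktemp)
printf '%s\n' 'AWSAccessKeyId=AKIAEXAMPLE' \
              'AWSSecretKey=Base64FlavoredText/Example=' > "$demo_file"
access_key=$(aws_cred AWSAccessKeyId "$demo_file")
secret_key=$(aws_cred AWSSecretKey "$demo_file")
rm -f "$demo_file"
echo "$access_key $secret_key"
```

Against the real file, the first export would become `export AWS_ACCESS_KEY="$(aws_cred AWSAccessKeyId "$AWS_CREDENTIAL_FILE")"`, and likewise for AWS_SECRET_KEY.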