Tuesday, October 25, 2011

Character Sets: Get PHP, Perl, MySQL, and Unicode to Play Together

This post is a companion to Perl and Unicode in Brief, an attempt to cover the same ground more concisely.

This is an extended remix of my recent post on the subject, only less of a rambling story and more focused.  Again, I'll start with some background definitions.

I'll also assume that you're going to make everything UTF-8, because as a US-centric American who has the luxury of using English, that's what makes the most sense for my systems.  However, if you understand everything I wrote, it should not be difficult to make everything UTF-16 or any other encoding you desire.


Friday, October 21, 2011

The Trouble with REST

Note: this post has been superseded.

REST is easy to describe.  It goes a little something like this: "You have some representation, and you send (or receive) the whole thing to read it or make changes."  People coming from Clojure would understand it as "REST sends values."  I can GET an object, receive the value, manipulate it, and PUT the new value.  It's so easy because it just uses HTTP!

Right?  Maybe not.  If REST is so easy, why is there HATEOAS*?  Shouldn't that have been obvious?  Why do we have arguments about versioning and parameters and formats and headers on Reddit?


Wednesday, October 19, 2011

Notes on using mysqlbinlog for copying updates

I commented on this post, but for posterity:
It seems by sheer luck that I stumbled over a way to take care of everything. I save a copy of the interpreted binlog as it flows through the pipe:

mysqlbinlog ... | tee binlog-play.sql | mysql ...

Then if I get an error message, mysql will tell me e.g. "Error ... at line 42100". Running "vim +42100 binlog-play.sql" lets me inspect the stream to see what went wrong in detail.

Inside binlog-play.sql, the "# at 112294949" comments give positions that can be passed as e.g. "--start-position=112294949" to the next mysqlbinlog command, to retry the statement after I fix the problem. (Alternatively, end_pos seems to tell the position of the next command, if I need to skip the one which failed; e.g. I was testing out CREATE FUNCTION and it was logged as "CREATE DEFINER=... FUNCTION", which RDS refuses.)

The final piece of the puzzle is that executing "FLUSH LOGS;" or "mysqladmin flush-logs" will push mysqld on to the next binlog file, so you can safely play out the one you want. Once you've finished processing a file through mysqlbinlog, you can just remember the file boundary, and flush mysql's logs if you want to process the one it's presently writing to.
This is in regard to piping mysqlbinlog output from one mysql server into the mysql client to execute on another; the post I linked above discusses doing so for switching to Amazon RDS.  The basic strategy is to minimize downtime by loading a database dump from the source on the destination, then using mysqlbinlog on the source and the mysql client to feed updates from the source to the destination.  The updates can be faster to load than a new dump, and when it's time to switch servers, it's a matter of stopping database clients, turning off the source mysqld, sending the final binlog updates, pointing the clients to the destination server, and turning the clients back on.  That beats waiting for a whole dump to load while the clients are off.
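
The "jump to the failing line, then look for the position comment" step can be scripted.  What follows is only a sketch, not part of the original procedure: the function names are made up for illustration, and it assumes the saved stream contains position comments of the form "# at N" (or "#at N") on lines of their own, with the line number taken from mysql's error message.

```python
import re

# Position comments written before each event in the saved stream.
# Tolerates both "# at 123" and "#at 123".
AT_COMMENT = re.compile(r"^#\s*at\s+(\d+)\s*$")


def position_before_line(sql_lines, error_line):
    """Return the last position announced at or before error_line (1-based).

    Returns None if no position comment appears before that line.
    """
    position = None
    for number, line in enumerate(sql_lines, start=1):
        if number > error_line:
            break
        match = AT_COMMENT.match(line)
        if match:
            position = int(match.group(1))
    return position


def position_for_file(path, error_line):
    """Same lookup, reading the saved stream from a file (hypothetical helper)."""
    # newline="\n" keeps stray carriage returns from being counted as line
    # breaks; errors="replace" tolerates bytes that are not valid UTF-8.
    with open(path, encoding="utf-8", errors="replace", newline="\n") as handle:
        return position_before_line(handle, error_line)
```

With the file from the pipeline above, position_for_file("binlog-play.sql", 42100) would give the number to try as --start-position.  If the stream contains unusual binary data, the line count here may not agree exactly with the one mysql reports, so it is worth checking the result in an editor before replaying.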

Tuesday, October 18, 2011

Character Sets, Encodings, MySQL, and your data

This post is a companion to Perl and Unicode in Brief, an attempt to cover similar ground more concisely.  A revised version of the post you're currently reading is also available.

I'm currently moving data from a (relatively old now) MySQL 5.0 server into Amazon RDS.  I've been here before, when I was moving data from MySQL 4.x into 5.0 and mangling character sets.  This time, I want to make 100% sure everything comes across with maximum fidelity, and also get the character encoding as stored to be labeled correctly in MySQL.

First, a quick definition or two:
  • Character Set: a specific table to translate between characters and numbers.  Example: ASCII defines characters for numbers 0-127; "A" is 65.  This can also be described as "a set of characters, and their corresponding representation inside the computer."
  • Character Encoding: a means of "packing" numbers from the character set into a container.  Example: UTF-8.  The Unicode character 0x2013 becomes 0xE2,80,93.  The "E" signifies "Part 1 of 3", and part of each remaining byte simply indicates "Continued"; the 0x2013 is then divided up to fit in the parts of the bytes that aren't indicating their "Part 1" or "Continued" status.  In the specific case of UTF-8, the encoding is designed so that the ASCII range 0-127 (0x00-7F) is encoded without change: a leading 0-7 means "Part 1 of 1".
  • 8-bit character encoding: In older, simpler days, character sets defined only as many characters as could fit in 8 bits, and defined the encoding as simply the numbers.  Character number 181 would encode as a byte (8 bits) with value 181.
  • A character encoding implies the associated character set, because the encoding defines how numbers in its character set become individual bytes.  How characters in other sets would be encoded is left undefined and basically impossible.
This last point is why MySQL lets you set "character sets" to UTF-8, though the latter is an encoding.
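
To make the definitions concrete, here is a small Python sketch of the same ideas.  The helper name describe_utf8 is made up for illustration, and ISO-8859-1 (Latin-1) is my choice of example for an 8-bit character set; the definitions above don't name one.

```python
def describe_utf8(char):
    """Return the UTF-8 bytes of one character as 8-bit binary strings."""
    return [format(byte, "08b") for byte in char.encode("utf-8")]


en_dash = "\u2013"  # Unicode character 0x2013

# Three bytes: a lead byte starting 1110 ("Part 1 of 3"),
# then two bytes starting 10 ("Continued").
print(en_dash.encode("utf-8").hex(" "))  # e2 80 93
print(describe_utf8(en_dash))  # ['11100010', '10000000', '10010011']

# ASCII is encoded without change: "A" is 65 in both ASCII and UTF-8.
print(list("A".encode("utf-8")))  # [65]

# 8-bit encoding: in Latin-1, character number 181 is the single byte 181.
micro = bytes([181]).decode("latin-1")
print(list(micro.encode("latin-1")))  # [181]

# The same character needs two bytes in UTF-8.
print(micro.encode("utf-8").hex(" "))  # c2 b5
```

The bits that are not part of the 1110 or 10 markers (0010, 000000, 010011) concatenate to 0010 0000 0001 0011, which is 0x2013 again.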


Tuesday, October 11, 2011

iPad vs. Tablet PC

One of them succumbed to death by risk-aversion.

One of them couldn't let go of the tether and fly.

I think Linus said the same of svn: paraphrased, "If you're trying to make 'a better CVS' then you have already lost, because CVS is too broken to fix."

Hey, sapphirepaw: make sure what you do is good on its own, not "an X only different".

Saturday, October 8, 2011

Steve Jobs

I'm getting old: if I were to pass on at the same age Jobs did, my life would be more than half over already.

What separates me from Jobs?  There's the matter of leverage, where he could take his vision and coordinate the prototyping and development of it, into the iPod, the iPhone, the Macbook Air, the iPad.  There's also the matter of having vision.

In 2006 or so, I beheld my first iPod in real life, an old (FireWire based) model with a physical click-wheel.  In 2008 I picked up a different, small MP3 player and for the first time, immediately noticed the limitations of digital control.  Without having handled the iPod and gotten a feel for the analog response of the wheel, I probably wouldn't have given the buttons a second thought.  Do you want to scroll on the generic player?  Click-click-click-click.  Or click-and-hold, guess at how long you need to go (since the screen is slow enough to be unreadable at this scrolling speed, and they don't slow updates to compensate), and release.

The point here is, Jobs saw humans as inherently analog, and adapted all of his machines to analog control.  It's a simple thing, but Jobs was apparently devoted to HCI.  The "vision" simply falls out of that.

It's not like the limitations of digital control weren't apparent in the 1980s.  Compare Rad Racer to a real car's steering wheel.  Anyone focused on "how it feels" could have been Jobs back then, inventing 2010 in the 16-bit era instead of carrying 8-bit paradigms through the 1990s.

In contrast, I seem to lack vision because I'm busy implementing arbitrarily complex business rules at work, and staying away from the bleeding edge of gadgetry.  I'm not in the consumer space; I'm not taking any research toward the consumer space; and I'm not thinking about what's next for it, either (at least, not beyond what turns out to actually be the next thing*.)  But, I'm also having little impact on the wider world, writing code that never leaves the house.  It's important, but after I am gone, will these be the best years of my life?  Will I think college was the best time of my life, forever?

I think it's time to put my free time to better use and do something instead of watching the world slowly develop towards Jobs' vision on its own.


* I have a dead draft which discusses the crazy idea of "having a set-top box inside the remote" in 2006 or so.  It then points out that h.264-over-wifi ought to handle the bandwidth to do exactly that from your iPhone now.  It starts fleshing out what would be necessary to make it happen, then abruptly ends with a note: "Two days after I started writing this, Apple announced AirPlay."

Thursday, October 6, 2011

Setting Everything on Fire

I created a new user, added them to the wheel group, and in case I needed another admin user, added %wheel to sudoers through visudo.  Then, I was trying to do more stuff, and...

[sudo] password for ec2-user: _

Wait.  What?  Not only does ec2-user have no password, but I didn't change its NOPASSWD line in sudoers.

It turns out that ec2-user is also in group wheel, and when confronted with the two permission sets, sudo did what I didn't mean: applied the %wheel rule and started requiring passwords for ec2-user.  Of course su was no help either: root likewise has no password set, because you have sudo as ec2-user....
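
The behavior is easier to see in a toy model.  When several sudoers entries match a user, sudo uses the last match, not the most specific one.  The sketch below is not sudo's parser, only a simplified illustration, and it assumes the %wheel line ended up after ec2-user's NOPASSWD line in the file.

```python
def needs_password(rules, user, groups):
    """Toy model of sudoers matching: the last matching rule wins.

    rules is a list of (who, password_required) pairs in file order, where
    who is a user name or "%group".  Returns None if nothing matches.
    """
    result = None
    for who, password_required in rules:
        is_group_rule = who.startswith("%")
        if who == user or (is_group_rule and who[1:] in groups):
            result = password_required  # later matches override earlier ones
    return result


# Assumed order: the existing NOPASSWD entry first, the new %wheel entry after.
rules = [("ec2-user", False), ("%wheel", True)]
print(needs_password(rules, "ec2-user", {"wheel"}))  # True

# With the order reversed, the NOPASSWD entry would win instead.
print(needs_password(list(reversed(rules)), "ec2-user", {"wheel"}))  # False
```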