More Amazon S3 Thoughts

I’ve been preparing to write the direct-to-S3 site backup script that I talked about before. It really shouldn’t take long – probably only an hour or two, maybe a bit more because I can’t do it uninterrupted – but when approaching any new technology I like to do a reasonably thorough survey of the tools and techniques and issues. This comes from having been burned too many times by seeing others jump into something too fast, unaware that there were better approaches than the ones they were using, but by the time I came along it was too late because too much code was already written. Here are some observations for others who might choose to go down this path.

The limitation to a single naming-hierarchy level (bucket/name instead of directory/directory/…/name) is slightly annoying. More annoying still is the lack of a rename, so if you want to change the name of a large file/object you must delete it and re-upload it with the new name. If you fake a hierarchy by keeping the slashes in the paths of files you upload from multiple directories elsewhere, moving a file to a different “directory” is still just a name change, so it has to be handled the same way. The only way around this is to give files permanent unique names and maintain some kind of directory yourself. This works OK for a backup type of application except that it violates Jeremy Zawodny’s “can I easily find my backed-up files even if I’ve lost the program that put them there?” criterion, and it doesn’t work at all with Amazon’s virtual web-hosting facility. I can understand the lack of a rename in the REST API, but the lack of such an obviously important feature even in the SOAP API makes me wonder if they made the mistake of basing an object’s location on its name, so that a rename implies relocating the object’s data. No matter what the reason, this is one of the biggest warts on S3.
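
To make the pain concrete, here’s a rough sketch (untested PHP, placeholder credentials and bucket/key names, no error handling, and assuming I’ve got the request-signing recipe right) of what a “rename” amounts to against the raw REST API when you still have a local copy of the data: upload it again under the new key, then delete the old key. For a multi-gigabyte object that means re-transferring every byte just to change a name.

    <?php
    // Rough sketch only: faking a "rename" against the S3 REST API by uploading
    // the object again under the new key and then deleting the old key.
    // Credentials, bucket, and key names are placeholders; no error handling.

    define('AWS_KEY',    'your-access-key-id');
    define('AWS_SECRET', 'your-secret-key');

    // Build the Authorization header S3 wants: HMAC-SHA1 over a canonical string.
    function s3_auth($verb, $bucket, $key, $date, $type = '') {
        $to_sign = "$verb\n\n$type\n$date\n/$bucket/$key";
        $sig = base64_encode(hash_hmac('sha1', $to_sign, AWS_SECRET, true));
        return 'Authorization: AWS ' . AWS_KEY . ":$sig";
    }

    // Issue one signed request and return the HTTP status code.
    function s3_request($verb, $bucket, $key, $body = null, $type = '') {
        $date    = gmdate('D, d M Y H:i:s \G\M\T');
        $headers = array("Date: $date", s3_auth($verb, $bucket, $key, $date, $type));
        if ($type !== '') {
            $headers[] = "Content-Type: $type";
        }
        $ch = curl_init("http://$bucket.s3.amazonaws.com/$key");
        curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $verb);
        curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        if ($body !== null) {
            curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
        }
        curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        return $status;
    }

    // "Rename": re-send the bytes under the new key, then delete the old key.
    // Note this slurps the local copy into memory -- fine for a sketch, not for
    // gigabyte files (see the streaming discussion further down).
    function s3_rename($bucket, $old_key, $new_key, $local_copy) {
        $data = file_get_contents($local_copy);
        if (s3_request('PUT', $bucket, $new_key, $data, 'application/octet-stream') == 200) {
            s3_request('DELETE', $bucket, $old_key);
        }
    }
    ?>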

Speaking of the virtual-hosting facility, it’s nice but you do lose some functionality compared to what you’d have on a regular website. For example, the logging facilities are rudimentary and hard to access (more about this in a moment). You also lose any kind of functionality that you might be used to from .htaccess under Apache. I’d like to host my Australia and platypus pictures through Amazon, but I use my .htaccess to refuse image requests referred from web forums and such, because I don’t want to pay for extra bandwidth just so some teenager can post my wombat-poop picture to a thread or (worse) use one of my platypus pictures as an avatar. Just one such usage could easily generate thousands of hits and up to a gigabyte of consumed bandwidth, so losing the Referer filtering could actually cost me money.
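
For the curious, the kind of rule I mean is just a few lines of mod_rewrite in .htaccess, something along these lines (the domain and extensions here are placeholders), and there is simply no equivalent hook on an S3 virtual host:

    # Refuse image requests whose Referer is some other site; an empty
    # Referer is still allowed so direct visits and most proxies work.
    RewriteEngine On
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.org/ [NC]
    RewriteRule \.(jpe?g|gif|png)$ - [NC,F]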

What was that about “rudimentary and hard to access”? Part of that is the state of the tools available to monitor, maintain, and generally tinker with S3. I used NS3 Manager for a while (it’s how I’ve uploaded most of my pictures and videos so far) but it’s not very stable and it leaves annoying little “placeholder” files everywhere for no apparent reason. Jets3t Cockpit seems a lot better, but still doesn’t support all of the S3 functionality such as logging. Anyone working with S3 needs to get used to doing a lot of things the hard way, which fortunately is not all that hard but can be a bit tedious nonetheless.

Be wary of web-based tools such as Cockpit Online, Openfount S3 Manager, or AWS Zone. I have no particular reason to distrust or cast aspersions on any of their authors, who are after all doing the community a service by providing these tools, but bear in mind that to use them you must provide your “secret” S3 key to a third party you don’t know. Caution is always called for in such cases.

If you’re using PHP, there really doesn’t seem to be much difference between neurofuzzy’s library and the semi-official Amazon library. One difference might be crucial, however, and that’s support for streaming uploads instead of having to load an entire file into memory to transfer it. Many web hosts would not appreciate a script loading a gigabyte file into memory, and with the memory limits most shared hosts impose on PHP it often won’t even work. Fortunately, Mission Data has published a patch to the Amazon library to do streaming transfers. This, combined with Christopher Shepherd’s S3 backup script, will probably form the basis for my own site-backup tool. The libraries available for Ruby actually seem a bit better, and using them might be a good way to learn Ruby, but Ruby support is simply less ubiquitous than PHP support, so a Ruby script would be a bit less generally useful. Maybe, if I have enough spare time, I’ll do both PHP and Ruby versions just for fun.
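
To illustrate the difference, here’s a minimal streaming-PUT sketch using PHP’s cURL bindings (not the Mission Data patch itself, just the general idea behind it), with placeholder credentials, bucket, and file names:

    <?php
    // Minimal streaming-upload sketch: hand cURL an open file handle so the file
    // is read in chunks as it is sent, instead of being slurped into a PHP string.
    // All names and credentials below are placeholders.
    $access = 'your-access-key-id';
    $secret = 'your-secret-key';
    $bucket = 'mybucket';
    $key    = 'site-backup.tar.gz';
    $path   = '/tmp/site-backup.tar.gz';

    $date    = gmdate('D, d M Y H:i:s \G\M\T');
    $to_sign = "PUT\n\napplication/octet-stream\n$date\n/$bucket/$key";
    $sig     = base64_encode(hash_hmac('sha1', $to_sign, $secret, true));

    $fp = fopen($path, 'rb');
    $ch = curl_init("http://$bucket.s3.amazonaws.com/$key");
    curl_setopt($ch, CURLOPT_PUT, true);                    // a real HTTP PUT
    curl_setopt($ch, CURLOPT_INFILE, $fp);                  // stream from the handle...
    curl_setopt($ch, CURLOPT_INFILESIZE, filesize($path));  // ...with the length known up front
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        "Date: $date",
        "Content-Type: application/octet-stream",
        "Authorization: AWS $access:$sig",
    ));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
    ?>

With file_get_contents() the whole archive would have to fit under the host’s memory limit; with an open handle the script’s memory use stays flat no matter how big the backup is.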

Send Jacoby to Iraq

Jeff Jacoby should stick to plagiarism. It’s not that I endorse the nasty habit for which he is best known, but it’s far from his nastiest habit. When he tries to be original, he manages to squirt out material so mind-bogglingly stupid that a little relatively-honest plagiarism would be a relief. Consider this example from today’s Boston Globe.

Far from drawing down the number of troops in Iraq, Bush should increase them. “The Rumsfeldian ‘light footprint’ theory – the belief that the US military presence in Iraq must be minimized so that the Iraqis learn to maintain security and stability on their own – has been tried for more than three years. It hasn’t worked. At least in the short term, there is no prospect of restoring order and stopping the bloodshed without many more American boots on the ground.”

He’s awfully keen on the idea of a heavier foot on the necks of the Iraqi people, especially since it’s somebody else’s foot. I suggest that we should increase our troops in Iraq by exactly one. I don’t know if the military is willing to lower their standards that far, but maybe they could find a use even for such a worthless waste of flesh as Jacoby. I suggest assigning him to test parachute-less airborne troop deployment.


Image Spam

Lately I’ve been getting a lot of spam with the text hidden in images much like the following.


[sample spam image]

Notice the “smudges” and the wonky multicolored text. Ironically, I suspect that the image is generated by a CAPTCHA program intended to stop website (e.g. comment or forum) spam, which is being perverted by the spammers themselves to evade email filters. Does anyone know anything more about this trick?

Picture Friday

Here are some pictures from October.