<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Canned Platypus</title>
	<atom:link href="http://pl.atyp.us/wordpress/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://pl.atyp.us/wordpress</link>
	<description>Making the world better, one byte at a time.</description>
	<lastBuildDate>Thu, 02 Sep 2010 17:30:50 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en-us</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>I Love You, But&#8230;</title>
		<link>http://pl.atyp.us/wordpress/?p=3026</link>
		<comments>http://pl.atyp.us/wordpress/?p=3026#comments</comments>
		<pubDate>Thu, 02 Sep 2010 17:30:50 +0000</pubDate>
		<dc:creator>Jeff Darcy</dc:creator>
				<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=3026</guid>
		<description><![CDATA[I love programming.  I love thinking about algorithms and data structures.  I love writing code, rearranging code, talking about code.  I even love testing and debugging and documenting code.  (This is not to say I do all of these things as consistently as I can.  There are still only 24 [...]]]></description>
			<content:encoded><![CDATA[<p>I love programming.  I love thinking about algorithms and data structures.  I love writing code, rearranging code, talking about code.  I even love testing and debugging and documenting code.  (This is not to say I do all of these things as consistently as I can.  There are still only 24 hours in a day and so one must prioritize.)  Sometimes I think of getting out of this field, though, because so much of working as a programmer nowadays has nothing to do with any of the things I love, and it seems to be getting worse.  Nobody loves meetings and bureaucracy and such, but that&#8217;s not what I&#8217;m talking about.</p>
<p>I hate spending half my time dealing with build systems, source-control systems, package managers, and such.  There are too many out there, they all suck, everybody has their favorite one and their favorite way of using it, and they&#8217;re not at all shy about ramming their preferences down your throat . . . which brings me to my real point.  I hate programmers.  Hot damn, but we are a <b>noxious</b> breed, aren&#8217;t we?  I&#8217;m tired of the backstabbing, the trashing each other&#8217;s work, the holier-than-thou attitude from the GNU types, the rampant sexism, the bike-shedding, the endless effort to do and re-do all the fun stuff while dumping as much work as possible onto one&#8217;s peers, and on and on and on.  I know I&#8217;ve exemplified many of these sins myself, I don&#8217;t need anyone else to tell me that, but if I made it my life&#8217;s goal to be as much of a jerk as possible I&#8217;d still find myself outdone just about every day by people who aren&#8217;t even at their worst.</p>
<p>Of course, I don&#8217;t know what else I&#8217;d do that pays the bills, so you all are stuck with me, but come <b>on</b>, people.  Let&#8217;s stop sucking all the fun out of this profession.</p>
]]></content:encoded>
			<wfw:commentRss>http://pl.atyp.us/wordpress/?feed=rss2&amp;p=3026</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>How *NOT* to Lose Data</title>
		<link>http://pl.atyp.us/wordpress/?p=3020</link>
		<comments>http://pl.atyp.us/wordpress/?p=3020#comments</comments>
		<pubDate>Thu, 02 Sep 2010 01:42:30 +0000</pubDate>
		<dc:creator>Jeff Darcy</dc:creator>
				<category><![CDATA[distributed]]></category>
		<category><![CDATA[systems]]></category>

		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=3020</guid>
		<description><![CDATA[In my last post, I described several common data-loss scenarios and took people to task for what I feel is a very unbalanced view of the problem space.  It would be entirely fair for someone to say that it would be even more constructive for me to explain some ways to avoid those problems, [...]]]></description>
			<content:encoded><![CDATA[<p>In my last post, I described several common data-loss scenarios and took people to task for what I feel is a very unbalanced view of the problem space.  It would be entirely fair for someone to say that it would be even more constructive for me to explain some ways to <b>avoid</b> those problems, so here goes.</p>
<p>One of the most popular approaches to ensuring data protection is immutable and/or append-only files, using ideas that often go back to Seltzer et al&#8217;s <a href="http://www.eecs.harvard.edu/~margo/papers/usenix93/">log structured filesystem</a> paper in 1993.  One key justification for that seminal project was the observation that operating-system buffer/page caches absorb most reads, so the access pattern as it hits the filesystem is write-dominated and that&#8217;s the case for which the filesystem should be optimized.  We&#8217;ll get back to that point in a moment.  In such a log-oriented approach, writes are handled as simple appends to the latest in a series of logs.  Usually, the size of a single log file is capped, and when one log file fills up another is started.  When there are enough log files, old ones are combined or retired based on whether they contain updates that are still considered relevant &#8211; a process called compaction in several current projects, but also known by other names in other contexts.  Reads are handled by searching through the accumulated logs for updates which overlap with what the user requested.  Done naively, this could take linear time relative to the number of log entries present, so in practice the read path is often heavily optimized using Bloom filters and other techniques so it can actually be quite efficient.  This leads me to a couple of tangential observations about how such solutions are neither as novel nor as complete as some of their more strident champions would have you believe.</p>
<ul>
<li>The general outline described above is pretty much exactly what Steven LeBrun and I came up with in 2003/2004, to handle &#8220;timeline&#8221; data in Revivio&#8217;s continuous data protection system.  This predates the publication of details about <a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html">Dynamo</a> in 2007, and therefore all of Dynamo&#8217;s currently-popular descendants as well.</li>
<li>Some people seem to act as though immutable files are always and everywhere superior to update-in-place solutions (including soft updates or COW), apparently unaware that they&#8217;re just making the complexity of update-in-place Somebody Else&#8217;s Problem.  When you&#8217;re creating and deleting all those immutable files within a finite pool of mutable disk blocks, somebody else &#8211; i.e. the filesystem &#8211; has to handle all of the space reclamation/reuse issues for you, and they do so with update-in-place.</li>
</ul>
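<p>To make the general outline above concrete, here is a minimal in-memory sketch of a log-oriented store: capped append-only segments, a Bloom filter per segment to skip irrelevant logs on reads, and a crude compaction pass.  All of the names (<code>BloomFilter</code>, <code>Segment</code>, <code>LogStore</code>, <code>max_entries</code>) are invented for illustration and are not taken from any of the projects mentioned; a real implementation would write segments to disk and compact incrementally.</p>

```python
import hashlib

MISSING = object()  # sentinel so that None can still be stored as a value


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, bits=1024, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.field = bytearray(bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for pos in self._positions(key):
            self.field[pos] = 1

    def might_contain(self, key):
        return all(self.field[pos] for pos in self._positions(key))


class Segment:
    """One capped, append-only log plus the filter that summarizes its keys."""

    def __init__(self):
        self.entries = []  # (key, value) pairs in arrival order
        self.bloom = BloomFilter()

    def append(self, key, value):
        self.entries.append((key, value))
        self.bloom.add(key)

    def lookup(self, key):
        if not self.bloom.might_contain(key):
            return MISSING  # skip the scan entirely
        for k, v in reversed(self.entries):  # newest entry wins
            if k == key:
                return v
        return MISSING


class LogStore:
    def __init__(self, max_entries=4):
        self.max_entries = max_entries
        self.segments = [Segment()]  # oldest first; the last one is active

    def put(self, key, value):
        if len(self.segments[-1].entries) == self.max_entries:
            self.segments.append(Segment())  # current log is full, start another
        self.segments[-1].append(key, value)

    def get(self, key, default=None):
        for seg in reversed(self.segments):  # search newest to oldest
            value = seg.lookup(key)
            if value is not MISSING:
                return value
        return default

    def compact(self):
        """Merge every segment, keeping only the latest value per key."""
        latest = {}
        for seg in self.segments:
            for k, v in seg.entries:
                latest[k] = v  # later writes overwrite earlier ones
        merged = Segment()
        for k, v in latest.items():
            merged.append(k, v)
        self.segments = [merged, Segment()]
```

<p>Note that this sketch compacts everything into a single oversized segment, which is the simplest possible policy rather than a recommended one.</p>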
<p>Despite those caveats, the log-oriented approach can be totally awesome and designers should generally consider it first, especially when lookups are by a single key in a flat namespace.  You could theoretically handle multiple keys by creating separate sets of Bloom filters etc. for each key, but that can quickly become unwieldy.  It also makes writes less efficient, and &#8211; as noted previously &#8211; write efficiency is one of the key justifications for this approach in the first place.  At some point, or for some situations, a different solution might be called for.</p>
<p>The other common approach to data protection is copy on write or COW (as represented by <a href="http://www.netapp.com/library/tr/3002.pdf">WAFL</a>, <a href="http://www.sun.com/2004-0914/feature/">ZFS</a>, or <a href="http://lkml.org/lkml/2007/6/12/242">btrfs</a>) or its close cousin <a href="http://www.usenix.org/publications/library/proceedings/usenix99/full_papers/mckusick/mckusick.pdf">soft updates</a>.  In these approaches, blocks are updated in place, but with very careful attention paid to where and/or when individual block updates actually hit disk.  Most commonly, all blocks are either explicitly or implicitly related as parts of a tree.  Updates occur from leaves to root, copying old blocks into newly allocated space and then modifying the new copies.  Ultimately all of this new space is spliced into the filesystem with an atomic update at the root &#8211; the superblock in a filesystem.  It&#8217;s contention either at the root or on the way up to it that accounts for much of the complexity in such systems, and for many of the differences between them.  The soft-update approach diverges from this model by doing more updates in place instead of into newly allocated space, avoiding the issue of contention at the root but requiring even more careful attention to write ordering.  Here are a few more notes.</p>
<ul>
<li>When writes are into newly allocated space, and the allocator generally allocates sequential blocks, the at-disk access pattern can be strongly sequential just as with the more explicitly log-oriented approach.</li>
<li>The COW approach lends itself to very efficient snapshots, because each successive version of the superblock (or equivalent) represents a whole state of the filesystem at some point in time.  Garbage collection becomes quite complicated as a result, but the complexity seems well worth it.</li>
<li>There&#8217;s a very important optimization that can be made sometimes when a write is wholly contained within a single already-allocated block.  In this case, that one block can simply be updated in place and you can skip a lot of the toward-the-root rigamarole.  I should apply this technique to VoldFS.  Unfortunately, it doesn&#8217;t apply if you have to update mtime or if you&#8217;re at a level where &#8220;torn writes&#8221; (something I forgot to mention in my &#8220;how to lose data&#8221; post) are a concern.</li>
</ul>
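<p>The leaf-to-root copying and the atomic splice at the root can be sketched in a few lines.  This is a toy in-memory model, not how WAFL, ZFS, or btrfs are actually implemented: <code>Node</code>, <code>cow_write</code>, and <code>Filesystem</code> are hypothetical names, a Python reference assignment stands in for the atomic superblock update, and there is no garbage collection of old blocks.</p>

```python
class Node:
    """A tree block that is never modified after it has been created."""

    def __init__(self, data=None, children=None):
        self.data = data
        self.children = dict(children or {})


def cow_write(node, path, data):
    """Return a NEW node reflecting the write; the old node is left untouched."""
    if not path:
        old_children = node.children if node is not None else None
        return Node(data=data, children=old_children)
    head, rest = path[0], path[1:]
    old_child = node.children.get(head) if node is not None else None
    # Copy this block, then point the copy at the newly written child.
    new_children = dict(node.children) if node is not None else {}
    new_children[head] = cow_write(old_child, rest, data)
    old_data = node.data if node is not None else None
    return Node(data=old_data, children=new_children)


def read(node, path):
    """Walk from the given root down to the named block."""
    for name in path:
        if node is None:
            return None
        node = node.children.get(name)
    return node.data if node is not None else None


class Filesystem:
    def __init__(self):
        self.root = Node()  # stands in for the superblock pointer

    def write(self, path, data):
        new_root = cow_write(self.root, path, data)
        self.root = new_root  # the single splice at the root

    def snapshot(self):
        # Old roots stay valid because nothing is ever updated in place.
        return self.root
```

<p>Because every write produces a new root while leaving the old tree intact, holding on to an old root is all it takes to have a snapshot, and subtrees that were not on the written path are shared between versions rather than copied.</p>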
<p>It&#8217;s worth noting also that, especially in a distributed environment, these approaches can be combined.  For example, VoldFS itself uses a COW approach but most of the actual or candidate data stores from which it allocates its blocks are themselves more log-oriented.  As always it&#8217;s horses for courses, and different systems &#8211; or even different parts of the same system &#8211; might be best served by different approaches.  That&#8217;s why I thought it was worth describing multiple alternatives and the tradeoffs between them.</p>
]]></content:encoded>
			<wfw:commentRss>http://pl.atyp.us/wordpress/?feed=rss2&amp;p=3020</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>How To Lose Data</title>
		<link>http://pl.atyp.us/wordpress/?p=3005</link>
		<comments>http://pl.atyp.us/wordpress/?p=3005#comments</comments>
		<pubDate>Wed, 01 Sep 2010 12:22:59 +0000</pubDate>
		<dc:creator>Jeff Darcy</dc:creator>
				<category><![CDATA[distributed]]></category>
		<category><![CDATA[systems]]></category>

		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=3005</guid>
		<description><![CDATA[As I mentioned in my last post, I&#8217;ve been getting increasingly annoyed at a lot of the flak that has been directed toward MongoDB over data-protection issues.  I&#8217;m certainly no big fan of systems that treat memory as primary storage (with or without periodic flushes to disk) instead of a cache or buffer for [...]]]></description>
			<content:encoded><![CDATA[<p>As I mentioned in my last post, I&#8217;ve been getting increasingly annoyed at a lot of the flak that has been directed toward MongoDB over data-protection issues.  I&#8217;m certainly no big fan of systems that treat memory as primary storage (with or without periodic flushes to disk) instead of a cache or buffer for the real thing.  I&#8217;ve written enough here to back that up, but I&#8217;ve also written plenty about something that bugs me even more: FUD.  Merely raising an issue isn&#8217;t FUD, but the volume and tone and repetition of the criticism are all totally out of proportion when there are so many other data-protection issues we should also worry about.  Here are just a few ways to lose data.</p>
<ul>
<li>Don&#8217;t provide full redundancy at all levels of your system.  It&#8217;s amazing how many &#8220;distributed&#8221; systems out there aren&#8217;t really distributed at all, leaving users entirely vulnerable to loss or extended unreachability of a single node, without one peep of protest from the people who are so quick to point the finger at systems which can at least survive that most-common failure mode.</li>
<li>Be careless about non-battery-backed disk caches.  If data gets stranded in a disk cache when the power goes out, it&#8217;s <b>no different</b> than if it was stranded in memory, and yet many projects do absolutely nothing to detect let alone correct for obvious problems in this area.</li>
<li>Be careless about data ordering in the kernel.  My colleagues who work on local filesystems and pieces of the block-device subsystem in Linux (and others working on other OSes) have done a great deal of too-little-appreciated work to provide the very highest levels of data safety that they can without sacrificing any more performance than necessary.  Then folks who preach the virtues of append-only files without knowing anything at all about how they work turn around and subvert all that effort by giving mount-command and fstab-line examples that explicitly put filesystems into async mode, turn off barriers, etc.</li>
<li>A special case of the previous point is when people actually do seem to know the options that assure data protection, but forgo those options for the sake of getting better benchmark numbers.  That&#8217;s simply dishonest.  You can&#8217;t claim great performance <b>and</b> great data protection if users can only really get one <b>or</b> the other depending on which options they choose.  Pick one, and shut up about the other.</li>
<li>Be careless about your own data ordering.  A single I/O operation can require several block-level updates.  Many overlapping operations can create a huge bucket of such updates, conflicting in complex ways and requiring very careful attention to the order in which the updates actually occur.  If you screw it up just once, and it takes a special brand of arrogance to believe that could never happen to you, then you corrupt data.  If you corrupt metadata, you might well lose the user data it points to.  If you corrupt user data that can be even worse than losing it, because there are security implications as well.  It&#8217;s not nice when some of your confidential data becomes part of somebody else&#8217;s file/document/whatever.  At least with mmap-based approaches, it&#8217;s fairly straightforward to do things with msync and fork and hypervisor/filesystem/LVM snapshots to at least guarantee that the state on disk remains consistent even if it&#8217;s not absolutely current.</li>
<li>Don&#8217;t provide any reasonable way to take a backup, which would protect against the nightmare scenario where data is lost not because of a hardware failure but because of a bug or user error that makes your internal redundancy irrelevant.</li>
</ul>
<p>Of course, some of these issues won&#8217;t apply to Your Favorite Data Store, e.g. if it doesn&#8217;t have a hierarchical data model or a concept of multiple users.  Then again, the list is also incomplete because the real point I&#8217;m making is that there are plenty of data-protection pitfalls and plenty of people falling into them.  Some of the loudest complainers already had to suspend their FUD campaign to deal with their own data-corruption fiasco.  Others are vulnerable to having the same thing happen &#8211; I can tell by looking at their designs or code &#8211; but those particular chickens haven&#8217;t come home to roost yet.</p>
<p>Look, I laughed at the &#8220;redundant coffee mug&#8221; joke too.  It was funny at the time, but that was a while ago.  Since then it&#8217;s been looking more and more like junior-high-school cliquishness, poking fun at a common target as a way to fit in with the herd.  It&#8217;s not helping users, it&#8217;s not advancing the state of the art, and it&#8217;s actively harming the community.  As one of the worst offenders once had the gall to tell me, be part of the solution.  Find and fix <b>new</b> data-protection issues in whichever projects have them, instead of going on and on about the one everybody already recognizes.</p>
]]></content:encoded>
			<wfw:commentRss>http://pl.atyp.us/wordpress/?feed=rss2&amp;p=3005</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Pomegranate First Thoughts</title>
		<link>http://pl.atyp.us/wordpress/?p=3007</link>
		<comments>http://pl.atyp.us/wordpress/?p=3007#comments</comments>
		<pubDate>Tue, 31 Aug 2010 13:45:04 +0000</pubDate>
		<dc:creator>Jeff Darcy</dc:creator>
				<category><![CDATA[distributed]]></category>
		<category><![CDATA[systems]]></category>

		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=3007</guid>
		<description><![CDATA[Pomegranate is a new distributed filesystem, apparently oriented toward serving many small files efficiently (thanks to @al3xandru for the link).  Here are some fairly disconnected thoughts/impressions.

The HS article says that &#8220;Pomegranate should be the first file system that is built over tabular storage&#8221; but that&#8217;s not really accurate.  For one thing, Pomegranate is [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://highscalability.com/blog/2010/8/30/pomegranate-storing-billions-and-billions-of-tiny-little-fil.html">Pomegranate</a> is a new distributed filesystem, apparently oriented toward serving many small files efficiently (thanks to <a href="http://twitter.com/al3xandru">@al3xandru</a> for the link).  Here are some fairly disconnected thoughts/impressions.</p>
<ul>
<li>The HS article says that &#8220;Pomegranate should be the first file system that is built over tabular storage&#8221; but that&#8217;s not really accurate.  For one thing, Pomegranate is only partially based on tabular storage for metadata, and relies on another distributed filesystem &#8211; Lustre is mentioned several times &#8211; for bulk data access.  I&#8217;d say <a href="http://ceph.newdream.net/">Ceph</a> is more truly based on tabular storage (RADOS) and it&#8217;s far more mature than Pomegranate.  I also feel a need to mention my own <a href="http://github.com/jdarcy/CassFS">CassFS</a> and <a href="http://github.com/jdarcy/VoldFS">VoldFS</a>, and Artur Bergman&#8217;s <a href="http://github.com/crucially/riakfuse/blob/master/riakfs-import">RiakFuse</a>, as filesystems that are completely based on tabular storage.  They&#8217;re not fully mature production-ready systems, but they are counterexamples to the original claim.</li>
<li>One way of looking at Pomegranate is that they&#8217;ve essentially replaced the metadata layer from Lustre/PVFS/Ceph/pNFS with their own while continuing to rely on the underlying DFS for data.  Perhaps this makes Pomegranate more of a meta-filesystem or filesystem sharding/caching layer than a full filesystem in and of itself, but there&#8217;s nothing wrong with that just as there&#8217;s nothing wrong with similar sharding/caching layers for databases.  Compared to Lustre, this is a significant step forward since Pomegranate&#8217;s metadata is fully distributed.  Compared to Ceph, though, it&#8217;s not so clearly innovative.  Ceph already has a distributed metadata layer, based on advanced distribution algorithms to distribute load etc.  Pomegranate&#8217;s use of ring-based consistent hashing suits my own preference a little better than Ceph&#8217;s tree-based approach (CRUSH), but there are many kinds of ring-based hashing and it looks like Pomegranate won&#8217;t really catch up to Ceph in this regard until their scheme is tweaked a few times.</li>
<li>I&#8217;m really not wild about the whole &#8220;in-memory architecture&#8221; thing.  If your update didn&#8217;t make it to disk because it was at the end of the in-memory queue and hadn&#8217;t been flushed yet, that&#8217;s no better for reliability than if you just left it in memory for ever (though it does improve capacity) and if you acknowledged the write as complete then you lied to the user.  Prompted by some of the hyper-critical and hypocritical comments I&#8217;ve seen lately bashing one project for lack of durability, I have another blog post I&#8217;m working on about how the critics&#8217; own toys can lose or corrupt data, and how claiming superior durability while using &#8220;unsafe&#8221; settings for benchmarks is dishonest, so I&#8217;ll defer most of that conversation for now.  Suffice it to say that if I were to deploy Pomegranate in production one of the first things I&#8217;d do would be to force the cache to be properly write-through instead of write-back.</li>
<li>I can see how the Pomegranate scheme efficiently supports looking up a single file among billions, even in one directory (though the actual efficacy of the approach seems unproven).  What&#8217;s less clear is how well it handles <em>listing</em> all those files, which is kind of a separate problem similar to range queries in a distributed K/V store.  This is something I spent a lot of time pondering for VoldFS, and I&#8217;m rather proud of the solution I came up with.  I think that solution might be applicable to Pomegranate as well, but need to investigate further.  Can Ma, if you read this, I&#8217;d love to brainstorm further on this.</li>
<li>Another thing I wonder about is the scalability of Pomegranate&#8217;s approach to complex operations like rename.  There&#8217;s some mention of a &#8220;reliable multisite update service&#8221; but without details it&#8217;s hard to reason further.  This is a very important issue because this is exactly where several efforts to distribute metadata in other projects &#8211; notably Lustre &#8211; have foundered.  It&#8217;s a very <em>very</em> hard problem, so if one&#8217;s goal is to create something &#8220;worthy for [the] file system community&#8221; then this would be a great area to explore further.</li>
</ul>
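<p>For readers who have not run into ring-based consistent hashing, which comes up in the list above, here is one common variant (virtual nodes on a hash ring).  It is only a generic sketch and makes no claim about the scheme Pomegranate actually uses; <code>HashRing</code>, <code>vnodes</code>, and the node names are made up for the example.</p>

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes  # virtual points per node, to even out the load
        self._points = []     # sorted ring positions
        self._owner = {}      # ring position to node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        """Return the node owning the first point clockwise from the key."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

<p>The property that makes this attractive for distributing metadata is that removing a node only moves the keys that node owned; everything else stays where it was.</p>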
<p>Some of those points might seem like criticism, but they&#8217;re not intended that way &#8211; or at least they&#8217;re intended as <em>constructive</em> criticism.  They&#8217;re things I&#8217;m curious about, because I know they&#8217;re both difficult and under-appreciated by those outside the filesystem community, and they&#8217;re questions I couldn&#8217;t answer from a cursory examination of the available material.  I hope to examine and discuss these issues further, because Pomegranate really does look like an interesting and welcome addition to this space.</p>
]]></content:encoded>
			<wfw:commentRss>http://pl.atyp.us/wordpress/?feed=rss2&amp;p=3007</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>The Oracle/Google Patents</title>
		<link>http://pl.atyp.us/wordpress/?p=3000</link>
		<comments>http://pl.atyp.us/wordpress/?p=3000#comments</comments>
		<pubDate>Fri, 13 Aug 2010 15:36:11 +0000</pubDate>
		<dc:creator>Jeff Darcy</dc:creator>
				<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=3000</guid>
		<description><![CDATA[A lot of people are commenting on the Oracle/Google suit without having looked at the patents involved.  That&#8217;s a bad idea, guaranteed to yield incorrect conclusions.  For reference, here are the ones actually mentioned in the formal complaint . . . and yes, I did enjoy looking these up on Google.

6125447: Protection domains [...]]]></description>
			<content:encoded><![CDATA[<p>A lot of people are commenting on the Oracle/Google suit without having looked at the patents involved.  That&#8217;s a bad idea, guaranteed to yield incorrect conclusions.  For reference, here are the ones actually mentioned in the <a href="http://www.scribd.com/doc/35811761/Oracle-s-complaint-against-Google-for-Java-patent-infringement">formal complaint</a> . . . and yes, I did enjoy looking these up on Google.</p>
<ul>
<li><a href="http://www.google.com/patents/about?id=dyQGAAAAEBAJ&#038;dq=6125447">6125447</a>: Protection domains to provide security in a computer system</li>
<li><a href="http://www.google.com/patents/about?id=G1YGAAAAEBAJ&#038;dq=6192476">6192476</a>: Controlling access to a resource</li>
<li><a href="http://www.google.com/patents/about?id=TzsPAAAAEBAJ&#038;dq=5966702">5966702</a>: Method and apparatus for pre-processing and packaging class files</li>
<li><a href="http://www.patentgenius.com/patent/7426720.html">7426720</a>: System and method for dynamic preloading of classes through memory space cloning of a master runtime system process</li>
<li><a href="http://www.google.com/patents/about?id=8xkPAAAAEBAJ&#038;dq=RE38,104">RE38,104</a>: Method and apparatus for resolving data references in generated code</li>
<li><a href="http://www.google.com/patents/about?id=U-4UAAAAEBAJ&#038;dq=6910205">6910205</a>: Interpreting functions utilizing a hybrid of virtual and native machine</li>
<li><a href="http://www.google.com/patents/about?id=mEwEAAAAEBAJ&#038;dq=6061520">6061520</a>: Method and system for performing static initialization</li>
</ul>
<p>First thing to remember is that this is a patent suit, not a copyright suit.  That means it&#8217;s not about &#8220;Java&#8221; at all.  It&#8217;s about certain ways of implementing a dynamic runtime, regardless of what name or input language is used.  In that context, 5966702 is probably the most specific to Oracle&#8217;s actual Java-runtime technology, and that&#8217;s all about class files.  The others are pretty general ideas, even if the Java runtime was the first embodiment used in the patent descriptions.  For purposes of determining infringement, it&#8217;s mostly the claims &#8211; not the description &#8211; that matter.  It&#8217;s probably quite premature for anybody who hasn&#8217;t looked at the Dalvik code to say whether it infringes most of these patents or not, or whether Google could avoid infringing on these claims without fundamentally changing how Dalvik works.</p>
]]></content:encoded>
			<wfw:commentRss>http://pl.atyp.us/wordpress/?feed=rss2&amp;p=3000</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NoSQL and Cloud Security</title>
		<link>http://pl.atyp.us/wordpress/?p=2988</link>
		<comments>http://pl.atyp.us/wordpress/?p=2988#comments</comments>
		<pubDate>Mon, 09 Aug 2010 14:15:19 +0000</pubDate>
		<dc:creator>Jeff Darcy</dc:creator>
				<category><![CDATA[distributed]]></category>

		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2988</guid>
		<description><![CDATA[By now, most people interested in NoSQL and cloud storage and so on have probably seen the story of go-derper, which demonstrates two things.

Memcached has no security of its own.
Many people deploy memcached to be generally accessible.

Obviously, this is a recipe for disaster.  Less obviously, the problem is hardly limited to memcached.  Most [...]]]></description>
			<content:encoded><![CDATA[<p>By now, most people interested in NoSQL and cloud storage and so on have probably seen the story of <a href="http://www.sensepost.com/blog/4873.html">go-derper</a>, which demonstrates two things.</p>
<ol type="1">
<li>Memcached has no security of its own.</li>
<li>Many people deploy memcached to be generally accessible.</li>
</ol>
<p>Obviously, this is a recipe for disaster.  Less obviously, the problem is hardly limited to memcached.  <em>Most</em> NoSQL stores have <em>no</em> concept of security.  They&#8217;ll let anyone connect and fetch or overwrite any object.  One of the best known doesn&#8217;t even check that input is well formed, so &#8220;cat /dev/urandom | nc $host $port&#8221; from anywhere would crash it quickly.  Among all of the other differences between SQL and NoSQL systems &#8211; ACID, joins, normalization and referential integrity, scalability and partition tolerance, etc. &#8211; the near-total abandonment of security in NoSQL is rarely mentioned.  Lest it seem that I&#8217;m throwing stones from some other garden, I&#8217;d have to say many filesystems hardly fare any better.  For example, I generally like <a href="http://www.gluster.org/">GlusterFS</a> but it provides only the most basic kind of protection against information leakage or tampering.  As a POSIX filesystem it at least has a notion of authorization between users, but it does practically nothing to authenticate those users and authorization without authentication is meaningless.  The system-level authorization to connect is trivially crackable, and once I&#8217;ve done that I can easily spoof any user ID I want &#8211; including root.  I&#8217;ve had to make the point over and over again in presentations that cloud storage in general &#8211; regardless of type &#8211; is usually only suitable for deployment within a single user&#8217;s instances, protected by those instances&#8217; firewalls and sharing a common UID space.  For most such stores, if a cloud provider wants to offer it as a public, shared, permanent service separate from compute instances, a lot more work needs to be done.</p>
<p>What kind of work?  Mostly it falls into two categories: encryption and authentication/authorization (collectively &#8220;auth&#8221;).  For encryption, there&#8217;s a further distinction to be made between on-the-wire and at-rest encryption.  A lot of cloud-storage vendors make all sorts of noise about their on-the-wire encryption, but they stay quiet or vague about at-rest encryption and that&#8217;s actually more important.  The biggest threat to your data is insiders, not outsiders.  The insiders aren&#8217;t even going on the wire, so all of that AES-256 encryption there doesn&#8217;t matter a bit.  Insiders should also be assumed to have access to any keys you&#8217;ve given the provider, so the only way you can really be sure nobody&#8217;s messing with your data is if you never give them unencrypted data <em>or</em> keys for that data.  Your data must remain encrypted from the moment it leaves your control until the moment it returns again, using keys that only you possess.  I know how much of a pain that is, believe me.  I&#8217;ve had to work through the details of how to reconcile this level of security with multi-level caching and byte addressability in CloudFS, but it&#8217;s the only way to be secure.  Vendors&#8217; descriptions of what they&#8217;re doing in this area tend to be vague, as I said, but <a href="http://www.nasuni.com/news/cloud-storage-challenge-security/">Nasuni</a> is the only one who visibly seems to be on the right track.  It sure would be nice if people could get that functionality through open source, instead of paying both a software and a storage provider to get it.  Cue appearance by Zooko to plug <a href="http://tahoe-lafs.org/trac/tahoe-lafs">Tahoe-LAFS</a> in 5, 4, 3, &#8230;</p>
<p>The other area where work needs to be done is handling user identities, which covers both auth and identity mapping.  For starters, the storage system must internally enforce permissions between users, which of course means it must have a notion of there even being multiple users.  For systems which can assume that a single connection belongs to a single user, you can then authenticate using SASL or similar and be well on your way to a full solution.  For systems that can&#8217;t make such an assumption, which includes things like filesystems, that&#8217;s not sufficient.  You need to identify and authenticate not just the system making a request, but the user as well.  I&#8217;m not a security extremist, so I can accept the argument that if you can fully authenticate a system and communicate with them through a secure channel then you can trust them to identify users correctly.  The alternative is something like GSSAPI, which requires less trust in the remote system but can be a pretty major pain to implement.</p>
<p>The last issue is identity mapping.  Even if you can ensure that a remote system is providing the correct user IDs, those IDs are still only correct in <em>their</em> context.  If you&#8217;re a cloud service provider, you really can&#8217;t assume that tenant A&#8217;s user X is the same as tenant B&#8217;s user X.  Therefore, you need to map A:X and B:X to some global users P and Q.  Because you might need to store these IDs and then return them later (e.g. on a stat() call if you&#8217;re a filesystem) you need to be able to do the reverse mapping back to A:X and B:X as well.  Lastly, because cloud tenants can and will create new users willy-nilly, you can&#8217;t require pre-registration; you need to create new mappings on the fly, whenever you see a new ID.  This ends up becoming pretty entangled with the authentication problem because authentication information needs to be looked up based on the global (not per-tenant) ID, so this can all be a big pain but &#8211; again &#8211; it&#8217;s the only way to be secure.</p>
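<p>To make the mapping requirement concrete, here is a minimal sketch of a two-way identity map that creates entries on the fly.  The class name, the integer global IDs, and the starting value are all assumptions made up for illustration; this is not how CloudFS or any particular system does it, and a real version would need persistent storage and a tie-in to authentication data.</p>

```python
import itertools
import threading


class IdentityMapper:
    """Two-way map between (tenant, local_id) pairs and global IDs.

    Hypothetical sketch: the names and the integer ID scheme are
    illustrative assumptions, not taken from any real system.
    """

    def __init__(self, first_global_id=100000):
        self._next_id = itertools.count(first_global_id)
        self._to_global = {}   # (tenant, local_id) maps to global_id
        self._to_local = {}    # global_id maps to (tenant, local_id)
        self._lock = threading.Lock()

    def to_global(self, tenant, local_id):
        """Return the global ID for a tenant's user, creating it on first sight."""
        key = (tenant, local_id)
        with self._lock:
            global_id = self._to_global.get(key)
            if global_id is None:
                # No pre-registration: allocate a mapping the first time we see the ID.
                global_id = next(self._next_id)
                self._to_global[key] = global_id
                self._to_local[global_id] = key
            return global_id

    def to_local(self, global_id):
        """Reverse mapping, e.g. for filling in the owner on a stat() reply."""
        with self._lock:
            return self._to_local[global_id]


mapper = IdentityMapper()
p = mapper.to_global("A", 500)   # tenant A's user 500
q = mapper.to_global("B", 500)   # tenant B's user 500 is a different global user
```

<p>Here <code>p</code> and <code>q</code> differ even though both tenants used the same local ID, and <code>mapper.to_local(p)</code> recovers the original tenant and ID for returning to the client.</p>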
<p>To sum up, the lesson of go-derper is not that memcached is uniquely bad.  <em>Lots</em> of systems are equally bad, and making them less bad is going to be hard, but it needs to be done before the other promises made by those systems can be realized.  For a great many people, systems that are so totally insecure are useless, no matter what other wonderful functionality they might provide.</p>
]]></content:encoded>
			<wfw:commentRss>http://pl.atyp.us/wordpress/?feed=rss2&amp;p=2988</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Spread of a Meme</title>
		<link>http://pl.atyp.us/wordpress/?p=2985</link>
		<comments>http://pl.atyp.us/wordpress/?p=2985#comments</comments>
		<pubDate>Fri, 06 Aug 2010 20:25:23 +0000</pubDate>
		<dc:creator>Jeff Darcy</dc:creator>
				<category><![CDATA[internet]]></category>

		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2985</guid>
		<description><![CDATA[I find it fascinating how links get distributed over time.  Here&#8217;s an example involving the amazing pencil-tip sculptures by Dalton Ghetti, and the times I&#8217;ve been presented with the same link in Google Reader.

Dark Roasted Blend on July 16
BoingBoing on August 1
Damn Cool Pics on August 2
Inhabitat on August 6

I predict that it will show [...]]]></description>
			<content:encoded><![CDATA[<p>I find it fascinating how links get distributed over time.  Here&#8217;s an example involving the amazing pencil-tip sculptures by Dalton Ghetti, and the times I&#8217;ve been presented with the same link in Google Reader.</p>
<ul>
<li><a href="http://www.darkroastedblend.com/2010/07/link-latte-137.html">Dark Roasted Blend</a> on July 16</li>
<li><a href="http://www.boingboing.net/2010/08/01/more-miniature-maste.html">BoingBoing</a> on August 1</li>
<li><a href="http://damncoolpics.blogspot.com/2010/08/pencil-tip-sculptures-by-dalton-ghetti.html">Damn Cool Pics</a> on August 2</li>
<li><a href="http://inhabitat.com/2010/08/06/amazing-miniature-sculptures-carved-from-pencil-tips/">Inhabitat</a> on August 6</li>
</ul>
<p>I predict that it will show up on at least one more website I follow.  In maybe a week or so, I&#8217;ll see it in print media for the first time.  Another week or two after that, I&#8217;ll see it in the Boston Globe.  That&#8217;s the usual pattern, anyway.</p>
]]></content:encoded>
			<wfw:commentRss>http://pl.atyp.us/wordpress/?feed=rss2&amp;p=2985</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Language Specific Package Managers</title>
		<link>http://pl.atyp.us/wordpress/?p=2980</link>
		<comments>http://pl.atyp.us/wordpress/?p=2980#comments</comments>
		<pubDate>Thu, 05 Aug 2010 12:38:28 +0000</pubDate>
		<dc:creator>Jeff Darcy</dc:creator>
				<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2980</guid>
		<description><![CDATA[As many people I&#8217;ve talked to IRL probably know, I really hate language-specific package managers.  Java has several, and Python, Ruby, Erlang, and others each have their own.  I totally understand the temptation.  I know it&#8217;s not all about NIH Syndrome (though some is); some of it&#8217;s about Getting Stuff Done as well.  Consider [...]]]></description>
			<content:encoded><![CDATA[<p>As many people I&#8217;ve talked to IRL probably know, I really hate language-specific package managers.  Java has several, and Python, Ruby, Erlang, and others each have their own.  I totally understand the temptation.  I know it&#8217;s not all about NIH Syndrome (though some is); some of it&#8217;s about Getting Stuff Done as well.  Consider the following example.  I tried to install Tornado using yum.</p>
<pre class="code">[root@fserver-1 repo]# yum install python-tornado
Loaded plugins: presto
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package python-tornado.noarch 0:1.0-2.fc15 set to be updated

(hundreds of lines of dependency stuff)

Transaction Summary
=====================================================================================
Install      13 Package(s)
Upgrade     161 Package(s)

Total download size: 156 M
Is this ok [y/N]: n</pre>
<p>Is this OK?  Are you kidding?  Of course it&#8217;s not OK, especially when I can see that the list includes things like gcc, vim, and yum itself.  I know how systems get broken, and that&#8217;s how.  By way of contrast, let&#8217;s see how it goes with easy_install.</p>
<pre class="code">[root@fserver-1 repo]# easy_install tornado
Searching for tornado
Reading http://pypi.python.org/simple/tornado/
Reading http://www.tornadoweb.org/
Best match: tornado 1.0
Downloading http://github.com/downloads/facebook/tornado/tornado-1.0.tar.gz
Processing tornado-1.0.tar.gz
Running tornado-1.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-6Wcauv/tornado-1.0/egg-dist-tmp-NEPqMm
warning: no files found matching '*.png' under directory 'demos'
zip_safe flag not set; analyzing archive contents...
tornado.autoreload: module references __file__
Adding tornado 1.0 to easy-install.pth file

Installed /usr/lib/python2.6/site-packages/tornado-1.0-py2.6.egg
Processing dependencies for tornado
Finished processing dependencies for tornado</pre>
<p>Yeah, I see the appeal.  On one hand, hours spent either rebuilding a broken system or debugging the problems that are inevitable when 161 packages get updated.  On the other hand, Getting Stuff Done in about a minute.  Yes, I tested, and the result does work fine with the packages/versions I already had.  Still, though, having to do things this way is awful.  It&#8217;s bad enough that there are still separate package managers for different Linux distros, but now programmers need to have several different package managers on one system just to install the libraries and utilities they need.  Worse still, most of these language-specific package managers <em>suck</em>.  None of them handle licensing, and few of them handle dependency resolution in any kind of sane way.  One of the most popular Java package managers doesn&#8217;t even ask before downloading half the internet with no version or authenticity checking to speak of.  Good-bye, repeatable builds.  Hello, Trojan horses.  I can see (above) the problems of having One Package Manager To Rule Them All, or of having dependency resolution be too strict, but there has to be a better way.</p>
<p>What if the system package manager could delegate to a language-specific package manager when appropriate (e.g. yum delegating to easy_install in my example)?  Then the system package manager could save itself a lot of work in such cases, and also avoid violating the Principle of Least Surprise when installing in the &#8220;standard way&#8221; for the system yields different results than installing in the &#8220;standard way&#8221; for the language.  There&#8217;d still be difficult cases when dependencies cross language barriers, but those are cases that the system package manager already has to deal with.  I know there are a lot of details to work out (especially wrt a common format for communicating what&#8217;s wanted and what actually happened), and possibly there&#8217;s even some fatal flaw in this approach, but my first guess is that a federation/delegation model is likely to be better than an everyone-conflicting model.</p>
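<p>As a rough sketch of what such delegation might look like, the following models a &#8220;what&#8217;s wanted&#8221; record, a &#8220;what actually happened&#8221; record, and a system manager that hands requests to registered delegates.  Everything here is hypothetical: the class names, the fields, and the fake delegate are assumptions for illustration, and nothing below invokes yum, easy_install, or any real tool.</p>

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InstallRequest:
    """What is wanted (a hypothetical common format)."""
    ecosystem: str      # "system", "python", ...
    name: str
    version: str = ""   # empty string means "any version"


@dataclass
class InstallResult:
    """What actually happened, reported back to the system package manager."""
    request: InstallRequest
    handled_by: str
    installed: list = field(default_factory=list)   # (name, version) pairs


class SystemPackageManager:
    def __init__(self):
        self._delegates = {}    # ecosystem name to handler callable
        self.database = []      # every result, native or delegated

    def register_delegate(self, ecosystem, handler):
        self._delegates[ecosystem] = handler

    def install(self, request):
        # Delegate when a language-specific handler exists, else handle natively.
        handler = self._delegates.get(request.ecosystem, self._install_native)
        result = handler(request)
        self.database.append(result)   # the system still knows what was installed
        return result

    def _install_native(self, request):
        # Stand-in for the system manager's own dependency resolution.
        return InstallResult(request, "system", [(request.name, request.version)])


def fake_python_delegate(request):
    # Stand-in only: a real delegate would run the language's installer
    # and report the versions it actually chose.
    return InstallResult(request, "python-delegate", [(request.name, "1.0")])


manager = SystemPackageManager()
manager.register_delegate("python", fake_python_delegate)
result = manager.install(InstallRequest("python", "tornado"))
```

<p>The point of the sketch is only that the system manager keeps one record of everything installed, regardless of which tool did the work.</p>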
]]></content:encoded>
			<wfw:commentRss>http://pl.atyp.us/wordpress/?feed=rss2&amp;p=2980</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Server Design Revisited</title>
		<link>http://pl.atyp.us/wordpress/?p=2973</link>
		<comments>http://pl.atyp.us/wordpress/?p=2973#comments</comments>
		<pubDate>Mon, 02 Aug 2010 01:00:35 +0000</pubDate>
		<dc:creator>Jeff Darcy</dc:creator>
				<category><![CDATA[design]]></category>
		<category><![CDATA[uncategorized]]></category>

		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2973</guid>
		<description><![CDATA[About eight years ago, I wrote a series of posts about server design, which I then combined into one post.  That was also a time when debates were raging about multi-threaded vs. event-based programming models, about the benefits and drawbacks of TCP, etc.  For a long time, my posts on those subjects constituted [...]]]></description>
			<content:encoded><![CDATA[<p>About eight years ago, I wrote a series of posts about server design, which I then <a href="/wordpress/?page_id=1277">combined into one post</a>.  That was also a time when debates were raging about multi-threaded vs. event-based programming models, about the benefits and drawbacks of TCP, etc.  For a long time, my posts on those subjects constituted my main claim to fame in the tech-blogging community, with that server-design article as the centerpiece of the set, until more recent posts on startup failures and CAP theorem and language wars started reaching an even broader audience.  Now some of those old debates have been revived, and Matt Welsh has written a <a href="http://matt-welsh.blogspot.com/2010/07/retrospective-on-seda.html">SEDA retrospective</a>, so maybe it&#8217;s a good time for me to follow suit to see what I and the rest of the community have learned since then.</p>
<p>Before I start talking about the Four Horsemen of Poor Performance, it&#8217;s worth establishing a bit of context.  Processors have actually not gotten a lot faster in terms of raw clock speed since 2002 &#8211; <a href="http://www.intel.com/pressroom/kits/quickrefyr.htm#2002">Intel</a> was introducing a 2.8GHz Pentium 4 then &#8211; but they&#8217;ve all gone multi-core with bigger caches and faster buses and such.  Memory and disk sizes have gotten much bigger; speeds have increased less, but still significantly.  Gigabit Ethernet was at the same stage back then that 10GbE is at today.  Java has gone from being the cool new kid on the block to being the grumpy old man the new cool kids make fun of, with nary a moment spent in between.  Virtualization and cloud have become commonplace.  Technologies like map/reduce and NoSQL have offered new solutions to data problems, and created new needs as well.  All of the tradeoffs have changed, and of course we&#8217;ve learned a bit as well.  Has any of that changed how the Four Horsemen ride?<br />
<span id="more-2973"></span><br />
With regard to data copies, I think we&#8217;ve lost ground.  More people now realize that data copies are bad, but with processors and memory being so much faster they seem less inclined to do anything about it.  Many &#8220;modern&#8221; languages have absolutely atrocious support for the kind of efficient buffer-list methods I recommended.  Immutable-data languages inevitably force programmers to copy data into new variables where once they would have updated in place.  Sometimes they even force the new variable to be local in a new function called only for that purpose.  You can argue all you like about the concurrency or robustness advantages of the immutable-data approach, you can argue that good programmers won&#8217;t &#8220;subvert&#8221; your favorite language that way, but the fact is that real-world programmers do engage in such subversion and it does carry a performance cost.  Even if much of that cost is ameliorated by using reference counts instead of true copies, it&#8217;s still less efficient than modifiable buffer chains.</p>
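<p>For readers who haven&#8217;t seen the buffer-list idea, here is a minimal sketch in Python using <code>memoryview</code>.  The <code>BufferChain</code> class and its method names are invented for illustration and are not from the original article; the idea is simply that the chain holds references to slices of existing buffers, so adding a header or trimming a payload does not copy the payload bytes.</p>

```python
class BufferChain:
    """A list of memoryview segments that refer to existing buffers.

    Hypothetical sketch; names are illustrative assumptions.
    """

    def __init__(self):
        self._segments = []

    def append(self, data, start=0, end=None):
        """Add a slice of `data` to the chain without copying its bytes."""
        view = memoryview(data)
        self._segments.append(view[start:end])

    def prepend(self, data):
        """Put a header in front of the payload; the payload is not moved."""
        self._segments.insert(0, memoryview(data))

    def __len__(self):
        return sum(len(s) for s in self._segments)

    def segments(self):
        """Segments in order, suitable for a gather-write such as socket.sendmsg()."""
        return list(self._segments)


payload = bytearray(b"hello, world")
chain = BufferChain()
chain.append(payload, 7)          # refers to the last five bytes in place
chain.prepend(b"HDR:")            # header added without touching the payload
payload[7:12] = b"WORLD"          # in-place update is visible through the chain

# Joining copies the bytes; it is done here only to inspect the contents.
joined = b"".join(chain.segments())
```

<p>In real use the segment list would be handed to a gather-write call rather than joined, which is where the copy is actually avoided.</p>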
<p>The context-switch issue is the one being debated most nowadays, but surprisingly little has actually changed.  Pretty much all of what I said earlier &#8211; about the ratio of threads to processors, about single-threaded approaches being beneath contempt, about coroutines giving all of the headaches of concurrency with none of the advantages &#8211; remains true.  I guess that&#8217;s not all that much of a surprise since I&#8217;d been working on multiprocessors for years when I wrote the article even though they weren&#8217;t fully mainstream yet.  Sure, the scalability of native threads has improved a lot, most notably in Linux, but context switches still aren&#8217;t free.  In a multiprocessor system, you also have the problem of resuming a thread on a different processor with a stone-cold cache.  Some people who just heard of <a href="http://research.microsoft.com/apps/pubs/default.aspx?id=69844">cohort scheduling</a> think it provides a silver-bullet solution, but it really doesn&#8217;t.  If you want to worry about cache warmth, you have to think about three kinds of cached data.</p>
<ul>
<li>Instructions (yes, even with a unified I+D cache)</li>
<li>The actual data being operated on</li>
<li>The secondary data representing request state, global/persistent data structures, etc.</li>
</ul>
<p>Cohort scheduling mostly addresses the first of these, and somewhat the third.  Inevitably, by preserving locality in these cases it tends to make things a bit worse for the second &#8211; and largest &#8211; category of cache contents.  The paper was written in the same era as my own post.  It made a certain amount of sense in the context of the type/size/speed of caches in use at the time, and the loads being placed on them.  Does it make an equal amount of sense in today&#8217;s context?  Maybe.  Sometimes.  There&#8217;s no silver-bullet solution here.  If you really want to optimize for cache warmth, you&#8217;ll still have to think hard about what data is being cached, how it&#8217;s moving between cache levels, and what program structures will create access patterns that minimize that movement.</p>
<p>My conclusion, just as before: by all means use multiple native threads, and use multiple coroutines on top of those if it suits you, but use both judiciously and pragmatically.  Above all, try to program in a way that maximizes your flexibility to adjust the balance between event-based and multi-threaded and coroutine-based approaches in your program, as the tradeoffs and the program itself continue to change.</p>
<p>On the last two issues &#8211; memory allocation and lock contention &#8211; there has also been little change.  Memory allocation is becoming a bit of a hot issue as more people realize that their oh-so-convenient languages and frameworks are in some cases creating/deleting <a href="http://merbist.com/2010/07/29/object-allocation-why-you-should-care/">thousands of objects</a> per request.  That&#8217;s just obscene.  The best response is to avoid the overhead altogether, going as far up the stack as you have to, but for those unwilling to do so there&#8217;s still ample opportunity to make those wasteful operations less costly.  Similarly, there have been some advances in avoiding lock contention &#8211; most notably wider knowledge and acceptance of the actor model &#8211; but it remains a very thorny problem that I&#8217;d need a whole separate post to address.</p>
<p>Perhaps the biggest change I&#8217;d make to the original post, if I were writing it today, is to the grab-bag of items at the end.  For example, many people are coming up with code to exploit the vast differences in latency and throughput between sequential and random disk I/O.  Log-structured filesystems are relevant again, even if many insist on embedding ad-hoc, informally specified, bug-ridden, slow implementations (with apologies to <a href="http://en.wikipedia.org/wiki/Greenspun's_Tenth_Rule">Greenspun</a>) in other systems before they&#8217;ve even read up on the original.  There&#8217;s a fifth horseman there, which is the failure to account for the motion of data between levels of the storage hierarchy, or deal properly with the fundamental differences between those levels.  Think about the following differences.</p>
<ul>
<li>Cache: extremely fast, seemingly byte-addressable but really more alignment-sensitive than people think, very limited size.</li>
<li>Memory: slower than cache but faster than storage by an even greater degree, byte addressable</li>
<li>Direct-connected SSD (e.g. Fusion): very alignment sensitive, low/flat latency but some artifacts due to wear leveling and garbage collection</li>
<li>External SSD: a tiny bit slower than direct-connected, but otherwise similar</li>
<li>Single disk: slower, more variable latency due to seeks (few people can deal with seek issues well enough to notice rotational latency any more), internal caches which affect both durability and performance ratios</li>
<li>Disk array: faster than single disk but different ratios for read vs. write and sequential vs. random, cache is usually battery-backed and large enough that writes very rarely have to wait (but reads often do)</li>
</ul>
<p>That&#8217;s a lot of differences, which can be hard to account for.  Failing to understand these different characteristics and the consequences of translating between them is what&#8217;s behind misunderstandings about caches and cohort scheduling, behind those loathsome memory-only &#8220;data stores&#8221; and even behind a lot of SSD mania.  In many cases this data motion is the primary determinant of overall program performance.  It needs to be carefully managed to ensure that data will be where you want it, when you want it, with a minimum of time spent playing &#8220;musical chairs&#8221; with it.</p>
]]></content:encoded>
			<wfw:commentRss>http://pl.atyp.us/wordpress/?feed=rss2&amp;p=2973</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Bad Presentations</title>
		<link>http://pl.atyp.us/wordpress/?p=2967</link>
		<comments>http://pl.atyp.us/wordpress/?p=2967#comments</comments>
		<pubDate>Sat, 24 Jul 2010 01:21:25 +0000</pubDate>
		<dc:creator>Jeff Darcy</dc:creator>
				<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2967</guid>
		<description><![CDATA[Presentations are the bane of the modern engineer&#8217;s existence.  If you&#8217;re watching a presentation then it means you&#8217;re in a meeting, which is already something most of us don&#8217;t enjoy, and even worse it means you&#8217;re in a kind of meeting (or part of a meeting) that&#8217;s only minimally interactive.  If you&#8217;re giving [...]]]></description>
			<content:encoded><![CDATA[<p>Presentations are the bane of the modern engineer&#8217;s existence.  If you&#8217;re watching a presentation then it means you&#8217;re in a meeting, which is already something most of us don&#8217;t enjoy, and even worse it means you&#8217;re in a kind of meeting (or part of a meeting) that&#8217;s only minimally interactive.  If you&#8217;re giving a presentation, that means even more time away from the technical tasks that drew you to this profession.  Nonetheless, any project leader/advocate nowadays and for the last several years has had to spend a lot of time and energy on what is essentially a marketing activity, which is why I dubbed it &#8220;markelopment&#8221; (a deliberate riff on &#8220;devops&#8221;) on Twitter.  I&#8217;m not among those who think presentations are always evil and should be shunned, but after having created/delivered quite a few presentations and sat through a great many more, I think I&#8217;m in a position to offer just a little bit of advice.</p>
<p>First, I&#8217;ll say that hundred-slide decks annoy me.  Yes, I know it&#8217;s usually a reaction to the problem of slides that are too few and too densely packed, leading to the also-awful phenomenon of the presenter spending most of the time just repeating what everyone can already read, but it&#8217;s an <b>over</b>-reaction.  The other day I was reading some slides online, and I encountered the following pattern:</p>
<blockquote><p>Slide N-1: (clip art)<br />
Slide N: &#8220;vs.&#8221;<br />
Slide N+1: (more clip art)</p></blockquote>
<p>A whole slide just for &#8220;vs.&#8221;?  That&#8217;s wasting my time.  Presenters who use that style end up spending too much of their presentation actually changing slides and waiting the obligatory five seconds for the audience to catch up, no matter how little content is on each.  <a href="http://blog.fosketts.net/">Stephen Foskett</a> pointed out that Lawrence Lessig only puts one word on each slide and is still a very highly regarded speaker.  Well, yeah, he&#8217;s Lawrence Lessig.  I&#8217;m not, you&#8217;re not, and probably neither is anyone you know (unless of course you know Lessig).</p>
<p>Now, I know presentation length can be tricky.  I myself do tend to err on the side of making my slides too busy and very spare graphically.  I do that because I know that the slides are likely to be viewed more in email etc. than with me actually presenting them, so to make sure they&#8217;re useful as a reference I often sacrifice a little on the &#8220;live&#8221; side.  What I&#8217;d generally like to do is create two decks &#8211; one verbally spare and graphically rich to illustrate or anchor what I&#8217;m saying live, and a longer form for sending around later.  That means even more time spent in Impress, though, and is often not feasible for various other reasons as well.  My best advice is to determine a good &#8220;minutes per slide&#8221; figure based on the content, the audience, and an honest appraisal of your own ability to keep the audience interested while the slides aren&#8217;t changing, then use that to determine an appropriate slide count.  If you&#8217;re a <em>very</em> dynamic speaker, you can go the Lessig route and spend five minutes on a one-word slide.  If you need a hundred slides to fill a thirty-minute presentation, then maybe you&#8217;re admitting something about your speaking skills or the intrinsic value of what you&#8217;re presenting.</p>
<p>Second lesson: don&#8217;t get too cute.  I&#8217;ve seen too many presentations lately, especially in the &#8220;edgier&#8221; tech areas, where the author had <em>obviously</em> spent way more time on finding funny clip art and quotes than on the actual content.  Again, it&#8217;s a balancing act.  Humor is good.  A good quote or graphic can be an absolutely fantastic anchor for an important point, which you then elaborate or build on verbally.  One not-really-funny slide after another after another with too little in between is just distracting.</p>
<p>Another error that I find even less excusable is simple ugliness.  Yesterday I saw a presentation which had been done entirely &#8211; from title to closing &#8211; in what looked like a version of <a href="http://www.mentalfloss.com/blogs/archives/61259">Comic Sans</a> done to look like paint-brush strokes (house painting, not portrait painting).  It wasn&#8217;t very readable, and looked <em>totally</em> amateurish.  I was embarrassed for the author.</p>
<p>Now, somebody&#8217;s probably going to think I&#8217;m saying that I&#8217;ll totally dismiss an otherwise good presentation of an important idea because of slide count or graphics or font choice.  Not so.  I&#8217;ll still listen, but it will cost the author a &#8220;point&#8221; in my mind.  It&#8217;s worth keeping in mind that, in these situations, every single point can matter.  If you&#8217;re presenting to hundreds of people and only care if one or two respond in any significant way to what you&#8217;re saying, then maybe none of this matters.  Far more presentations are given in smaller groups, though, where the opinion of everyone at the table does matter.  People being how they are, they will use all sorts of nuances to form an impression of whether you&#8217;re smart, whether you&#8217;re trustworthy, etc.  It probably won&#8217;t be one big thing that causes you not to get that next meeting, but an accumulation of little things.  (If you think &#8220;meritocratic&#8221; open-source techies are any different, BTW, you&#8217;re just kidding yourself.  The standards are different, but they&#8217;re just as stringently applied.  Set the wrong tone and you&#8217;ll be written off just as surely and completely.)  Why give someone the <em>chance</em> to think that you&#8217;re too serious or too frivolous, that your presentation shows disorganization, poor prioritization or disrespect for others&#8217; time or sensibilities?  Focus on content, by all means, but take just a <em>little</em> time to make sure it&#8217;s being delivered in a way that will ensure a good reception.</p>
]]></content:encoded>
			<wfw:commentRss>http://pl.atyp.us/wordpress/?feed=rss2&amp;p=2967</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
