<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Comparing Key/Value Stores, round 2</title>
	<atom:link href="http://pl.atyp.us/wordpress/?feed=rss2&#038;p=2435" rel="self" type="application/rss+xml" />
	<link>http://pl.atyp.us/wordpress/?p=2435</link>
	<description>Making the world better, one byte at a time.</description>
	<lastBuildDate>Sun, 05 Sep 2010 02:56:27 -0400</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Ismael Juma</title>
		<link>http://pl.atyp.us/wordpress/?p=2435&#038;cpage=1#comment-142124</link>
		<dc:creator>Ismael Juma</dc:creator>
		<pubDate>Sun, 01 Nov 2009 16:39:48 +0000</pubDate>
		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2435#comment-142124</guid>
		<description>Hi Jeff,

A note regarding the Voldemort tests. The Python client was created as a proof of concept and I am not sure if it has had any production usage. The Java client has had the most testing (it&#039;s what LinkedIn uses) and the performance numbers from the C++ client seem to be good. Some numbers here:

http://groups.google.com/group/project-voldemort/browse_thread/thread/9dd7ee33305da887/371d23a2afe244c4

Best,
Ismael</description>
		<content:encoded><![CDATA[<p>Hi Jeff,</p>
<p>A note regarding the Voldemort tests. The Python client was created as a proof of concept and I am not sure if it has had any production usage. The Java client has had the most testing (it&#8217;s what LinkedIn uses) and the performance numbers from the C++ client seem to be good. Some numbers here:</p>
<p><a href="http://groups.google.com/group/project-voldemort/browse_thread/thread/9dd7ee33305da887/371d23a2afe244c4" rel="nofollow">http://groups.google.com/group/project-voldemort/browse_thread/thread/9dd7ee33305da887/371d23a2afe244c4</a></p>
<p>Best,<br />
Ismael</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Darcy</title>
		<link>http://pl.atyp.us/wordpress/?p=2435&#038;cpage=1#comment-141904</link>
		<dc:creator>Jeff Darcy</dc:creator>
		<pubDate>Fri, 30 Oct 2009 20:41:58 +0000</pubDate>
		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2435#comment-141904</guid>
		<description>BTW, another point about Cassandra taking 100M off the bat.  Those are &lt;em&gt;resident&lt;/em&gt; pages.  I understand the bit about the JVM allocating a large heap, but there doesn&#039;t seem to be any particularly good reason to go &lt;em&gt;touching&lt;/em&gt; so much of it so that the pages become resident.  If it just touches them once and they can easily be reclaimed when needed then I guess there&#039;s little harm done, though.  Checking whether that&#039;s actually the case is now on my list of things to do when I get back to this.</description>
		<content:encoded><![CDATA[<p>BTW, another point about Cassandra taking 100M off the bat.  Those are <em>resident</em> pages.  I understand the bit about the JVM allocating a large heap, but there doesn&#8217;t seem to be any particularly good reason to go <em>touching</em> so much of it so that the pages become resident.  If it just touches them once and they can easily be reclaimed when needed then I guess there&#8217;s little harm done, though.  Checking whether that&#8217;s actually the case is now on my list of things to do when I get back to this.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Darcy</title>
		<link>http://pl.atyp.us/wordpress/?p=2435&#038;cpage=1#comment-141876</link>
		<dc:creator>Jeff Darcy</dc:creator>
		<pubDate>Fri, 30 Oct 2009 15:48:37 +0000</pubDate>
		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2435#comment-141876</guid>
		<description>Good points, xix.  I hadn&#039;t thought of baselining MySQL, but it&#039;s a good enough idea that I think I&#039;ll make it a first priority when I get back to this.  Thanks!

As for Cassandra, I think those sites are using Cassandra because it&#039;s a great piece of software.  I&#039;ll be the first to admit that these benchmark results are just the tip of the iceberg.  In particular, they are specific to virtualized environments and they only test &quot;out of the gate performance&quot; with small datasets.  A place like Facebook or Digg is likely to run a bunch of servers natively, which represents a very different performance milieu, and they&#039;re much more concerned with huge datasets.  A system like Cassandra that&#039;s well proven at that scale, which also presents a data model and features that they can take advantage of, has huge value.  I see the Digg story as a crucial point of validation not just for Cassandra - though I offer my hearty congratulations to that team - but for the entire space of alternative data stores.  We&#039;re still in the space where a win for one is a win for all.  That will change, but for now let&#039;s enjoy it.</description>
		<content:encoded><![CDATA[<p>Good points, xix.  I hadn&#8217;t thought of baselining MySQL, but it&#8217;s a good enough idea that I think I&#8217;ll make it a first priority when I get back to this.  Thanks!</p>
<p>As for Cassandra, I think those sites are using Cassandra because it&#8217;s a great piece of software.  I&#8217;ll be the first to admit that these benchmark results are just the tip of the iceberg.  In particular, they are specific to virtualized environments and they only test &#8220;out of the gate performance&#8221; with small datasets.  A place like Facebook or Digg is likely to run a bunch of servers natively, which represents a very different performance milieu, and they&#8217;re much more concerned with huge datasets.  A system like Cassandra that&#8217;s well proven at that scale, which also presents a data model and features that they can take advantage of, has huge value.  I see the Digg story as a crucial point of validation not just for Cassandra &#8211; though I offer my hearty congratulations to that team &#8211; but for the entire space of alternative data stores.  We&#8217;re still in the space where a win for one is a win for all.  That will change, but for now let&#8217;s enjoy it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: The xix</title>
		<link>http://pl.atyp.us/wordpress/?p=2435&#038;cpage=1#comment-141861</link>
		<dc:creator>The xix</dc:creator>
		<pubDate>Fri, 30 Oct 2009 14:27:26 +0000</pubDate>
		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2435#comment-141861</guid>
		<description>These are apparently very well thought-out tests and benchmarks, but I can&#039;t explain why many sites (Facebook, Digg...) started using Cassandra by looking at these numbers. And there is a missing benchmark too: MySQL!</description>
		<content:encoded><![CDATA[<p>These are apparently very well thought-out tests and benchmarks, but I can&#8217;t explain why many sites (Facebook, Digg...) started using Cassandra by looking at these numbers. And there is a missing benchmark too: MySQL!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Ellis</title>
		<link>http://pl.atyp.us/wordpress/?p=2435&#038;cpage=1#comment-141787</link>
		<dc:creator>Jonathan Ellis</dc:creator>
		<pubDate>Fri, 30 Oct 2009 01:35:15 +0000</pubDate>
		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2435#comment-141787</guid>
		<description>Thanks -- and fwiw it&#039;s clear from where I sit that you&#039;re making a good-faith effort to be fair, so I&#039;m not trying to be prickly either, though it&#039;s always hard to tell in this medium. :)  Good luck!</description>
		<content:encoded><![CDATA[<p>Thanks &#8212; and fwiw it&#8217;s clear from where I sit that you&#8217;re making a good-faith effort to be fair, so I&#8217;m not trying to be prickly either, though it&#8217;s always hard to tell in this medium. <img src='http://pl.atyp.us/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   Good luck!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Darcy</title>
		<link>http://pl.atyp.us/wordpress/?p=2435&#038;cpage=1#comment-141762</link>
		<dc:creator>Jeff Darcy</dc:creator>
		<pubDate>Thu, 29 Oct 2009 20:24:27 +0000</pubDate>
		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2435#comment-141762</guid>
		<description>Thanks for the tips, Jonathan.  I&#039;m pretty sure I did fix the log levels (have to remember where I stashed all the configs to check) but I didn&#039;t tweak much else.  If/when I get back to this, I&#039;ll look into some of the other tweaks you mention.
&lt;blockquote&gt;as with any JVM-based app, you need to do about 10k of each op you are benchmarking (per node) to let it JIT things before you start measuring.&lt;/blockquote&gt;
I&#039;m not sure if this is unduly affecting the results right now.  In the course of running all those read and write tests I would have done far more than 10K requests, and I didn&#039;t see any significant change with a second set of runs following the first.
&lt;blockquote&gt;we found that the GIL has a significant impact on doing threaded testing from Python. prefer multiprocessing.&lt;/blockquote&gt;
I didn&#039;t use Python threading for pretty much this reason.  The tests were ten separate processes.
&lt;blockquote&gt;Cassandra isn’t designed to handle large blob values; it gives you a column model and expects you to take advantage of that &lt;/blockquote&gt;
10K isn&#039;t really that large.  It&#039;s at the upper end of what should go into a database row or C/C++ structure, but I&#039;ve seen larger items plenty of times.  All it takes is a few long strings/arrays/lists.  If there&#039;s a data-model issue here, it&#039;s probably that I&#039;m only using one column.  Column-oriented stores like Cassandra can make it possible for an application programmer to use fewer operations to get/set related values, or reduce the amount of data that needs to be transferred for a client-side-serialization approach.  Then again, there&#039;s a whole range of other features such as enumerations and extra operations besides get/set/delete that distinguish some stores from others.  Keeping results directly comparable often means testing the lowest common denominator, unfortunately, but application developers should keep in mind that there&#039;s more to the story.
&lt;blockquote&gt;10 concurrent clients isn’t a whole lot, for any of these systems. I doubt you can max out the CPU on a single Cassandra node with 10 client threads, even a relatively wimpy EC2 VM (once you fix GC/logging/etc options). (And yes, CPU is almost always the first bottleneck you hit.)&lt;/blockquote&gt;
I won&#039;t argue that it &lt;em&gt;is&lt;/em&gt;, but I&#039;m far from convinced that it &lt;em&gt;should be&lt;/em&gt;.  Especially in a virtualized environment, where computation is almost as fast as native but network and disk I/O are markedly slower (even with paravirtualized drivers), I don&#039;t think I&#039;d necessarily expect CPU to be the first bottleneck.  How much computation really needs to be done per packet or per block, and if CPU is really the limiting factor then how did Keyspace get higher transaction rates with less CPU?  Nonetheless, if/when I get back to this I&#039;ll try running with more threads to see what kind of difference it makes.  It might make things better, but I wouldn&#039;t be much more surprised if it made things worse.
&lt;blockquote&gt;10 clients is especially not enough when hitting a cluster of 3 — you’re not seeing throughput go up because you’re going from 100% of ops being local to 1/3, so intra-node latency is killing you.&lt;/blockquote&gt;
The client was completely separate, so it was 100% remote either way.

FWIW, I don&#039;t mean to be disparaging toward Cassandra at all.  If I were doing this &quot;for real&quot; as the basis for something I was actually putting into production, I&#039;d probably do my detailed comparisons with Cassandra and Voldemort (taking advantage of Cassandra&#039;s richer data model as much as possible).  Why not Keyspace?  Even though it did well in these tests, I remain leery of a single-floating-master architecture at serious scale, and there seems to be more evidence that Cassandra in particular looks better as the dataset gets larger.  I think you guys have done good work, and I don&#039;t want that to go unsaid.</description>
		<content:encoded><![CDATA[<p>Thanks for the tips, Jonathan.  I&#8217;m pretty sure I did fix the log levels (have to remember where I stashed all the configs to check) but I didn&#8217;t tweak much else.  If/when I get back to this, I&#8217;ll look into some of the other tweaks you mention.</p>
<blockquote><p>as with any JVM-based app, you need to do about 10k of each op you are benchmarking (per node) to let it JIT things before you start measuring.</p></blockquote>
<p>I&#8217;m not sure if this is unduly affecting the results right now.  In the course of running all those read and write tests I would have done far more than 10K requests, and I didn&#8217;t see any significant change with a second set of runs following the first.</p>
<blockquote><p>we found that the GIL has a significant impact on doing threaded testing from Python. prefer multiprocessing.</p></blockquote>
<p>I didn&#8217;t use Python threading for pretty much this reason.  The tests were ten separate processes.</p>
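<p>A minimal sketch of that "ten separate processes" setup, not the actual test harness: each client runs in its own OS process, so no two clients share a GIL. The names <code>fake_put</code>, <code>run_client</code>, <code>run_all</code> and <code>OPS_PER_CLIENT</code> are hypothetical, and a plain dict stands in for a real store client.</p>

```python
import multiprocessing as mp
import time

NUM_CLIENTS = 10        # the ten processes mentioned above
OPS_PER_CLIENT = 1000   # hypothetical; pick whatever the test needs


def fake_put(store, key, value):
    """Stand-in for a real client call; swap in the store's own client."""
    store[key] = value


def run_client(client_id):
    """Run one client's workload and return (client_id, elapsed_seconds)."""
    store = {}  # each process gets its own stand-in store
    payload = "x" * 100
    start = time.perf_counter()
    for i in range(OPS_PER_CLIENT):
        fake_put(store, "c%d-k%d" % (client_id, i), payload)
    return client_id, time.perf_counter() - start


def run_all(num_clients=NUM_CLIENTS):
    """Start one OS process per client and collect each client's timing."""
    with mp.Pool(processes=num_clients) as pool:
        return pool.map(run_client, range(num_clients))


if __name__ == "__main__":
    results = run_all()
    total_ops = NUM_CLIENTS * OPS_PER_CLIENT
    slowest = max(elapsed for _, elapsed in results)
    print("%d ops, slowest client took %.4fs" % (total_ops, slowest))
```

<p>With a real store, <code>fake_put</code> would become a network call and the per-client timings would reflect server and network latency rather than dict inserts.</p>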
<blockquote><p>Cassandra isn’t designed to handle large blob values; it gives you a column model and expects you to take advantage of that </p></blockquote>
<p>10K isn&#8217;t really that large.  It&#8217;s at the upper end of what should go into a database row or C/C++ structure, but I&#8217;ve seen larger items plenty of times.  All it takes is a few long strings/arrays/lists.  If there&#8217;s a data-model issue here, it&#8217;s probably that I&#8217;m only using one column.  Column-oriented stores like Cassandra can make it possible for an application programmer to use fewer operations to get/set related values, or reduce the amount of data that needs to be transferred for a client-side-serialization approach.  Then again, there&#8217;s a whole range of other features such as enumerations and extra operations besides get/set/delete that distinguish some stores from others.  Keeping results directly comparable often means testing the lowest common denominator, unfortunately, but application developers should keep in mind that there&#8217;s more to the story.</p>
<blockquote><p>10 concurrent clients isn’t a whole lot, for any of these systems. I doubt you can max out the CPU on a single Cassandra node with 10 client threads, even a relatively wimpy EC2 VM (once you fix GC/logging/etc options). (And yes, CPU is almost always the first bottleneck you hit.)</p></blockquote>
<p>I won&#8217;t argue that it <em>is</em>, but I&#8217;m far from convinced that it <em>should be</em>.  Especially in a virtualized environment, where computation is almost as fast as native but network and disk I/O are markedly slower (even with paravirtualized drivers), I don&#8217;t think I&#8217;d necessarily expect CPU to be the first bottleneck.  How much computation really needs to be done per packet or per block, and if CPU is really the limiting factor then how did Keyspace get higher transaction rates with less CPU?  Nonetheless, if/when I get back to this I&#8217;ll try running with more threads to see what kind of difference it makes.  It might make things better, but I wouldn&#8217;t be much more surprised if it made things worse.</p>
<blockquote><p>10 clients is especially not enough when hitting a cluster of 3 — you’re not seeing throughput go up because you’re going from 100% of ops being local to 1/3, so intra-node latency is killing you.</p></blockquote>
<p>The client was completely separate, so it was 100% remote either way.</p>
<p>FWIW, I don&#8217;t mean to be disparaging toward Cassandra at all.  If I were doing this &#8220;for real&#8221; as the basis for something I was actually putting into production, I&#8217;d probably do my detailed comparisons with Cassandra and Voldemort (taking advantage of Cassandra&#8217;s richer data model as much as possible).  Why not Keyspace?  Even though it did well in these tests, I remain leery of a single-floating-master architecture at serious scale, and there seems to be more evidence that Cassandra in particular looks better as the dataset gets larger.  I think you guys have done good work, and I don&#8217;t want that to go unsaid.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Ellis</title>
		<link>http://pl.atyp.us/wordpress/?p=2435&#038;cpage=1#comment-141744</link>
		<dc:creator>Jonathan Ellis</dc:creator>
		<pubDate>Thu, 29 Oct 2009 18:35:57 +0000</pubDate>
		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2435#comment-141744</guid>
		<description>(I&#039;m a Cassandra committer.)

A few notes on Cassandra:
 - As with any JVM-based app, the JVM will grab as much memory as you let it.  That doesn&#039;t mean it actually needs that much...  Cassandra defaults to a 1GB heap but you can certainly get away with less, especially on simple benchmarks.
 - Also as with any JVM-based app, you need to do about 10k of each op you are benchmarking (per node) to let it JIT things before you start measuring.
 - You&#039;ll easily double performance by setting the log level from DEBUG to INFO (unclear if you actually did this, so mentioning it for completeness)
 - The CPU load you&#039;re seeing is from bad default GC options.  The defaults will be fixed for 0.4.2 and 0.5, but it&#039;s easy to tweak for 0.4.1: http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/200910.mbox/%3Ce06563880910222021k6e84262cl912bf80772c1dbb@mail.gmail.com%3E 
 - We found that the GIL has a significant impact on doing threaded testing from Python.  Prefer multiprocessing.
 - Cassandra isn&#039;t designed to handle large blob values; it gives you a column model and expects you to take advantage of that :)

Finally, some general notes:
 - 10 concurrent clients isn&#039;t a whole lot, for any of these systems.  I doubt you can max out the CPU on a single Cassandra node with 10 client threads, even a relatively wimpy EC2 VM (once you fix GC/logging/etc options).  (And yes, CPU is almost always the first bottleneck you hit.)  
 - 10 clients is especially not enough when hitting a cluster of 3 -- you&#039;re not seeing throughput go up because you&#039;re going from 100% of ops being local to 1/3, so intra-node latency is killing you.</description>
		<content:encoded><![CDATA[<p>(I&#8217;m a Cassandra committer.)</p>
<p>A few notes on Cassandra:<br />
 &#8211; As with any JVM-based app, the JVM will grab as much memory as you let it.  That doesn&#8217;t mean it actually needs that much&#8230;  Cassandra defaults to a 1GB heap but you can certainly get away with less, especially on simple benchmarks.<br />
 &#8211; Also as with any JVM-based app, you need to do about 10k of each op you are benchmarking (per node) to let it JIT things before you start measuring.<br />
 &#8211; You&#8217;ll easily double performance by setting the log level from DEBUG to INFO (unclear if you actually did this, so mentioning it for completeness)<br />
 &#8211; The CPU load you&#8217;re seeing is from bad default GC options.  The defaults will be fixed for 0.4.2 and 0.5, but it&#8217;s easy to tweak for 0.4.1: <a href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/200910.mbox/%3Ce06563880910222021k6e84262cl912bf80772c1dbb@mail.gmail.com%3E" rel="nofollow">http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/200910.mbox/%3Ce06563880910222021k6e84262cl912bf80772c1dbb@mail.gmail.com%3E</a><br />
 &#8211; We found that the GIL has a significant impact on doing threaded testing from Python.  Prefer multiprocessing.<br />
 &#8211; Cassandra isn&#8217;t designed to handle large blob values; it gives you a column model and expects you to take advantage of that <img src='http://pl.atyp.us/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Finally, some general notes:<br />
 &#8211; 10 concurrent clients isn&#8217;t a whole lot, for any of these systems.  I doubt you can max out the CPU on a single Cassandra node with 10 client threads, even a relatively wimpy EC2 VM (once you fix GC/logging/etc options).  (And yes, CPU is almost always the first bottleneck you hit.)<br />
 &#8211; 10 clients is especially not enough when hitting a cluster of 3 &#8212; you&#8217;re not seeing throughput go up because you&#8217;re going from 100% of ops being local to 1/3, so intra-node latency is killing you.</p>
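<p>The warm-up advice above (about 10k of each op before measuring) can be sketched roughly as follows. This is an illustration under stated assumptions, not code from any of the stores discussed: <code>benchmark</code>, <code>WARMUP_OPS</code> and <code>MEASURED_OPS</code> are hypothetical names, and <code>op</code> is whatever zero-argument callable issues one request.</p>

```python
import time

WARMUP_OPS = 10000   # the "about 10k of each op" figure from the notes above
MEASURED_OPS = 5000  # hypothetical size of the timed run


def benchmark(op, warmup_ops=WARMUP_OPS, measured_ops=MEASURED_OPS):
    """Call op() warmup_ops times untimed, then time measured_ops calls.

    Returns operations per second for the timed portion only.
    """
    for _ in range(warmup_ops):
        op()  # untimed: gives a JVM-based server a chance to JIT this code path
    start = time.perf_counter()
    for _ in range(measured_ops):
        op()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return measured_ops / max(elapsed, 1e-9)


if __name__ == "__main__":
    store = {}  # stand-in for a real store client
    rate = benchmark(lambda: store.__setitem__("key", "value"))
    print("%.0f ops/sec after warm-up" % rate)
```

<p>Each operation type (read, write, delete) would get its own <code>benchmark</code> call, and against a cluster the warm-up would need to reach every node.</p>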
]]></content:encoded>
	</item>
</channel>
</rss>
