<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Parade of FAIL</title>
	<atom:link href="http://pl.atyp.us/wordpress/?feed=rss2&#038;p=2458" rel="self" type="application/rss+xml" />
	<link>http://pl.atyp.us/wordpress/?p=2458</link>
	<description>Making the world better, one byte at a time.</description>
	<lastBuildDate>Sun, 05 Sep 2010 02:56:27 -0400</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: lacos</title>
		<link>http://pl.atyp.us/wordpress/?p=2458&#038;cpage=1#comment-144208</link>
		<dc:creator>lacos</dc:creator>
		<pubDate>Mon, 23 Nov 2009 10:50:00 +0000</pubDate>
		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2458#comment-144208</guid>
		<description>&quot;Shared memory is great up to its scaling limits, but its scaling limits in the real world are determined by programmer behavior. That behavior is typically to abuse the sharing so that the amount of coherency traffic rises as (approximately) the square of the processor count, rapidly overwhelming even the interconnect resources that are available on-chip. So far the only viable solution has been to constrain use of the interconnect by adopting models like MPI.&quot;

Not to contradict or anything -- I&#039;m not even sure if this could be better used as a PRO or CON argument for your point -- I&#039;ll just add this as something possibly (hopefully!) relevant:

http://lacos.hu/lbzip2-scaling/scaling.html</description>
		<content:encoded><![CDATA[<p>&#8220;Shared memory is great up to its scaling limits, but its scaling limits in the real world are determined by programmer behavior. That behavior is typically to abuse the sharing so that the amount of coherency traffic rises as (approximately) the square of the processor count, rapidly overwhelming even the interconnect resources that are available on-chip. So far the only viable solution has been to constrain use of the interconnect by adopting models like MPI.&#8221;</p>
<p>Not to contradict or anything &#8212; I&#8217;m not even sure if this could be better used as a PRO or CON argument for your point &#8212; I&#8217;ll just add this as something possibly (hopefully!) relevant:</p>
<p><a href="http://lacos.hu/lbzip2-scaling/scaling.html" rel="nofollow">http://lacos.hu/lbzip2-scaling/scaling.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lacos</title>
		<link>http://pl.atyp.us/wordpress/?p=2458&#038;cpage=1#comment-142547</link>
		<dc:creator>lacos</dc:creator>
		<pubDate>Thu, 05 Nov 2009 23:53:58 +0000</pubDate>
		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2458#comment-142547</guid>
		<description>Great answer, thanks! The part on the self-fulfilling prophecy was particularly creepy.</description>
		<content:encoded><![CDATA[<p>Great answer, thanks! The part on the self-fulfilling prophecy was particularly creepy.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Darcy</title>
		<link>http://pl.atyp.us/wordpress/?p=2458&#038;cpage=1#comment-142489</link>
		<dc:creator>Jeff Darcy</dc:creator>
		<pubDate>Thu, 05 Nov 2009 14:32:54 +0000</pubDate>
		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2458#comment-142489</guid>
		<description>Good points about RISC/CISC and CPU/GPU going back and forth.

&lt;blockquote&gt;why didn’t the market reward SiCortex then?&lt;/blockquote&gt;
The market seemed quite willing to; it&#039;s the investors who failed.  No matter how good a startup&#039;s technology is, they face an uphill battle in the marketplace.  Time after time, we&#039;d hear prospective customers say they loved the product, they loved the people they&#039;d worked with through the evaluation period, but they were afraid to spend that much for a system from a small company that might not survive.  Their prophecy of failure turned out to be self-fulfilling, just as a prophecy of success would have been.  If those same customers had evaluated products purely on the merits, many of them would have bought our systems and we would have been fine.  Building up that trust requires demonstrating that the original technical success is repeatable, but building that second generation - which we were doing - takes more time and money than early-adopter customers put in - hence the need for later-stage investment.  More &lt;a href=&quot;/wordpress/?p=2121&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;, and it was a similar story at Revivio and many other places.  The lesson from SiCortex is not a technical one, but that startups should worry about their investors&#039; viability as much as the other way around.

Getting back to the topic, it&#039;s important to note that all of the very biggest systems in the world - e.g. Roadrunner, Jaguar, Intrepid - are based on essentially the same kind of non-shared-memory structure as SiCortex.  The Blue Gene and PowerXCell8i systems from IBM are particularly interesting in this context because they use explicit memory hierarchies even within nodes.  Shared memory is great up to its scaling limits, but its scaling limits in the real world are determined by programmer behavior.  That behavior is typically to abuse the sharing so that the amount of coherency traffic rises as (approximately) the square of the processor count, rapidly overwhelming even the interconnect resources that are available on-chip.  So far the only viable solution has been to constrain use of the interconnect by adopting models like MPI.  Hadoop manages to do something like that across larger systems that share storage rather than memory, where similar problems also occur.  Perhaps something like Cilk or a distributed Grand Central Dispatch can do that for larger systems that share memory.  It&#039;s still an active area of research, and I wish those researchers well.</description>
		<content:encoded><![CDATA[<p>Good points about RISC/CISC and CPU/GPU going back and forth.</p>
<blockquote><p>why didn’t the market reward SiCortex then?</p></blockquote>
<p>The market seemed quite willing to; it&#8217;s the investors who failed.  No matter how good a startup&#8217;s technology is, they face an uphill battle in the marketplace.  Time after time, we&#8217;d hear prospective customers say they loved the product, they loved the people they&#8217;d worked with through the evaluation period, but they were afraid to spend that much for a system from a small company that might not survive.  Their prophecy of failure turned out to be self-fulfilling, just as a prophecy of success would have been.  If those same customers had evaluated products purely on the merits, many of them would have bought our systems and we would have been fine.  Building up that trust requires demonstrating that the original technical success is repeatable, but building that second generation &#8211; which we were doing &#8211; takes more time and money than early-adopter customers put in &#8211; hence the need for later-stage investment.  More <a href="/wordpress/?p=2121" rel="nofollow">here</a>, and it was a similar story at Revivio and many other places.  The lesson from SiCortex is not a technical one, but that startups should worry about their investors&#8217; viability as much as the other way around.</p>
<p>Getting back to the topic, it&#8217;s important to note that all of the very biggest systems in the world &#8211; e.g. Roadrunner, Jaguar, Intrepid &#8211; are based on essentially the same kind of non-shared-memory structure as SiCortex.  The Blue Gene and PowerXCell8i systems from IBM are particularly interesting in this context because they use explicit memory hierarchies even within nodes.  Shared memory is great up to its scaling limits, but its scaling limits in the real world are determined by programmer behavior.  That behavior is typically to abuse the sharing so that the amount of coherency traffic rises as (approximately) the square of the processor count, rapidly overwhelming even the interconnect resources that are available on-chip.  So far the only viable solution has been to constrain use of the interconnect by adopting models like MPI.  Hadoop manages to do something like that across larger systems that share storage rather than memory, where similar problems also occur.  Perhaps something like Cilk or a distributed Grand Central Dispatch can do that for larger systems that share memory.  It&#8217;s still an active area of research, and I wish those researchers well.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lacos</title>
		<link>http://pl.atyp.us/wordpress/?p=2458&#038;cpage=1#comment-142483</link>
		<dc:creator>lacos</dc:creator>
		<pubDate>Thu, 05 Nov 2009 13:57:11 +0000</pubDate>
		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2458#comment-142483</guid>
		<description>I&#039;ve read that RISC and CISC are continuously alternating, or more exactly, higher and lower levels of abstractions, both in software and hardware, fluctuate.

If you have a nice abstraction, like CISC, and its specification matches exactly what a user wants to do, then it&#039;s more efficient than hand-called RISC, because the abstract-&gt;small-pieces translation happens at a lower level, with fewer round trips.

If the abstraction leaks or doesn&#039;t match exactly what a user tries to do, then users will start to abuse the high-level API.

I think the example I&#039;ve read about was GPUs (mis)used for non-rendering, generic computational tasks, and conversely, high-performance general-purpose CPUs used for graphical rendering, for greater freedom of expression. CPUs were too slow, thus a few primitives were crystallized and pushed down into hardware (GPUs). Now, people started wanting to tap that vector architecture for other purposes (coveting access to the lower levels of the GPU), while simultaneously finding the primitives too restrictive even for graphics, with powerful CPUs being available.

I know about these topics only superficially as you can probably tell, but the conversation I seem to remember went something like this.

Second, the market is fashion driven and just plain unreasonable. If I get your point, you say that big SMP is disappearing rightfully, for the reasons (1) its complexity makes it too expensive in the market, and (2) programmers fail at programming it (the abstraction leaks and one should handle node distances explicitly, for example). This might be the case, but why didn&#039;t the market reward SiCortex then? I may be a bit off here, but I believe SiCortex mainly supported MPI (which appears to be the correct approach with the current swing of &quot;RISC&quot; in parallel programming), and it would have been cheaper as well in the long run, both for being very energy-efficient and more useful to program (more explicit, but with less obscure bugs). Still, the market didn&#039;t keep it alive.

Sorry, this is wildly incoherent, and I apologize for that. I can&#039;t express it better for now, I just feel that the market is a fickle mistress and no phenomenon can be justified by it.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve read that RISC and CISC are continuously alternating, or more exactly, higher and lower levels of abstractions, both in software and hardware, fluctuate.</p>
<p>If you have a nice abstraction, like CISC, and its specification matches exactly what a user wants to do, then it&#8217;s more efficient than hand-called RISC, because the abstract-&gt;small-pieces translation happens at a lower level, with fewer round trips.</p>
<p>If the abstraction leaks or doesn&#8217;t match exactly what a user tries to do, then users will start to abuse the high-level API.</p>
<p>I think the example I&#8217;ve read about was GPUs (mis)used for non-rendering, generic computational tasks, and conversely, high-performance general-purpose CPUs used for graphical rendering, for greater freedom of expression. CPUs were too slow, thus a few primitives were crystallized and pushed down into hardware (GPUs). Now, people started wanting to tap that vector architecture for other purposes (coveting access to the lower levels of the GPU), while simultaneously finding the primitives too restrictive even for graphics, with powerful CPUs being available.</p>
<p>I know about these topics only superficially as you can probably tell, but the conversation I seem to remember went something like this.</p>
<p>Second, the market is fashion driven and just plain unreasonable. If I get your point, you say that big SMP is disappearing rightfully, for the reasons (1) its complexity makes it too expensive in the market, and (2) programmers fail at programming it (the abstraction leaks and one should handle node distances explicitly, for example). This might be the case, but why didn&#8217;t the market reward SiCortex then? I may be a bit off here, but I believe SiCortex mainly supported MPI (which appears to be the correct approach with the current swing of &#8220;RISC&#8221; in parallel programming), and it would have been cheaper as well in the long run, both for being very energy-efficient and more useful to program (more explicit, but with less obscure bugs). Still, the market didn&#8217;t keep it alive.</p>
<p>Sorry, this is wildly incoherent, and I apologize for that. I can&#8217;t express it better for now, I just feel that the market is a fickle mistress and no phenomenon can be justified by it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wes Felter</title>
		<link>http://pl.atyp.us/wordpress/?p=2458&#038;cpage=1#comment-142426</link>
		<dc:creator>Wes Felter</dc:creator>
		<pubDate>Thu, 05 Nov 2009 01:01:13 +0000</pubDate>
		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2458#comment-142426</guid>
		<description>Pen computing is back, except now the computer fits inside the pen and the tablet is made of paper. A few people at the office are using these to digitize notes. It&#039;s probably still a gimmick, though.</description>
		<content:encoded><![CDATA[<p>Pen computing is back, except now the computer fits inside the pen and the tablet is made of paper. A few people at the office are using these to digitize notes. It&#8217;s probably still a gimmick, though.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Cook</title>
		<link>http://pl.atyp.us/wordpress/?p=2458&#038;cpage=1#comment-142415</link>
		<dc:creator>John Cook</dc:creator>
		<pubDate>Wed, 04 Nov 2009 22:48:02 +0000</pubDate>
		<guid isPermaLink="false">http://pl.atyp.us/wordpress/?p=2458#comment-142415</guid>
		<description>Not only are RAID disks not &quot;inexpensive,&quot; they&#039;re &lt;a href=&quot;http://www.johndcook.com/blog/2009/01/05/rai-failure-probabilities/&quot; rel=&quot;nofollow&quot;&gt;not independent&lt;/a&gt; either, not in the sense of independent probabilities of failure.</description>
		<content:encoded><![CDATA[<p>Not only are RAID disks not &#8220;inexpensive,&#8221; they&#8217;re <a href="http://www.johndcook.com/blog/2009/01/05/rai-failure-probabilities/" rel="nofollow">not independent</a> either, not in the sense of independent probabilities of failure.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
