Software Patent News

First, a lot of people have probably heard that Microsoft lost an appeal in a $290M case against Canadian firm i4i.

A federal appeals court on Tuesday affirmed a $290 million patent infringement judgment against Microsoft Corp. (MSFT) and reinstated an injunction that bars the company from selling current versions of its flagship Word software.

The U.S. Court of Appeals for the Federal Circuit said the injunction will go into effect Jan. 11.

After the jury verdict, U.S. District Court Judge Leonard Davis issued a permanent injunction that barred Microsoft from selling Word 2003 and Word 2007, which use a technology called “Custom XML” that is used to classify documents for retrieval by computers. Davis also ordered Microsoft to pay more than $290 million in penalties.

Barred from selling one of their flagship products? Wow. I managed to glean some idea of what Custom XML is, but I still can’t quite figure out what it does in the sense of providing any kind of useful or significant functionality to end users. It seems like one of those things that gets argued about in standards meetings, mostly by people who are more confident of their ability to keep up with developments in such intellectually inbred communities than in the wider world of real-life computing, and nobody cares until it hits the patent system. An interesting sidelight is that the original judgment seems to be due in part to a conflict between one of Microsoft’s lawyers and the East Texas judge. That’s a fight where it’s really hard to root for either side, but the practical consequence is that it might give Microsoft grounds for further appeal. By the way, how does a company from Toronto get to sue one from Redmond in a Texas court? That’s not only the wrong state but the wrong country.

So the news about Microsoft and i4i might be a mixed bag, but the other piece of news is quite a bit better. The US Board of Patent Appeals and Interferences (the what?) issued a ruling about the patentability of machines using mathematical algorithms. The ruling is from August, but it just became “precedential” this past Monday. I have no idea what that means, except maybe that those who find XML standards too accessible should consider a career in patent law. Anyway, here’s the good part.

The BPAI’s test for a claimed machine (or article of manufacture) involving a mathematical algorithm asks two questions. If a claim fails either part of the two-prong inquiry, then the claim is unpatentable as not directed to patent-eligible subject matter.

(1) Is the claim limited to a tangible practical application, in which the mathematical algorithm is applied, that results in a real-world use (e.g., “not a mere field-of-use label having no significance”)?

(2) Is the claim limited so as to not encompass substantially all practical applications of the mathematical algorithm either “in all fields” of use of the algorithm or even in “only one field”?

In a nutshell, this reaffirms the principle that only the application of an algorithm – not the algorithm itself – can be patentable. This has supposedly been true all along, but many patents on applications have effectively become patents on algorithms because the applications have been defined so vaguely or broadly that no other application can escape claims of infringement. This ruling says that failure to limit the scope of such a patent properly might not only cause a specific infringement case to be dismissed but might also cause the patent in its entirety to be invalidated. This could fundamentally change the way people pursue such patents. Until now the incentive has been to make claims as broad and vague as possible, because there was no risk in doing so. Now, though, there’s a risk that over-reaching might destroy the patent’s value entirely, so there’s an incentive to be more specific. Only time will tell whether it actually works out that way, but it’s a good sign.

Now I’m kind of curious about which of my own (five) patents would survive this test. I doubt that they’ll ever be tested, and I won’t provoke my former employers’ legal departments by speculating, but it’s an interesting question.

Target as a Service

One of the dangers of deploying in a public cloud is that Evil People are very much attracted to public clouds. This is for two main reasons:

  • Public clouds are a target-rich environment for evil-doers. The IP ranges tend to be densely packed with servers, each of which might contain a large trove of easily-exploitable information. Contrast this with a public connectivity provider such as Verizon or Comcast, where any given IP address is much more likely to be either unused or assigned to an individual PC containing only a few pieces of exploitable information.
  • Most public clouds’ network virtualization and firewall setups make it hard to do proper intrusion detection, and providers’ own intrusion detection is likely to be of little use. Heck, you can’t even rely on most of them doing any intrusion detection, since they won’t even tell you whether they do.

This isn’t just random paranoia; it’s actual experience. I’ve been running my own small server in the Rackspace cloud for a while now. Here’s a tally of failed ssh login attempts for that one machine over a mere week and a half.

382 195.149.118.43
136 61.83.228.112
103 89.233.173.91
60 195.149.118.43
54 210.51.47.177
53 74.205.222.27
37 200.27.127.95
36 74.205.222.27
32 61.83.228.112
20 74.126.30.189
12 222.122.161.197
10 74.126.30.189
8 210.51.47.177
6 202.85.216.252
2 78.110.170.108
2 222.122.161.197

That’s four attempts per hour, for a site that nobody really has any reason to know exists. Of course, they’re not evenly spaced. The most recent attack came at one attempt per two seconds. The attacks are also coming from all over; the top three addresses above are from Poland, Korea, and Germany respectively. It’s also worth looking at which accounts people are trying to break into.

532 root
18 ts
16 admin
15 postgres
14 test
14 oracle
14 nagios
12 mysql
11 shoutcast
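
In case anyone wants to keep a similar tally on their own machine, here’s roughly how these counts can be pulled out of the sshd log. It’s a minimal sketch in Python, assuming the usual OpenSSH “Failed password for … from <address>” message format and a Debian-style /var/log/auth.log (on Red Hat-ish systems the equivalent is /var/log/secure).

```python
import re
from collections import Counter

# Assumed log location (Debian-style); use /var/log/secure on RHEL/CentOS.
LOG_FILE = "/var/log/auth.log"

# Matches the usual OpenSSH failure lines, e.g.
#   Failed password for root from 195.149.118.43 port 51840 ssh2
#   Failed password for invalid user admin from 61.83.228.112 port 2222 ssh2
FAILED = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<addr>\d{1,3}(?:\.\d{1,3}){3})"
)

by_addr = Counter()
by_user = Counter()

with open(LOG_FILE) as log:
    for line in log:
        match = FAILED.search(line)
        if match:
            by_addr[match.group("addr")] += 1
            by_user[match.group("user")] += 1

print("Attempts by source address:")
for addr, count in by_addr.most_common():
    print(f"{count:6d} {addr}")

print("\nAttempts by account:")
for user, count in by_user.most_common():
    print(f"{count:6d} {user}")
```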

Note that this is just one (particularly blatant) kind of attack, on one unremarkable machine, over a short time period. Imagine what the numbers must be for all attacks across a whole farm of machines, especially if early probes had shown encouraging signs that the machines were weakly protected. I’m sure Rackspace makes some efforts to defeat or at least detect intrusion attempts, but how effective can those really be? If you were a cloud operator, what would you think of the following pattern?

  • Many ssh connections from the same external host, previously unknown to be associated with the internal one.
  • All connections spaced exactly two seconds apart.
  • Each connection made and then abandoned with practically zero data transfer (not even enough for a login prompt).
  • The same pattern repeated for other internal hosts belonging to different customers, either simultaneously or in quick succession.

That seems like one of the most glaringly obvious intrusion signatures I can think of, certainly worth notifying someone about. For all I know Rackspace does detect such patterns, and either cuts off or throttles the offending IP address, but there seems to be little sign of that. This is not to pick on Rackspace, either; I picked them for a reason, and I’ll bet the vast majority of other providers are even less secure. The real point is that if you’re in the cloud, even if it’s a good cloud, you need to be extra careful not to leave ports or accounts easily accessible to the sorts of folks who are aggressively probing your provider’s address space looking for such open doors.
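
To make the point concrete, here’s a rough sketch of what a detector for just the first three signals might look like. The flow-record format and the thresholds are invented for illustration; a real provider-side system would be consuming NetFlow or firewall logs and correlating across customers, which is exactly the kind of visibility individual tenants don’t have.

```python
from collections import defaultdict

# Hypothetical flow record: (src_ip, dst_ip, start_time, bytes_transferred).
# A real detector would consume NetFlow/sFlow or firewall logs instead.

def suspicious_sources(flows, min_conns=20, spacing=2.0, jitter=0.5, max_bytes=200):
    """Flag (src, dst) pairs with many tiny, evenly spaced connections."""
    by_pair = defaultdict(list)
    for src, dst, start, nbytes in flows:
        by_pair[(src, dst)].append((start, nbytes))

    flagged = set()
    for (src, dst), conns in by_pair.items():
        if len(conns) < min_conns:
            continue
        conns.sort()
        gaps = [later[0] - earlier[0] for earlier, later in zip(conns, conns[1:])]
        evenly_spaced = all(abs(gap - spacing) <= jitter for gap in gaps)
        tiny = all(nbytes <= max_bytes for _, nbytes in conns)
        if evenly_spaced and tiny:
            flagged.add((src, dst))
    return flagged
```

Fed the kind of one-attempt-per-two-seconds probe described above, something like this would presumably light up within the first minute or so.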

Getting Serious About Filesystems

Jonathan Ellis (@spyced) posted a link to an article about using ZFS-FUSE on Linux, ending with this bold claim.

I’m expecting that ZFS-FUSE is going to continue to be the only real option for Linux users to have access to a modern file-system for at least the next 2 years.

There’s only one word for that: bollocks. I happen to work in a group where we get pretty good exposure to everything that’s happening in the Linux-filesystem world, and what Reifschneider says is just not true. His claim that “btrfs is still years away” is just Sun FUD, and Sun hardly needs help in the FUD department. Within a year, btrfs will be where ZFS was when it was released. Will it be where ZFS is today? Probably not, but that’s not the correct standard. This isn’t about feature parity; it’s about a “real option” for Linux users, and if btrfs a year from now isn’t good enough, then neither was ZFS in 2004, and I’d expect btrfs’s critics to admit it. NILFS might also be a “real option” by then, though it seems a bit less likely.

No matter what you think of btrfs or NILFS, though, there’s another more important point: ZFS-FUSE isn’t a “real option” either. No FUSE-based local filesystem is or ever will be. Don’t get me wrong – I love FUSE. I use sshfs, which is based on FUSE, every day. FUSE is great for putting a filesystem interface on top of anything else, and even for distributed filesystems where the overhead it incurs is outweighed by network latency. In the specific performance milieu where local filesystems must live, though, that overhead is simply unacceptable, and so are FUSE’s functional deficiencies (e.g. around mapped files). If user-space filesystems were adequate in that role, then everyone – notably including the original ZFS crew – would have switched to writing filesystems that way years ago. They haven’t, and for good reason.
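
If you doubt that the overhead is significant, it’s easy enough to measure a slice of it yourself. Here’s a crude sequential-write comparison; the mount points are placeholders, and a serious comparison would also cover small random I/O, metadata-heavy workloads, and mmap behavior, which is where FUSE tends to hurt the most.

```python
import os
import time

def write_throughput(path, size_mb=256, block=1 << 20):
    """Write size_mb of data to a file under `path` and report MB/s."""
    target = os.path.join(path, "throughput_test.tmp")
    buf = b"\0" * block
    start = time.time()
    with open(target, "wb") as f:
        for _ in range(size_mb):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())  # make sure we measure real writes, not cached ones
    elapsed = time.time() - start
    os.unlink(target)
    return size_mb / elapsed

# Placeholder mount points -- substitute whatever you actually have mounted.
for mount in ("/mnt/ext4-native", "/mnt/zfs-fuse"):
    print(mount, round(write_throughput(mount), 1), "MB/s")
```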

If your workload is such that the ZFS functionality matters more to you than those issues – and the “Ultimate Storage Box” seems to fit that description at a mere 240MB/s by Reifschneider’s own report – that’s wonderful. Knock yourself out. It’s just no excuse to go around dismissing others’ efforts toward something that’s truly industrial strength.

Got Cloud Storage?

I’ve been using a poor man’s kind of cloud storage for a while now. In other words, I’ve been using sshfs to mount a directory on my home NAS box from wherever I happen to be. It works to a reasonable degree, but it has two problems.

  • It uses DHCP and dynamic DNS. The dynamic DNS keeps timing out even though my access point is supposed to renew it, which is annoying.
  • It’s not compliant with Red Hat’s information-security policy, which requires at-rest encryption for anything stored off site, and I’d really like to store copies of the documents I’m working on there for when I’m at home or on the road.

I do have a cheap domain that I completely control and could point to my home NAS box. That would avoid the DNS issue, but I’d still have the DHCP and at-rest-encryption issues. I’m really not wild about installing encfs or similar on the NAS box, and that would be incompatible with my long-term plans anyway, so I’ve done something else instead.

I’ve set up a low-end server in the Rackspace Cloud, installed everything I need there, and pointed my domain to it. That gives me a poor man’s cloud storage with complete control over my own software and data, for barely more than many consumer-oriented cloud-storage providers would charge me to hand over my data and accept a more limited interface in return. I think I can say it’s real cloud storage because – unlike the home-NAS-box solution – I can go in and re-provision it to be arbitrarily bigger or faster any time I want. I could also make it even cloudier by implementing an interface to their Cloud Files service, but it would be hard work to make that much better than what I already have. More importantly, it would be work very similar to my day job, so it wouldn’t exactly be a good use of my way-too-scarce spare time as far as I’m concerned.
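
For anyone tempted to replicate the client side of this, it really is just an sshfs mount. Here’s a minimal Python sketch that shells out to sshfs and sanity-checks the result; the user, host, and paths are placeholders, and it assumes sshfs and key-based ssh authentication are already set up.

```python
import os
import subprocess
import sys

REMOTE = "me@storage.example.org:/srv/docs"   # placeholder user/host/path
MOUNT_POINT = os.path.expanduser("~/cloud-docs")

os.makedirs(MOUNT_POINT, exist_ok=True)

# Shell out to sshfs; reconnect and keepalive options make flaky links less painful.
result = subprocess.run(
    ["sshfs", REMOTE, MOUNT_POINT,
     "-o", "reconnect", "-o", "ServerAliveInterval=15"],
    capture_output=True, text=True,
)
if result.returncode != 0:
    sys.exit(f"sshfs failed: {result.stderr.strip()}")

# Quick sanity check that the mount actually happened.
if not os.path.ismount(MOUNT_POINT):
    sys.exit(f"{MOUNT_POINT} is not a mount point")
print(f"{REMOTE} mounted at {MOUNT_POINT}")
```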

Cloud Storage Patent Troll Wins First Case

I’ve written before about Mitchell Prust’s cloud-storage patents. Yesterday, his law firm announced that he won a judgment against NetMass (the SoftLayer case). Patent Troll Central has outdone themselves this time. Now the odds of Prust winning the related case against Apple increase, as do the chances that he’ll file similar cases against everyone else involved in cloud storage.

I’ll try to find and read more of the details some time this weekend. I’m good with the technical jargon, but not so much with the legalese. Are there any lawyers out there who’d be willing to help if I can find copies of the relevant documents?

Update: I’d also like to add that, at the time the first of these patents was filed (February 2000), I’m pretty sure there were several network-drive packages already on the market. X-Drive and FreeDrive come to mind; I know there were others. Further back, when I started working at Technology Concepts International in 1989, they already had a DECnet-based virtual disk in their CommUnity product. The ideas represented by these patents were at least obvious at the time, if not actually (in all relevant detail) embodied in products already available to the public. It’s hard to imagine how anyone but the East Texas court could consider the patents valid in light of that.

New Boot Option in EC2

Amazon’s Werner Vogels, a.k.a. Santa For Nerds, has posted about the newly added capability to boot an Elastic Compute Cloud instance using an Elastic Block Store volume (i.e. network storage) instead of an Amazon Machine Image (which uses local storage). He lists several advantages, chief among them the much-asked-for ability to have an instance’s root volume persist across a stop and restart, but he doesn’t mention what might be another important advantage: performance. In some of my tests, EBS outperformed instance storage by a considerable margin, and that advantage should extend to an EBS root volume. I guess I’ll have to re-run some of my tests now.

Another set of issues here has to do with multiple instances from a single EBS snapshot. With the AMI infrastructure, it’s easy for EC2 to reach in and do instance-specific customization before you boot. Since the EBS snapshots are basically opaque to EC2, though, every instance is going to come up with exactly the same configuration. Therefore, instead of relying on EC2 to do this customization, users will have to do it themselves using many of the techniques familiar from diskless booting – e.g. looking up configuration based on MAC or IP address (which will be unique per instance). It’s not all that hard, but I’ll bet some people will get caught by this the first time.
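
A small first-boot script is one way to handle it. The sketch below pulls the instance’s identity from the EC2 metadata service at 169.254.169.254 (that part is real) and then uses the private IP to pick a host-specific role; the configuration table itself is purely hypothetical, and in practice it might live in S3, SimpleDB, or the instance’s user-data.

```python
import json
import urllib.request

METADATA = "http://169.254.169.254/latest/meta-data/"

def metadata(path):
    """Fetch one value from the EC2 instance metadata service."""
    with urllib.request.urlopen(METADATA + path, timeout=2) as resp:
        return resp.read().decode().strip()

instance_id = metadata("instance-id")
local_ip = metadata("local-ipv4")
mac = metadata("mac")

# Hypothetical per-instance settings, keyed by private IP address.
CONFIG_BY_IP = {
    "10.0.1.11": {"role": "frontend"},
    "10.0.1.12": {"role": "database"},
}
config = CONFIG_BY_IP.get(local_ip, {"role": "worker"})

print(json.dumps({"instance": instance_id, "ip": local_ip,
                  "mac": mac, **config}, indent=2))
```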

Anti-Social Networking

One of the things I’ve really come to dislike about many bloggers is their endless self-promotion. Many people seem to follow up even their most trivial blog post by linking to it on Twitter, on Facebook, on LinkedIn, on several tech-news aggregators (dzone is particularly afflicted by this), and on just as many mailing lists. I don’t see the point. I believe we’ve all become sufficiently well connected that good content will tend to find its audience without such shenanigans. I occasionally write something that gets linked elsewhere, causing a spike in my readership – sometimes well after an article was actually posted. I like that, I take pride in it, but my traffic numbers today are only interesting relative to my own traffic numbers yesterday. Increasing traffic means I’m getting better at writing things that my audience seems to like, and that makes me happy. I feel no need to compare my numbers to anyone else’s, though. If I started pimping my articles everywhere I could, my traffic would start to reflect the effort I put into self-promotion, instead of the effort I put into thinking and writing, and all comparisons to my own historical numbers would be invalid. That seems like a loss to me.

If you like something I write here, and think some other audience would benefit from seeing it, by all means post a link wherever you want. I know some of my readers have already done that many times, and I thank them for it – especially you, Wes, wherever you are. I think such genuine “votes of confidence” from others are worth far more – to me and to readers – than me linking to myself could ever be, which is a large part of why I decline to play that game. I’m opting out of that particular rat race, and any other race that can only be won by the biggest rat. I like my niche.

Storage Phobia

Back in the day, a lot of people in computing were afflicted with network phobia. They didn’t understand networking, so they avoided it. Even when they had to provide services over a network, they left their systems as centralized as possible. They hid behind RPC abstractions – a flawed approach which Eric Brewer demolished in the often-forgotten second half of his CAP Theorem presentation. They hid behind transaction monitors and load balancers, and contorted their internal architecture to work with those. If they cared about availability, they implemented pairwise failover. Finally, twenty years or so later, we have a generation of programmers who grew up with networking and who are finally comfortable working on truly distributed systems. That’s wonderful. Unfortunately, eliminating one barrier often reveals another, and in this case the second barrier is storage phobia.

A lot of people fear or hate working with storage. They hate the protocols, especially Fibre Channel, because they’re loaded down with excess complexity. Some of the complexity is there because it needs to be there, because Fibre Channel does a lot more than plain old Ethernet. A lot more is there because of standards-body silliness. My favorite example is having to identify devices using Inquiry strings which might be presented in several different formats crossed by several different encodings, all so that every T11 member could be satisfied. (Handling all these combinations instead of just one or two, BTW, is the kind of thing that distinguishes software written by diligent professionals from Amateur Hour.) InfiniBand and iSCSI are just different standards groups making the same mistakes. Even more than standards and protocols, though, many people fear or hate working with storage implementations. It wasn’t so long ago that many significant kinds of reconfiguration on an EMC Symmetrix required typing hex into a console (anatmain) followed by a full restart. Yes, it was even more primitive than Windows. Every array, every switch, every HBA has its own “quirks” with which customers inevitably develop an unwelcome intimacy. Believe me, people who’ve worked in these trenches for many years understand all of that much better than anyone looking in from outside ever will.
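
To give a flavor of those format-times-encoding combinatorics, here’s a rough sketch of walking the designation descriptors in a SCSI Inquiry VPD page 0x83 (Device Identification) response. It’s simplified from the real SPC rules (no protocol-identifier or association handling, for a start) and assumes you already have the raw page bytes in hand.

```python
CODE_SETS = {1: "binary", 2: "ASCII", 3: "UTF-8"}
DESIGNATOR_TYPES = {
    0: "vendor specific", 1: "T10 vendor ID", 2: "EUI-64", 3: "NAA",
    4: "relative target port", 5: "target port group",
    6: "logical unit group", 7: "MD5 LU identifier", 8: "SCSI name string",
}

def parse_vpd_83(page: bytes):
    """Walk the designation descriptors in a VPD page 0x83 buffer (simplified)."""
    assert page[1] == 0x83, "not a Device Identification page"
    page_len = int.from_bytes(page[2:4], "big")
    descriptors, offset = [], 4
    while offset < 4 + page_len:
        code_set = page[offset] & 0x0F
        dtype = page[offset + 1] & 0x0F
        length = page[offset + 3]
        raw = page[offset + 4:offset + 4 + length]
        # The same identifier may show up as raw binary, ASCII, or UTF-8,
        # under several different designator types -- hence the combinatorics.
        if CODE_SETS.get(code_set) in ("ASCII", "UTF-8"):
            value = raw.decode(errors="replace").strip()
        else:
            value = raw.hex()
        descriptors.append({
            "code_set": CODE_SETS.get(code_set, f"reserved({code_set})"),
            "type": DESIGNATOR_TYPES.get(dtype, f"reserved({dtype})"),
            "value": value,
        })
        offset += 4 + length
    return descriptors
```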

Nonetheless, people from outside often seem to think they can do better. Do they try to do better by preserving some of the existing technology’s strengths and addressing its weaknesses? Sadly, no. Instead, they fixate on disk-failure-rate statistics from two years ago, presenting them as “evidence” of their already-held conclusion that conventional storage is something to be avoided. Is the total MTDL (Mean Time to Data Loss) of an internally redundant shared array the same as that for individual disks? Are disks housed in servers – and thus subject to the servers’ own failure rate as well as their own – going to fare better? Does the lack of moving parts make RAM infallible? No, no, and NO. But, but . . . those disk-failure rate numbers are scary! Surely any alternative must be preferable? Argh. Letting fear of conventional storage drive the creation of “solutions” that are just as complex without the benefits of being as general, as well tested, or as well documented is a mistake. (Open but undocumented and untested can be worse than closed, BTW, if the cost of reverse-engineering and fixing the implementation is greater than the cost of licensing would have been.) Such attempts generally lose even when considered alone, and even more so when the effects of fragmentation and incompatibility are considered.

Yes, even the best storage systems can fail. Yes, software must ultimately be responsible for recovering from failure. Yes, many programs make the incorrect assumption that high-end storage fails too rarely to worry about, and thus abdicate their own responsibility. That doesn’t mean there’s no value in storage which handles some failures transparently and others without losing access to even one single replica. It certainly doesn’t mean that some new piece of software based on such an assumption will make anything better. Good software is based on balance, not extremism. That means not making the software more complex by coding to the least common denominator. It means taking advantage of external facilities that are already common and cost-effective, and letting the software focus on the problems that it alone can solve.