Package Managers

There are many things that differentiate a true software engineer from a mere programmer. Most of them are unpleasant - planning releases, reviewing designs or code, testing, release engineering, and so on. One of the most odious tasks is packaging software. I'll admit that it's an area where my self-discipline sometimes breaks down and I dump the task on somebody else as quickly as I can. Nonetheless, I recognize that the task itself, the tools, and the people who do it all have value. I also recognize that the rules those people have developed generally exist for a good reason. Apparently some people don't.

The post actually makes some pretty decent points, especially about packagers breaking up packages unnecessarily. Mixed in are some really bad points, of which I'll focus on just three.

Dynamic linking lets 2 programs indicate they want to use library X at runtime, and possibly even share a copy of X loaded into RAM. This is great if it is 1987 and you have 12mb of ram and want to run more than 3 xterms, but we don’t live in that world anymore.

That demonstrates some pretty serious ignorance about the real issues, including performance. Sure, people have lots of RAM, but they want to use it for something besides redundant copies of the same (or almost the same) code: more applications, more VMs, more heap space for whichever program is the machine's main role, and so on. A dozen statically linked copies of the same library take up a dozen times as much RAM and cache, and making those footprints larger does indeed have an impact on performance.

One often touted benefit of dynamic linking is security, you can upgrade library X to fix some security hole and all the applications that use it will automatically gain the security fix the next time they’re run (assuming they still can run). I admit this benefit, but I think that package managers could work around this if they used static linking (Y depends on X, which has a security update, rebuild X and then rebuild Y and ship an updated package).

That doesn't really work. You might be able to build against the new version of X, but that doesn't mean the result will be free of subtle bugs due to the difference. The author even seems aware of this when he talks about the "carefully curated" (how pretentious) libraries that are shipped with Riak, but sort of tries to walk both sides of the street by ignoring the issue here.

The situation gets even worse when transitive dependencies are considered. Let's say that X depends on a specific version of Y, and it enforces that dependency either via the package definition or via bundling. Either way, if Y depends on Z then an update to Z can also break X. This possibility remains unless X includes all of its dependencies all the way down to the OS. I know plenty of people who do exactly this in the form of virtual appliances and such, and it's a valid approach when pursued to its logical conclusion, but capturing only one level of dependencies solves nothing in return for the problems it causes.
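To make the transitive case concrete, here is a minimal sketch in Python. The package names and the `affected_by` helper are hypothetical, made up purely for illustration, and this is not how any real package manager represents dependencies. It just walks the dependency graph backwards to find everything that an update to one package could break.

```python
from collections import defaultdict, deque


def affected_by(deps, updated):
    """Return every package that depends, directly or transitively, on `updated`.

    `deps` maps a package name to the list of packages it requires.
    (Hypothetical data model, for illustration only.)
    """
    # Invert the graph: for each package, who requires it?
    reverse = defaultdict(set)
    for pkg, requires in deps.items():
        for dep in requires:
            reverse[dep].add(pkg)

    # Breadth-first walk up the reverse edges from the updated package.
    seen = set()
    queue = deque([updated])
    while queue:
        current = queue.popleft()
        for dependent in reverse[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# X requires Y, Y requires Z: the example from the paragraph above.
deps = {"X": ["Y"], "Y": ["Z"], "Z": []}
print(sorted(affected_by(deps, "Z")))  # ['X', 'Y']
```

Pinning or bundling Y only removes the X-to-Y edge from consideration. The Y-to-Z edge is still there, so X still shows up in the set of things an update to Z can break.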

The last issue has to do with bundling modified versions of dependencies.

Leveldb is a key/value database originally developed by Google for implementing things like HTML5’s indexeddb feature in Google Chrome. Basho has invested some serious engineering effort in adapting it as one of the backends that Riak can be configured to use to store data on disk. Problem is, our usecase diverges significantly from what Google wants to use it for, so we’ve effectively forked it

This approach is problematic for reasons that go well beyond packaging. There's a serious "doing open-source wrong" aspect to it as well, though there may be room for debate about which side is guilty in this case. Nonetheless, these things do happen. I myself violated the no-bundling rule for HekaFS on Fedora at one point ... and you know what? It ended up being broken, for exactly the reasons we're talking about. If you do have to bundle a modified version of someone else's code, there's a right way to do it and a wrong way. The right way is to engage with the distro packagers, instead of calling them "OCD" or accusing them of adhering blindly to outdated "1992" standards, and collaborate with them on a sustainable solution. That solution is very likely to include more tightly specified dependencies, and a more active role for you in keeping your own package up to date as the underlying original dependency gets updated. It's a huge pain for everyone involved, which is why it should only be done as a last resort. If you do decide to go down that path, then at least - as I put it in the Hacker News thread - pull up your big-girl panties and deal with it. Asking someone else to do part of your job and then complaining about how they do it is a loser move.
