It looks like it’s that time of year again, when I start looking at how the site’s doing. I thought traffic had slacked off a bit, but I went back and found my numbers from last year and I see it’s not so. At that time, 358 visits/day and 3.4 GB/month represented significant increases over what the numbers had been before then. Now, I’ve been over 500 visits/day for two months in a row. Last month that resulted in 4.26 GB consumed, which means it’s a good thing I changed hosting plans/providers. This month I’ve been far more aggressive about blocking image hits from discussion forums (more about that in a moment), so my bandwidth has only been 3.49 GB with two days left to be counted. The bandwidth per visit is lower than last year, mostly because an ever-higher percentage of hits comes from RSS readers, which are much more bandwidth-efficient.
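
For anyone who wants to check the bandwidth-per-visit claim, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: the 30-day month and the 1024 MB per GB conversion are assumptions, and since "over 500 visits/day" is a floor rather than an exact count, the second figure is an upper bound on the real per-visit number.

```python
# Back-of-the-envelope bandwidth per visit, using the figures quoted above.
DAYS_PER_MONTH = 30   # assumption: a round 30-day month
MB_PER_GB = 1024      # assumption: binary gigabytes


def mb_per_visit(gb_per_month, visits_per_day, days=DAYS_PER_MONTH):
    """Average megabytes transferred per visit over one month."""
    return gb_per_month * MB_PER_GB / (visits_per_day * days)


last_year = mb_per_visit(3.4, 358)
# 500 visits/day is a lower bound on traffic, so this is an upper bound
# on the true per-visit figure for last month.
last_month = mb_per_visit(4.26, 500)

print(f"last year:  {last_year:.2f} MB/visit")
print(f"last month: {last_month:.2f} MB/visit (upper bound)")
```

Even with the conservative 500 visits/day, last month's per-visit figure comes out below last year's.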

The image-leech problem is a common one, so I figured I’d discuss it a little further. For those who aren’t familiar with it, here are the basics. If you link directly to one of the images here from your page so that it displays inline, any browser loading your page will also fetch the image from here, increasing my bandwidth usage for the month by the size of the image. If your page is popular, that increases my bandwidth usage a lot, possibly pushing me above my monthly limit, at which point it becomes money out of my pocket. Needless to say, I’d resent that. It’s a constant worry/annoyance for everyone who runs a site with popular images, which is why the usual (somewhat counterintuitive) advice is to copy images and host them somewhere else, either free or at your own expense, instead of linking to them.

Though impolite, image leeching probably wouldn’t be a problem if it weren’t for web forums. If you post a link to one of my images (my cubic-wombat-poop picture is by far the most popular target) in a web-forum thread, I’ll get hit every time someone views the thread. On a busy forum, that adds up very quickly. Even worse, if you use one of these images as an avatar (the little picture that shows up next to your screen name on many forums), I’ll get hit every time someone views any thread where you’ve posted. I really hate that.

Here’s where things get a bit technical. For a long time, I tried to block abusers by site. In HTTP/Apache terms, I had entries in my .htaccess file that would look for certain host names in the “Referer” field of each request. That didn’t work all that well, because almost every day someone would post links on a new forum that I’d never heard of before. This month, I started blocking particular kinds of software instead. Most forum software can easily be identified by the appearance of strings like “viewtopic” or “showthread” in the Referer. I’ve developed a collection of about two dozen such strings, and this approach seems to be working very well. When I look at my logs I see appropriate denials even for sites I’ve never heard of before (but which are using familiar software). Meanwhile, requests from search engines, from forums that Cindy or I use (for which I’ve created exceptions), and from various personal webpages are being served just fine. If anybody wants to borrow my list of banned Referer strings, or collaborate on making it a complete list, just drop me an email.
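
The matching logic described above can be sketched in a few lines of Python. This is an illustration only, not the actual .htaccess rules: the function name `should_block`, the placeholder host `forum.example.org`, the case-insensitive matching, and the choice to let requests with no Referer through are all assumptions. Only “viewtopic” and “showthread” come from the real list.

```python
from urllib.parse import urlsplit

# Two of the roughly two dozen banned strings; the rest are not listed here.
BANNED_SUBSTRINGS = ("viewtopic", "showthread")

# Hypothetical placeholder for forums that get an exception.
ALLOWED_HOSTS = {"forum.example.org"}


def should_block(referer, banned=BANNED_SUBSTRINGS, allowed_hosts=ALLOWED_HOSTS):
    """Return True if an image request with this Referer should be denied."""
    if not referer:
        # Assumption: a missing Referer is allowed, since it says nothing
        # about where the request came from.
        return False
    host = (urlsplit(referer).hostname or "").lower()
    if host in allowed_hosts:
        # Exceptions win over the banned-string check.
        return False
    lowered = referer.lower()
    # Block by software signature rather than by site name.
    return any(s in lowered for s in banned)


if __name__ == "__main__":
    for ref in (
        "http://unknown-forum.example.com/viewtopic.php?t=42",
        "http://forum.example.org/showthread.php?t=7",
        "http://blog.example.net/my-page.html",
    ):
        print(ref, "->", "deny" if should_block(ref) else "serve")
```

Checking the exception list first is what lets a favored forum through even though it runs the same software as the blocked ones.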