Blogroll Engineering

John Gruber recently concluded his annual Daring Fireball membership drive. Since quitting his day job a couple of years ago to write DF full time, John has sold annual memberships and t-shirts (free annual membership included) to support himself in his new profession. When you become a member (as I recently did), you not only get a warm-fuzzy for supporting a great writer, and perhaps a t-shirt, you also get members-only goodies: full-text RSS feeds for Daring Fireball and The Linked List. These feeds use HTTP authentication to restrict access to members.

For several years, Bloglines has been my feed reader of choice. It’s easy to use, does what it says on the tin, and lets me access my feeds from anywhere. I have been subscribed for years to the standard Daring Fireball RSS feed, which contains only brief descriptions of each article. When I recently became a DF member, I unsubscribed from that feed and subscribed to the full-content feed and the Linked List feed in its place (John also offers members a combined feed, but I prefer to see them separately in Bloglines). Unfortunately, as a result, Daring Fireball no longer appears in my blogroll (at least as of this writing). To see why, I’ll explain how my blogroll is maintained, and why this method doesn’t work for the DF member feeds.

Like most feed reading applications, Bloglines allows you to export your list of subscriptions as an OPML file. Because OPML is an XML format, it’s easy to turn this list of subscriptions into something else, via the magic of XSLT. This is how I create both of my blogrolls: the short version in the right-hand column of every page on this site, and the long version linked from the end of the short version.
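To illustrate why an XML format makes this easy, here is a minimal Perl sketch that pulls feed titles and site URLs out of a subscription list with a single XPath expression. The sample OPML is hypothetical: it assumes the common subscription-list convention of outline elements carrying title, htmlUrl, and xmlUrl attributes, with folders as outlines that wrap other outlines. The real Bloglines export may differ in its details, and list_feeds is a name made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample standing in for the Bloglines export; the real
# file may carry more attributes or nest folders differently.
my $sample_opml = <<'OPML';
<opml version="1.0">
  <head><title>Sample subscriptions</title></head>
  <body>
    <outline title="Favorites">
      <outline title="Example Blog" type="rss"
               htmlUrl="http://example.com/"
               xmlUrl="http://example.com/index.xml"/>
    </outline>
    <outline title="Another Blog" type="rss"
             htmlUrl="http://example.org/"
             xmlUrl="http://example.org/feed.xml"/>
  </body>
</opml>
OPML

# Return [title, site URL] pairs for every outline that points at a feed.
# Folder outlines have no xmlUrl attribute, so the XPath skips them.
sub list_feeds {
    my ($opml_text) = @_;
    my $doc = XML::LibXML->load_xml(string => $opml_text);
    return map { [ $_->getAttribute('title'), $_->getAttribute('htmlUrl') ] }
           $doc->findnodes('//outline[@xmlUrl]');
}

printf "%s <%s>\n", @$_ for list_feeds($sample_opml);
```

The same XPath expression could serve as a select in an XSLT stylesheet, which is the route the rest of this post takes.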

The first step is to obtain a copy of my Bloglines subscriptions in OPML format. Bloglines makes this available at a URL, in my case http://www.bloglines.com/export?id=jclark. As part of my larger (and desperately in need of updating) effort to back up the data I have stored around the web, I have a daily cron job that runs at my webhost, grabbing a copy of the file using wget (curl would also work) and sticking it in my datasafe directory. One of these days, I’ll get that directory under version control with cvs or svn. For now, at least I have a daily backup. The wget command is simple:

 wget -O datasafe/bloglines.xml http://www.bloglines.com/export\?id=jclark

The next step is to turn the OPML into something more useful: in this case, a piece of HTML that I can include in my web pages. I have another daily cron job that runs the following Perl script:

#!/usr/bin/perl
# Build the full and short blogroll HTML fragments from the Bloglines OPML export.

use strict;
use warnings;

use XML::LibXML;
use XML::LibXSLT;

my $parser = XML::LibXML->new();
my $xslt   = XML::LibXSLT->new();

# Parse the exported subscriptions and both stylesheets as XML documents.
my $opml = $parser->parse_file('/home/jclark/datasafe/bloglines.xml');
my $ssdoc = $parser->parse_file('/home/jclark/xslt/blogroll.xsl');
my $ssshortdoc = $parser->parse_file('/home/jclark/xslt/blogroll-short.xsl');

# Compile each stylesheet, then apply it to the OPML.
my $ss = $xslt->parse_stylesheet($ssdoc);
my $ssshort = $xslt->parse_stylesheet($ssshortdoc);
my $blogroll = $ss->transform($opml);
my $blogroll_short = $ssshort->transform($opml);

# Write the results where the site's pages can include them.
     $ss->output_file($blogroll,       '/home/jclark/jclark.org/blogroll.html');
$ssshort->output_file($blogroll_short, '/home/jclark/jclark.org/blogroll-short.html');

The above script uses LibXSLT to transform the OPML file twice: once to create the full blogroll, and once to create the short blogroll. The transforms are similar, but the ‘short’ version only considers items in my Bloglines “Favorites” folder. If you are interested, you can see the full blogroll and short blogroll XSLT stylesheets.
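For readers who don’t want to dig through the stylesheets, here is a rough sketch of what a “Favorites”-only transform could look like. The stylesheet is hypothetical and much simpler than the real blogroll-short.xsl, which may be organized quite differently. The sample OPML, the short_blogroll helper, and the assumption that folders are outlines with a title attribute are likewise made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical stylesheet: emits one list item per feed found directly
# inside the folder titled "Favorites".
my $short_xsl = <<'XSL';
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="no"/>
  <xsl:template match="/">
    <ul>
      <xsl:for-each select="//outline[@title='Favorites']/outline[@xmlUrl]">
        <li><a href="{@htmlUrl}"><xsl:value-of select="@title"/></a></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
XSL

# Hypothetical OPML sample: one feed inside Favorites, one outside it.
my $sample_opml = <<'OPML';
<opml version="1.0">
  <body>
    <outline title="Favorites">
      <outline title="Example Blog"
               htmlUrl="http://example.com/"
               xmlUrl="http://example.com/index.xml"/>
    </outline>
    <outline title="Another Blog"
             htmlUrl="http://example.org/"
             xmlUrl="http://example.org/feed.xml"/>
  </body>
</opml>
OPML

# Apply a stylesheet (given as text) to an OPML document (given as text)
# and return the serialized result as a string.
sub short_blogroll {
    my ($opml_text, $xsl_text) = @_;
    my $stylesheet = XML::LibXSLT->new->parse_stylesheet(
        XML::LibXML->load_xml(string => $xsl_text));
    my $result = $stylesheet->transform(
        XML::LibXML->load_xml(string => $opml_text));
    return $stylesheet->output_as_chars($result);
}

print short_blogroll($sample_opml, $short_xsl);
```

Only the feed nested under “Favorites” ends up in the list; the top-level feed is ignored because the select expression never reaches it.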

This method of driving my blogroll from my Bloglines subscriptions has served me well – no maintenance needed. I even have control over what to include. Bloglines allows you to flag each subscription as public or private. For example, I have a few Google query feeds that don’t need to show up in my blogroll. I simply flag them as private in Bloglines, and they aren’t part of the export.

Now we’ve arrived at the crux of the matter. Because the Daring Fireball member feeds require HTTP authentication, Bloglines automatically treats them as private. This makes a lot of sense, but it means that these feeds are no longer in my OPML export, and so are no longer on my blogroll. Of course, I could have left the original feed subscribed, but I didn’t realize at the time that I would need to, and I don’t want to see duplicate entries if I do. Unless I can think of a better alternative, I’ll probably add an extra folder within Bloglines to hold feeds I subscribe to for the benefit of my blogroll, but which I never read directly. The other alternatives I’ve considered involve a separate list of blogroll entries stored somewhere and managed manually, so I might as well manage them where I manage the rest of my blogroll: in Bloglines. Also, because DF is in my Bloglines “Favorites” folder, it used to show up on the short blogroll on every page. If I want it to remain in the short blogroll (and I do), I’ll need to tweak the XSLT stylesheet to include entries from both folders.
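One way the two-folder tweak might look is an XPath predicate that accepts either folder title. The sketch below shows it in Perl, under several assumptions: “Blogroll Only” is an invented name for the extra folder, the sample OPML is hypothetical, and folder_xpath and feeds_in_folders are helper names made up for this example. The resulting expression could equally be dropped into a select attribute in the short blogroll stylesheet.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# "Favorites" comes from the post; "Blogroll Only" is an invented name
# for the extra folder of feeds kept only for the blogroll.
my @short_folders = ('Favorites', 'Blogroll Only');

# Hypothetical OPML sample with three folders, one of them unlisted.
my $sample_opml = <<'OPML';
<opml version="1.0">
  <body>
    <outline title="Favorites">
      <outline title="Example Blog"
               htmlUrl="http://example.com/"
               xmlUrl="http://example.com/index.xml"/>
    </outline>
    <outline title="Blogroll Only">
      <outline title="Example Members Blog"
               htmlUrl="http://example.net/"
               xmlUrl="http://example.net/index.xml"/>
    </outline>
    <outline title="Other">
      <outline title="Unlisted Blog"
               htmlUrl="http://example.org/"
               xmlUrl="http://example.org/feed.xml"/>
    </outline>
  </body>
</opml>
OPML

# Build an XPath matching feeds directly inside any of the named folders.
# Folder names containing an apostrophe would need extra quoting.
sub folder_xpath {
    my @folders = @_;
    my $test = join ' or ', map { "\@title='$_'" } @folders;
    return "//outline[${test}]/outline[\@xmlUrl]";
}

# Return the titles of all feeds found in the given folders.
sub feeds_in_folders {
    my ($opml_text, @folders) = @_;
    my $doc = XML::LibXML->load_xml(string => $opml_text);
    return map { $_->getAttribute('title') }
           $doc->findnodes(folder_xpath(@folders));
}

print "$_\n" for feeds_in_folders($sample_opml, @short_folders);
```

Keeping the folder names in a list means adding a third folder later is a one-line change rather than another stylesheet edit.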