Archive for September, 2006

Notes on Converting from Blosxom to WordPress

When I switched from Blosxom to WordPress, I had intended to write a HOWTO explaining exactly how to accomplish the task. Unfortunately, this proved to be a real challenge, since no two Blosxoms are exactly the same – there are hundreds of plugins that can change the core behavior in ways both subtle and profound. Instead, I decided to write up some notes on what I did (and why), in hopes that it can help others who want to make the switch. Note: I’m using WordPress 2.0.4; if you are on an older (especially pre-2.0) WordPress, your mileage may vary.

After getting a basic WordPress install set up on my webserver, I began by looking for a Blosxom import tool for WordPress. I found Eric Davis’ import-blosxom.php, available from Eric’s software page. (Note: Houran Bosci has a modified version of Eric’s script, which I only discovered after the fact. I’ve not tested it, but you may want to have a look at the changelog.) Eric’s script didn’t come with any docs, so I tried dropping it into my wp-admin/import folder. I then went to the Import section of my WordPress admin page, expecting to find a new option. Instead, the Blosxom import script’s output (a series of instructions) was oddly interspersed with the normal Import screen. I eventually found that the script had to be put directly in my wp-admin folder, and I had to go directly to that page with my browser.

Once you load import-blosxom.php in your browser, you get a page full of instructions. These instructions include the text for a Blosxom theme file, which creates a special RSS feed for your Blosxom site. It uses the extension .rss20 instead of .rss, so it shouldn’t conflict with your existing feed. You then fetch a copy of this feed, and the importer reads the feed to load your new WordPress blog with your old content. Sounds simple, right? Yeah, I thought so too.

First, a few notes about setting up the special RSS feed. By default, it needs the theme plugin, which I didn’t use. You can break the theme up into multiple flavour files, but I found it easier to just install the theme plugin. Drop and go, as I recall. The theme plugin requires the interpolate_fancy plugin, which I did use, and the filesystem plugin, which I didn’t, but that was also a quick install.

Now, by default, Blosxom doesn’t show all posts, only the most recent (by default, 10). This limit is applied for any theme/flavour, including feeds. The number can be configured via the $num_entries variable in the Blosxom script. I played around briefly with trying to use the config plugin to change the number of entries for only this feed; I didn’t want anyone visiting the site or fetching the feed to get all my entries accidentally. Unfortunately, config could not do what I wanted (plugins are just loaded too late, I believe), so I cheated and made a copy of my blosxom.cgi, changed $num_entries, and used that one to fetch the special feed. I also had to disable my moreentries plugin, although I can’t remember exactly what it was breaking.

Now, the importer doesn’t fetch your new feed directly; you have to fetch and save a copy (curl or wget are good for this), copy the file to your server, and edit the import script to point to this file. This is for safety, to make sure you only import what you want to import. My first attempt revealed a few deficiencies in the RSS feed and the importer.
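
The fetch-and-save step might look something like the sketch below. The URL and file name are placeholders, not anything from my actual setup: substitute the address of your own copy of blosxom.cgi (the one with the raised $num_entries) and the rss20 flavour. The function name fetch_feed is made up for the example; the curl options are standard ones.

```shell
#!/bin/sh
# Save a local copy of the special feed for the importer to read.
# FEED_URL and OUT_FILE are hypothetical placeholders; adjust both.
FEED_URL="${FEED_URL:-http://example.com/cgi-bin/blosxom-all.cgi/index.rss20}"
OUT_FILE="${OUT_FILE:-blosxom-export.rss20}"

fetch_feed() {
    # $1 = feed URL, $2 = file to write
    # -f  fail on HTTP errors rather than saving an error page
    # -sS stay quiet, but still report errors
    curl -fsS -o "$2" "$1"
}

# Usage (not run automatically):
#   fetch_feed "$FEED_URL" "$OUT_FILE"
#   grep -c '<item>' "$OUT_FILE"   # rough count of posts in the saved feed
```

If the item count is still 10, the feed most likely came from the unmodified blosxom.cgi.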

First of all, I used Markdown to author my posts in Blosxom, and intended to keep doing so with WordPress. However, the RSS feed contains the rendered HTML, which is what gets imported into WordPress. This imports your posts just fine, but I wanted to maintain my Markdown formatting in case of future edits. I ended up disabling Markdown long enough to fetch the feed with the original Markdown formatting intact.

The next hurdle was URIs. In an earlier post I discussed some of the steps I went through to ensure all of my old URIs would work in WordPress, but the very first hurdle was preserving the post slug. In WordPress parlance (and I believe, in publishing in general), the slug is a short name for an article. Specifically for WordPress, the slug is the ‘file name’ of the post URI. The original import-blosxom.php created a slug for each post based on the title, similar to WordPress’s default mechanism for creating slugs. However, in order to keep my old URIs working with a little mod_rewrite magic, I needed the slugs to match the original filenames used to store the Blosxom posts. I hacked this in as follows:

  1. I modified the rss20 theme file to include the original file name, by adding a line to the <item> section:

    <slug>$fn</slug>
    
  2. I re-fetched the feed to pick up the change.

  3. I edited import-blosxom.php to use the slug. I replaced this line:

    $post_name = sanitize_title($title);
    

    with this:

    $slug = array();
    // Pull the original Blosxom file name out of the <slug> element
    preg_match('|<slug>(.*?)</slug>|is', $post, $slug);
    $post_name = $slug[1];
    

Now, after import, my WordPress slugs matched my Blosxom post names, making support of old URIs much simpler (see The Permalink Problem for more). But I wasn’t done yet.
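
As an aside, before importing it may be worth confirming that the re-fetched feed really carries one slug per post. The following is a rough sketch and not part of the process described above; it assumes the saved feed puts each <item> tag and each <slug> element on its own line, and both the function name check_slugs and the file name in the usage line are made up for illustration.

```shell
#!/bin/sh
# Rough pre-import check: does every <item> in the saved feed have a <slug>?
# Assumes one <item> tag and one <slug>...</slug> element per line.
check_slugs() {
    # grep -c exits non-zero when the count is 0, hence the "|| true"
    items=$(grep -c '<item>' "$1" || true)
    slugs=$(grep -c '<slug>' "$1" || true)
    echo "items=$items slugs=$slugs"
    # Print the slug values, i.e. what would end up as $post_name
    sed -n 's|.*<slug>\(.*\)</slug>.*|\1|p' "$1"
    [ "$items" -eq "$slugs" ]
}

# Usage: check_slugs blosxom-export.rss20 || echo "slug count mismatch"
```

A mismatch usually means the theme file change did not make it into the feed that was fetched.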

Now, this is going to seem petty to some of you, and that’s fine. But it bothered me that the WordPress post numbers of my imported posts were backwards. The most recent imported post was post 1, and the oldest imported post was post 300-and-something. Even though you should never see the post number since I use fancy URIs, it bugged me. So, I fixed it. It may also be worth noting that through these iterations, I ended up futzing with my WordPress DB by hand to remove prior imports and reset the post numbering. Hopefully, if you’re following along at home on your own import, you’ll get this all right the first time.

The problem is, Blosxom renders posts in reverse chronological order, like every other blog. The import script reads the RSS file and imports the posts in the order in which they appear in the file (which is to say, reverse chronological). But I wanted my post numbers to be chronological. Yes, I’m a geek. Came to terms with it years ago. Anyway, remember earlier that I couldn’t get config to change the number of entries for the feed? I was, however, able to use it to install a custom Blosxom sort method, like so:

  1. Config has to run first, so I renamed the installed config plugin to 000config, the standard Blosxom hack for plugin load ordering.

  2. In my Blosxom content directory, I created config.rss20, the theme-specific config file for the rss20 theme:

    package config;
    
    sub sort {
      return sub {
        my($files_ref) = @_;
        return sort { $files_ref->{$a} <=> $files_ref->{$b} } keys %$files_ref;
      }
    };
    
    1;
    
  3. Another fetch-import cycle, and my posts were numbered chronologically.

Almost there. I had my content (including comments), but the comment and posts-per-category counts were all 0. It seems WordPress stores these numbers in the database instead of calculating them on the fly, and the importer didn’t update them. A couple of quick SQL statements set everything right:

UPDATE wp_posts p SET comment_count = ( SELECT count( * )
FROM `wp_comments` c
WHERE c.comment_post_id = p.id );

UPDATE wp_categories c SET category_count = ( SELECT count( * )
FROM wp_post2cat p
WHERE p.category_id = c.cat_id );

That’s everything I have in my notes. Hopefully, there’s enough here to help others with a similar conversion. If you find other conversion issues or have questions about my process, please leave a comment below and I’ll try to help.

Talk like a pirate

Avast! Today do be International Talk Like a Pirate Day! As usual, the whole site is in pirate speak for the day. Now that I’m on WordPress, I’ll raise a mug o’ grog to Dougal Campbell for his Text Filters Suite, which is making all the magic happen.

HOWTO Backup an Entire Windows Drive with OS X and Ubuntu

Note: This method backs up the entire drive, free space and all. If you have a 30G hard drive, you’ll need 30G free on the target Mac. If you only want to recover some of the files, check out my article HOWTO Recover Files from a Non-Bootable Windows PC using Ubuntu Live. In addition, that article doesn’t require the use of a Mac to retrieve the files; you could use another Windows box, for example.
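
One way to check the free-space requirement up front is sketched below, under a few assumptions: it relies on the POSIX output format of df -Pk (available kilobytes in the fourth column), it only accepts whole gigabytes, and has_room is a hypothetical helper name rather than an existing tool.

```shell
#!/bin/sh
# has_room DIR SIZE_GB: succeed if DIR's filesystem has at least SIZE_GB
# gigabytes free (1 GB counted as 1024*1024 KB here).
has_room() {
    # Line 2 of "df -Pk" describes the filesystem; field 4 is available KB
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    need_kb=$(( $2 * 1024 * 1024 ))
    echo "available: ${avail_kb} KB, needed: ${need_kb} KB"
    [ "$avail_kb" -ge "$need_kb" ]
}

# Usage, run on the Mac for a 30G drive:
#   has_room "$HOME" 30 && echo "enough space" || echo "not enough space"
```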

About a year ago, I posted a method for backing up a Windows Laptop with OS X, which used a Knoppix Live CD and NFS. Today, I needed to perform the same task. I wanted to use Windows file sharing instead of NFS, since support is built into OS X and can be enabled from System Preferences. I also wanted to use Ubuntu instead of Knoppix, since I had a Ubuntu 6.06 CD handy, and I’m a Ubuntu fan. While I had some issues, I came up with a method which I think is easier than the old one.

The laptop I needed to back up recently had a memory chip go south, so it only has 192M of memory, and I believe part of this is used for video memory. Unfortunately, this is below the recommended 256M minimum RAM for a Ubuntu Desktop install. I haven’t found a minimum requirement for running the live CD, but considering that no swap space is available when running a Live CD, I expect it is at least the same. I found that running the Live CD on this machine was so slow as to be unusable.

Hoping to find a way to reduce memory requirements, I searched for a comprehensive guide to the Live CD’s boot options, with no success. I also searched for a way to boot the Live CD without X Windows (text mode only), also with no success. If anyone can help with either option, please leave a comment.

Stuck, I decided I’d have to download a different Live CD. Although other options exist, I decided this was a good opportunity to try XUbuntu, a Ubuntu variant that uses XFCE instead of Gnome for its desktop environment. XFCE is designed for machines with low resources, and the Live CD requires only 128M.

XUbuntu worked fine. I couldn’t find some of the GUI tools Ubuntu provides that I’ve used in past HOWTOs, so this one is mostly command line. As a bonus, if you have a machine with a little extra RAM, and already have a Ubuntu Live CD handy, these instructions should work equally well.

  1. On the Mac that will receive the backup, make sure that Windows Sharing is enabled via the Sharing pane in System Preferences. By default, this will share your home directory; that’s where we’ll put the backup. In the instructions below, my user name is jclark; substitute your own. Also, make a note of your Mac’s IP address. If you don’t know it, open Terminal and run ifconfig.

  2. Boot the system to be backed up with a (X)Ubuntu Live CD (also called the Desktop CD in the latest release).

  3. Run the Terminal application. Depending on your *Buntu of choice, it will be in one of the menus.

  4. Install support for mounting Windows shares (will be installed in RAM only):

    sudo apt-get install smbfs
    
  5. Create a mount point (a local directory that will host your Mac home directory):

    cd /mnt
    sudo mkdir mac
    
  6. Mount your (Mac) home directory on the source machine. Change the “192.168.1.100” to your Mac’s IP address, and change “jclark” to your Mac username (in both places):

    sudo mount -t cifs -o 'username=jclark' //192.168.1.100/jclark /mnt/mac
    

    You will be prompted for a password; provide your Mac password. Note: using -t cifs instead of -t smbfs (as you may expect) avoids a 2GB file size limitation.

  7. Copy the hard drive to your Mac. This assumes your Windows hard drive is ‘hda1’, which it probably is. If you know it isn’t (and you know the correct value), change accordingly.

    sudo dd if=/dev/hda1 of=/mnt/mac/drive_backup.img
    
  8. Wait. This could take a while. Backing up my 30G drive took 6.5 hours. I expected it to be faster; maybe the laptop only supports 10MBit Ethernet. If you’ve got 100MBit Ethernet, this should be faster. Oh, and if you are on a machine with a wireless connection, assuming it even works under Ubuntu, I recommend using a direct (wired) Ethernet connection for this if at all possible.

  9. When it finishes, unmount the shared drive:

    sudo umount /mnt/mac
    

    and shut down Ubuntu.

  10. On the Mac, in your home directory, you now have a disk image named drive_backup.img. Double click it to open it like any other drive image. You can copy files out as needed.
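
The timing in step 8 can be sanity-checked with a little arithmetic. The sketch below treats “30G” as 30 decimal gigabytes, and the helper names throughput and eta_hours are invented for this example. Both inputs are round numbers, so the results are ballpark figures at best.

```shell
#!/bin/sh
# throughput SIZE_GB HOURS: average rate of a finished copy.
throughput() {
    LC_ALL=C awk -v gb="$1" -v hours="$2" 'BEGIN {
        mb_per_s = gb * 1000 / (hours * 3600)
        printf "%.2f MB/s (about %.1f Mbit/s)\n", mb_per_s, mb_per_s * 8
    }'
}

# eta_hours SIZE_GB LINK_MBIT: best-case copy time at the link's nominal rate.
eta_hours() {
    LC_ALL=C awk -v gb="$1" -v mbit="$2" 'BEGIN {
        printf "%.1f hours\n", gb * 1000 * 8 / mbit / 3600
    }'
}

throughput 30 6.5   # prints: 1.28 MB/s (about 10.3 Mbit/s)
eta_hours 30 100    # prints: 0.7 hours
```

An average of roughly 10 Mbit/s fits the guess that the laptop’s link was running at 10MBit. The 100MBit figure is a theoretical best case; disk speed and protocol overhead will add to it.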

August Statistics

Yes, more meta-blogging. Hey, at least I’m posting. And it’s been ages since I posted the old browser stats. Here’s the top of the browser breakdown for jclark.org for August:

1.  FireFox                 46.0 %
2.  MSIE                    25.8 %
3.  Unknown                 12.6 %
4.  Safari                   6.7 %
5.  NetNewsWire              2.1 %
6.  Mozilla                  1.8 %
7.  Opera                    1.8 %
8.  Konqueror                1.6 %
9.  Camino                   0.6 %
10. Netscape                 0.3 %

Item 3, Unknown, is largely composed of feed readers. Lots of feed readers. Yay for Firefox at number one, but 25%+ of you are still using IE? Have you learned nothing?

And now, stealing shamelessly from Effika, here are the top ten searches for August.

                                     #hits
1.  jason clark              2.4 %     62
2.  firefox printing         1.3 %     36
3.  odd/even number rule 
    of star trek movies      1.3 %     35
4.  open source flickr       1.2 %     33
5.  flickr open source       0.9 %     24
6.  perl module version      0.8 %     21
7.  markdown vs textile      0.7 %     19
8.  feed autodiscovery       0.5 %     14
9.  ubuntu mount hard drive  0.5 %     13
10. firefox print            0.5 %     13

Nothing too unexpected here. Not even item 3. Judging by the fact that I haven’t been flooded with email from long-lost friends, I’m guessing item 1 isn’t people looking for me… so who are you looking for?

Odds and Ends

It’s been a busy week, but I’ve managed to find time for a little recreational computing here and there. Here’s the latest on some of my recent entries.

  • My WordPress conversion is largely completed. I still need to get a comments Atom feed set up, and I’d still like to tag some of my pre-conversion posts, but I’m otherwise nearly done.

  • I still haven’t quite gotten tag pages working as I want, but I’ve got it working with redirects, for now.

  • I’m still making little tweaks to Tranquility, the WordPress theme I created for the site, as well as some of the meta parts of the site. I haven’t finished the About or Colophon pages yet, but I did get the Licence page finished, and a number of minor display tweaks made.

  • I replaced my lousy V5 WRT54G with a $30 refurbished V1.1 and a copy of OpenWRT White Russian RC5, which rocks.

  • The one thing I’ve always wanted from my router has been DNS resolution of my DHCP host names; none of the Linksys firmwares across two models of router have done it, and neither did the last version of Sveasoft I tried (the last before the subscription fiasco). Works perfectly with OpenWRT; with a quick edit to the device’s /etc/hosts file, my static IP machines (my server, and the router itself) also now have DNS-resolvable names.

  • Less than a week after I fixed my page titles, Google has reindexed most of them, so now my search hits have titles again.

  • Technorati is still showing lots of bad links from my test site, which is 410 Gone.

  • I still need to write up my Blosxom to WordPress conversion method, which I promised to do. Hopefully by next weekend.