Archive for the 'ThisSite' Category

Note: I've reorganized this site to use tags; the category archive remains to support old links. Only posts prior to April, 2006 are categorized.

This is My Final Post

…to the Blosxom-based blog I’ve run at this location for three years. In fact, my three-year anniversary passed silently four days ago.

However, I’m not done blogging yet. As great as Blosxom is, I’m ready for something with less friction. I’ve decided to migrate this site to WordPress.

So what’s in store?

  • Existing posts will remain
  • Existing comments will remain
  • Comments will be reactivated (!)
  • I’ll publish my Blosxom-to-WordPress conversion method/code
  • Date-based permalinks
  • Existing permalinks will work
  • The categories will remain as archives, but new posts will use tags
  • I hope to get the old posts tagged as well
  • A brand new layout (finally!)
  • A valid Atom 1.0 feed (I had to hack this in, I’ll publish my code after the conversion)
  • Hopefully, I’ll start posting again!

The timeline’s not set, but I’m hoping to have everything done this weekend. There’s still one major technical issue to overcome, but I’ve asked for help in the WP support forum, and I’m poking around in the code. Stay tuned.

Outage

Sorry for the outage in my posting here; I’ve been far too busy away from the computer for the past month+. Tonight I clicked the bookmark for my blog by mistake, only to find the site suffering a real outage. A check of the log indicates that Markdown was dying. I think my host upgraded Perl versions in the last few days (although I haven’t confirmed); the error was in code that had been working for months (at least).

Since I was still running a beta of the 1.0 release of Markdown, I didn’t spend much time trying to fix the issue; I just upgraded to the current Markdown release (1.0.1) and the problem is corrected. I’ve only spot-checked a few pages, so if you see anything not working or looking odd, please shoot me an email.

Goodbye, Yahoo

For someone who has been in the business as long as Yahoo, they sure have a stupid bot. The Yahoo (née Inktomi) Slurp bot often requests completely strange URIs constructed from various things on my pages. Take this access log entry, for example (split onto multiple lines for clarity):

68.142.249.168 - - [27/Jan/2005:19:29:58 -0800] 
  "GET /weblog/Miscellany/About/xxx@xxxxxxxx/CoolStuff/Potpourri/Spanish/
        Hardware/Toys/Miscellany/WebDev/Browsers/Programming/Rants/Apple/
        Music/XML/XSLT/XML/XSLT/Spanish/WebDev/Blogging/Hardware/Apple/
        Programming/Microsoftish/ HTTP/1.0"
  200 5798 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; 
        http://help.yahoo.com/help/us/ysearch/slurp)"

Yes, the URI contained an email address in that ridiculous chain of categories (which wasn’t mine, so I elided it). In the bot’s defense, Blosxom will take the URI in stride, and return a page with no entries. There are plugins to modify this, which I may look into. Also, my comment system has a bug that can result in bad links when a bad URI (or no URI) is supplied. However, I don’t think that URI is naturally occurring on my blog. Even if it is, I don’t see any other bots asking for things like this.

Speaking of other bots, so far this month Yahoo Slurp has sucked down over 4 times the bandwidth that Googlebot has used. During this period, Google has delivered over 2600 hits, while Yahoo has delivered 107 (for those keeping score at home, MSN has delivered 23 hits).
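Figures like these can be pulled straight from the access log. The following is a minimal sketch, not the method actually used here: it assumes the Apache combined log format shown in the entry above (one entry per line), and the `BOTS` table and `bytes_by_bot` name are made up for illustration.

```python
import re
from collections import defaultdict

# Apache combined log format: host, identity, user, [time], "request",
# status, bytes, "referer", "user-agent".
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical lookup: substring of the User-Agent header -> label.
BOTS = {"Yahoo! Slurp": "Slurp", "Googlebot": "Googlebot"}


def bytes_by_bot(lines):
    """Sum the response sizes served to each known crawler."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in combined format
        agent = match.group("agent")
        for needle, label in BOTS.items():
            if needle in agent:
                size = match.group("bytes")
                # Apache logs "-" when no body was sent.
                totals[label] += 0 if size == "-" else int(size)
                break
    return dict(totals)
```

Calling `bytes_by_bot(open("access.log"))` (the file name is an assumption) returns a dictionary of byte totals per crawler. The referral side is simple division: 107 / 2600 is roughly 4.1%.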

So, 400% of the overhead, 4% of the return. Fortunately, I’ve found a way to tweak these figures, by adding a new entry to my robots.txt:

    User-agent: Slurp
    Disallow: /

Toodle-oo, Yahoo.
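The robots.txt entry above can be sanity-checked with the parser in Python's standard library. This is only a sketch: it shows how `urllib.robotparser` reads the two lines, which is not necessarily how any particular crawler interprets them. The `/weblog/` path is just a sample, and the parser is given short crawler names because it matches on the product token, not the full User-Agent string.

```python
from urllib.robotparser import RobotFileParser

# The two lines added to robots.txt in the post above.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("Slurp", "/weblog/"))      # False
print(parser.can_fetch("Googlebot", "/weblog/"))  # True
```

Crawlers with no matching entry fall through to the default, which is to allow everything.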

Site Linking Policy

All of the content on jclark.org is Creative Commons licensed – you may use it as you like as long as you give attribution and share your changes. This has been my policy almost since the beginning of the site.

Linking to the site is also encouraged; I make every effort to ensure my permalinks always work. However, linking directly to images is prohibited. If you’d like to use one of my images on your site, please make a copy and host it yourself. Someone has begun linking directly to one of my badges, which I consider a gross abuse of my bandwidth.

I have searched all over the offending website, and cannot find an email address for the site owner. Since I’ve never posted a linking policy, I would like to contact him and allow him to change his site before I take steps to prevent this. Therefore I’m posting this now, to allow him time to correct the problem. By this weekend I will be actively blocking this and possibly serving alternate content instead.
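One common way to block direct image linking is to look at the Referer header on image requests. The sketch below shows that decision in isolation; it is an assumption about how the blocking might work, not the mechanism this site uses. The `ALLOWED_HOSTS` and `IMAGE_EXTENSIONS` values and the `is_hotlink` name are invented for the example.

```python
from urllib.parse import urlparse

# Hypothetical configuration: hosts allowed to embed this site's images.
ALLOWED_HOSTS = {"jclark.org", "www.jclark.org"}
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def is_hotlink(path, referer):
    """Return True when an image request was referred by another site."""
    if not path.lower().endswith(IMAGE_EXTENSIONS):
        return False  # only image requests are policed
    if not referer or referer == "-":
        return False  # direct visit, or the browser sent no Referer
    host = urlparse(referer).hostname
    # A malformed Referer has no hostname and is treated as foreign.
    return host not in ALLOWED_HOSTS
```

A request flagged by `is_hotlink` could then be refused or answered with a substitute image. Requests without a Referer are let through here so that feed readers and privacy proxies still see images.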

Googlebot, Update These

This post is for Googlebot. Go follow these links, and index them:

Thanks pal.

For the rest of you, who are probably wondering if I’ve hit my head (not that I remember), I’ll explain. While checking my referers briefly (whole other post on that topic forthcoming), I noticed what looked like a search engine hit for a topic I don’t normally post about, of an adult, or more likely a teenage male, nature. (I’m not a prude, but listing the search terms would defeat my purpose). A quick check showed that the site was a non-English search engine powered by Google.

It seems that Googlebot stopped by the day I was hit by over 2000 comment spams. Although I took the entire comment system offline to remove that crap as soon as I saw it, Googlebot must have indexed a few of the pages. Those pages linked above are the pages it indexed that day and has apparently not reindexed since. I like website traffic as much as the next blogger, but I’m really not in the market for search engine hits for animated attacks on women. I certainly don’t want to be the #6 Google hit for that search (which I am, for the moment). The sooner Googlebot indexes the clean versions of the pages above, the better.

Also of interest, Googlebot found those pages via www.jclark.org, instead of jclark.org. They are the same, but I omit the www as it is unnecessary. Other people who link to me occasionally use the www, which is why it is indexed both ways. Of course, it shouldn’t be indexed twice, so I’ll be adding a mod_rewrite rule to my .htaccess file to permanently redirect all www. URIs to their sub-domain-free equivalents. Just to be safe, however, I think I’ll wait until Googlebot reindexes the above links.
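The mapping such a redirect performs is simple enough to sketch. The function below is not the mod_rewrite rule itself, only an illustration in Python of what the rule is meant to do: given a requested URL, return the www-free address to redirect to permanently, or `None` when no redirect is needed. The name `canonical_redirect` is made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url):
    """Return the www-free URL to redirect to, or None if already canonical."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.startswith("www."):
        return None
    netloc = host[len("www."):]
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"  # keep an explicit port
    # Path, query string, and fragment are carried over unchanged.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, `canonical_redirect("http://www.jclark.org/weblog/")` yields `http://jclark.org/weblog/`, while the same call on the www-free address yields `None`.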

An Announcement

To anyone who has viewed my site in the past 24 hours or so: if you have seen any comments on this site which you found offensive, please accept my apologies. I have once again been hit by a determined comment spammer, an order of magnitude worse than anything I have seen before. Over 2000 spams have been posted. I have completely disabled the comment system.

Atomic

Tonight I added an Atom feed (0.3) to the site. The link is in the Subscribe boxlet on the right. I’ve also added a feed validation link for this feed in the Made Possible By boxlet.

Why an Atom feed? I’ve always felt Atom to be a promising project. Why now? I’d always planned to add an Atom feed, but I’ve felt no urgency as the spec is continuing to evolve. However, I’ve got some ideas I want to explore for using the Atom publication format and protocol in a project; more on that when I’ve had some time to refine my thoughts (and maybe write some proof-of-concept code). For now, I’m reading up on the current states of the specs, poking around the wiki, and reading some of the mailing list archives.

Adding the feed to Blosxom was straightforward. I installed the atomfeed plugin, which got me 95% of the way there. Since the last update of the atomfeed plugin, the spec has added <atom:modified> as a required child element of <atom:feed>. I was able to implement this by modifying the built-in flavour and using the lastmodified plugin to supply the value. Putting my name in atomfeed’s $default_author config variable was the only other step I had to complete to get the feed to validate. I’ve posted my version of the atomfeed plugin which contains my fix for the <atom:modified> issue, but does require the lastmodified plugin.
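For illustration, here is a minimal sketch of how a feed-level modified value could be derived. It is not the code from the patched plugin (which is Perl). It assumes the feed's timestamp is simply the newest entry's modification time, written as a UTC timestamp in the W3C date-time style that Atom 0.3 uses; the function names are invented, and the namespace prefix is omitted from the element.

```python
from datetime import datetime, timezone


def feed_modified(entry_mtimes):
    """Format the newest entry mtime (seconds since the epoch) as UTC."""
    latest = max(entry_mtimes)
    stamp = datetime.fromtimestamp(latest, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%dT%H:%M:%SZ")


def modified_element(entry_mtimes):
    """Wrap the timestamp in a feed-level modified element."""
    return f"<modified>{feed_modified(entry_mtimes)}</modified>"


print(modified_element([0, 86400]))  # <modified>1970-01-02T00:00:00Z</modified>
```

In a real feed the input would be the modification times of the entry files being published.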

Moved

The website hosting move has been a success. I’ve had a total of 8 hits on the old host today, so I guess the address propagation is nearly complete. Everything seems to be working fine at the new host. Spent a little time this morning setting up AWStats on the new host. That’s the package my old host offered, and I’m used to it. Eventually, I’d like to put together something of my own to do more specific analysis, but that’s a ways down my list.

My new hosting provider is Dreamhost. My old host was fine, but I really wanted shell access, which they didn’t offer. The plan I’m on now lets me host up to 15 domains, of which I’m using two. By paying for 2 years in advance, I’m paying the same as I was previously for two separate single-domain packages. If I ever need another domain (or 13), I’m all set. Sure am loving the shell access.

I’ll make a final check of my POP3 account at the old host tomorrow evening, and Monday I’ll call and close the old hosting account. I prepaid a year in August, so I’ve even got a refund coming.

Moving Day

Today I’m initiating the DNS change to point the domain jclark.org at my new hosting provider. Apologies in advance if anything goes awry.

If this is the last sentence you see, you’re viewing the old host.

And if you’re viewing this sentence, you’re viewing the new host. Woohoo!

Update: The move has been remarkably painless. Mail seems to be working, site’s working fine. Bloglines sees the new site, as do my home and work broadband connections.

Also, I’ve re-enabled comments. I haven’t got the blacklist in place yet; I want to try to build in a performance enhancement first, and I need to set up an extra Perl module to do that. More on that later.

Update 2: Big thanks to Dugh for the heads up that my comments weren’t working. All fixed now.

Mail Issues

I’m in the process of switching hosting providers (more on that in a later post). If your mailhost is at Dreamhost (my new provider), you may not be able to send me email until I get everything sorted. In the interim, if you send me mail and it bounces, you can contact me at jason.clark {at} comcast.net.