Archive for the 'ThisSite' Category

Note: I've reorganized this site to use tags; the category archive remains to support old links. Only posts prior to April, 2006 are categorized.

This is My Final Post

…to the Blosxom-based blog I’ve run at this location for three years. In fact, my three-year anniversary passed silently four days ago.

However, I’m not done blogging yet. As great as Blosxom is, I’m ready for something with less friction. I’ve decided to migrate this site to WordPress.

So what’s in store?

  • Existing posts will remain
  • Existing comments will remain
  • Comments will be reactivated (!)
  • I’ll publish my Blosxom-to-WordPress conversion method/code
  • Date-based permalinks
  • Existing permalinks will work
  • The categories will remain as archives, but new posts will use tags
  • I hope to get the old posts tagged as well
  • A brand new layout (finally!)
  • A valid Atom 1.0 feed (I had to hack this in, I’ll publish my code after the conversion)
  • Hopefully, I’ll start posting again!

The timeline’s not set, but I’m hoping to have everything done this weekend. There’s still one major technical issue to overcome, but I’ve asked for help in the WP support forum, and I’m poking around in the code. Stay tuned.

Outage

Sorry for the outage in my posting here; been far too busy away from the computer for the past month+. Tonight I clicked the bookmark for my blog by mistake, only to find the site suffering a real outage. A check of the log indicates that Markdown was dying. I think my host upgraded Perl versions in the last few days (although I haven’t confirmed); the error was in code that had been working for months (at least).

Since I was still running a beta of the 1.0 release of Markdown, I didn’t spend much time trying to fix the issue; I just upgraded to the current Markdown release (1.0.1), and the problem is corrected. I’ve only spot-checked a few pages, so if you see anything not working or looking odd, please shoot me an email.

Goodbye, Yahoo

For someone who has been in the business as long as Yahoo, they sure have a stupid bot. The Yahoo (née Inktomi) Slurp bot often requests completely strange URIs constructed from various things on my pages. Take this access log entry, for example (split onto multiple lines for clarity):

68.142.249.168 - - [27/Jan/2005:19:29:58 -0800] 
  "GET /weblog/Miscellany/About/xxx@xxxxxxxx/CoolStuff/Potpourri/Spanish/
        Hardware/Toys/Miscellany/WebDev/Browsers/Programming/Rants/Apple/
        Music/XML/XSLT/XML/XSLT/Spanish/WebDev/Blogging/Hardware/Apple/
        Programming/Microsoftish/ HTTP/1.0"
  200 5798 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; 
        http://help.yahoo.com/help/us/ysearch/slurp)"  

Yes, the URI contained an email address in that ridiculous chain of categories (which wasn’t mine, so I elided it). In the bot’s defense, Blosxom will take the URI in stride, and return a page with no entries. There are plugins to modify this, which I may look into. Also, my comment system has a bug that can result in bad links when a bad URI (or no URI) is supplied. However, I don’t think that URI is naturally occurring on my blog. Even if it is, I don’t see any other bots asking for things like this.

Speaking of other bots, so far this month Yahoo Slurp has sucked down over 4 times the bandwidth that Googlebot has used. During this period, Google has delivered over 2600 hits, while Yahoo has delivered 107 (for those keeping score at home, MSN has delivered 23 hits).
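
A bandwidth comparison like this can be pulled straight from the access log. The following is a minimal sketch, assuming the log is in Apache's combined format with one entry per line (the entry above was split only for display). The `BOTS` table and the function name are illustrative assumptions, not the script actually used for the figures above.

```python
import re
from collections import Counter

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# User-agent substrings used to recognise each crawler (assumed, adjust to taste).
BOTS = {"Yahoo! Slurp": "Yahoo", "Googlebot": "Google", "msnbot": "MSN"}


def bytes_by_bot(lines):
    """Sum the response bytes served to each crawler listed in BOTS."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a combined-format entry
        size = match.group("bytes")
        for needle, label in BOTS.items():
            if needle in match.group("agent"):
                # A "-" in the bytes field means no body was sent.
                totals[label] += 0 if size == "-" else int(size)
                break
    return totals
```

Feeding it an open log file, for example `bytes_by_bot(open("access.log"))` with a hypothetical file name, returns a `Counter` of bytes per crawler, which makes the ratio between two bots a one-line division.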

So, 400% of the overhead, 4% of the return. Fortunately, I’ve found a way to tweak these figures, by adding a new entry to my robots.txt:

    User-agent: Slurp
    Disallow: /
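
A rule like this can be sanity-checked before it goes live. Here is a minimal sketch using Python's standard `urllib.robotparser`, assuming the two lines above are the entire robots.txt and using `/weblog/` as a sample path. It only shows how a standards-following parser reads the rule; whether a given crawler actually honors it is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt entry from the post, assumed here to be the whole file.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The parser matches on the crawler's name token, not the full User-Agent header.
slurp_allowed = parser.can_fetch("Slurp", "/weblog/")
google_allowed = parser.can_fetch("Googlebot", "/weblog/")

print("Slurp allowed:", slurp_allowed)
print("Googlebot allowed:", google_allowed)
```

With only this entry present, Slurp is shut out of every path while crawlers that match no entry remain unrestricted.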

Toodle-oo, Yahoo.

Site Linking Policy

All of the content on jclark.org is Creative Commons licensed – you may use it as you like as long as you give attribution and share your changes. This has been my policy almost since the beginning of the site.

Linking to the site is also encouraged; I make every effort to ensure my permalinks always work. However, linking directly to images is prohibited. If you’d like to use one of my images on your site, please make a copy and host it yourself. Someone has begun linking directly to one of my badges, which I consider a gross abuse of my bandwidth.

I have searched all over the offending website, and cannot find an email address for the site owner. Since I’ve never posted a linking policy, I would like to contact him and allow him to change his site before I take steps to prevent this. Therefore I’m posting this now, to allow him time to correct the problem. By this weekend I will be actively blocking this and possibly serving alternate content instead.
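
One common way to detect this kind of direct image linking is to look at the Referer header on image requests. The following is a minimal sketch of that check, not the blocking mechanism actually deployed here. The allowed host list, the image suffixes, and the function name are assumptions for illustration.

```python
from urllib.parse import urlparse

# Hosts whose pages may embed the images (assumed for this sketch).
ALLOWED_HOSTS = {"jclark.org", "www.jclark.org"}
# File suffixes treated as images (assumed for this sketch).
IMAGE_SUFFIXES = (".png", ".gif", ".jpg", ".jpeg")


def is_hotlink(path, referer):
    """Return True when an image is requested by a page on another site."""
    if not path.lower().endswith(IMAGE_SUFFIXES):
        return False  # only images are protected; page links stay welcome
    if not referer or referer == "-":
        return False  # no Referer: direct visit, feed reader, or privacy proxy
    host = urlparse(referer).hostname
    # An unparseable Referer is given the benefit of the doubt.
    return host is not None and host not in ALLOWED_HOSTS
```

A request flagged by `is_hotlink` could then be refused or answered with alternate content. Requests with no Referer are deliberately let through, since many legitimate clients omit the header.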

Googlebot, Update These

This post is for Googlebot. Go follow these links, and index them:

Thanks pal.

For the rest of you, who are probably wondering if I’ve hit my head (not that I remember), I’ll explain. While checking my referers briefly (whole other post on that topic forthcoming), I noticed what looked like a search engine hit for a topic I don’t normally post about, of an adult, or more likely a teenage male, nature. (I’m not a prude, but listing the search terms would defeat my purpose.) A quick check showed that the site was a non-English search engine powered by Google.

It seems that Googlebot stopped by the day I was hit by over 2000 comment spams. Although I took the entire comment system offline to remove that crap as soon as I saw it, Googlebot must have indexed a few of the pages. Those pages linked above are the pages it indexed that day and has apparently not reindexed since. I like website traffic as much as the next blogger, but I’m really not in the market for search engine hits for animated attacks on women. I certainly don’t want to be the #6 Google hit for that search (which I am, for the moment). The sooner Googlebot indexes the clean versions of the pages above, the better.

Also of interest, Googlebot found those pages via www.jclark.org, instead of jclark.org. They are the same, but I omit the www as it is unnecessary. Other people who link to me occasionally use the www, which is why it is indexed both ways. Of course, it shouldn’t be indexed twice, so I’ll be adding a mod_rewrite rule to my .htaccess file to permanently redirect all www. URIs to their sub-domain-free equivalents. Just to be safe, however, I think I’ll wait until Googlebot reindexes the above links.
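
The logic behind such a redirect is simple enough to sketch outside of Apache. The following Python function is an illustration of the mapping only, not the mod_rewrite rule itself. It assumes every URL on the www host has an identical counterpart on the bare host, and the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url):
    """Return the www-free URL to redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.startswith("www."):
        return None  # already canonical
    netloc = host[len("www."):]
    if parts.port:
        netloc = f"{netloc}:{parts.port}"  # keep an explicit port, if any
    # Path, query string, and fragment carry over unchanged.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

A server applying this would answer any request for which the function returns a URL with a 301 (permanent) redirect to that URL, which is what tells search engines to fold the two hostnames into one index entry.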