Indexing Tweak Redux

Dugh mentioned in the comments on my post about using the “robots” meta tag with Blosxom that he was having some trouble getting everything working with interpolate_fancy, and asked for my template. Since my comments system is still nearly 100% feature-free, I decided to just create a new post.

The goal is to get search engine robots to index only permalinks (individual posts), and not pages containing multiple posts, such as date or category archives (or your main page, since its content changes). To do this, we need to add a <meta /> tag to the <head> of each page. For index pages, we need the following:
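A minimal sketch of that tag, assuming the standard `noindex,follow` robots directive (keep this page out of the index, but still follow its links through to the posts):

```html
<!-- Assumed form: do not index this page, but do follow its links -->
<meta name="robots" content="noindex,follow" />
```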

and for individual posts, we need this:
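A minimal sketch of that tag, assuming the standard `index,follow` robots directive (index this page and follow its links):

```html
<!-- Assumed form: index this page and follow its links -->
<meta name="robots" content="index,follow" />
```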

My solution requires two plugins for Blosxom: Rael’s interpolate_fancy plugin and my own storystate plugin. A word of warning: if you aren’t already using interpolate_fancy, this isn’t a simple drop-in; you’ll have to change all of your templates. For more info, see the interpolate_fancy docs. My storystate plugin simply provides a number of additional variables denoting the state of the current story, for use by interpolate_fancy’s conditional tags. For this application, we need $storystate::permalink, which is true if the current page represents a single post, and undef otherwise. Here’s the relevant section of my head.html flavour template:
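A sketch of that section, assuming interpolate_fancy's conditional tags take the form `<?$var>…</?>` (content shown when the variable is defined) and `<?!$var>…</?>` (content shown when it is not). Check both forms against the interpolate_fancy docs for your version before relying on them:

```html
<!-- Single post (permalink): the permalink variable is set, so allow indexing -->
<?$storystate::permalink><meta name="robots" content="index,follow" /></?>
<!-- Index page (date archive, category archive, or main page): the variable is undef -->
<?!$storystate::permalink><meta name="robots" content="noindex,follow" /></?>
```

Only one of the two conditionals produces output on any given page, so each page ends up with exactly one robots tag in its <head>.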

That’s all I did. I haven’t seen much change on Google yet, but it’s only been in place for 3 days. Hopefully, as Google reindexes more of my site, all of my index pages will drop off, leaving only permalinks.

Update: The change shown here has the nasty side effect that the blog’s homepage is no longer indexed. See Tweaking the Robot Tweak for an improved version that fixes this bug.


4 Responses to “Indexing Tweak Redux”

  1. dugh Says:

    brilliant!

    thanks, jason! i’ve just implemented this on three of the sites i’m running and it works like a charm. i’ve also made a post on the (ill-named?) BUG (blosxom user group) that hopefully will take off someday…

  2. Lou Quillio Says:

    Follow up on the effects very closely. About a year and a half ago I changed the robots META tags on my personal site in this exact way for the same reason. I have a unique name, thus a unique domain name: pops right to the top of Google queries, because the only other occurrences are a town in Brittany and some woman with porcupine photos. Stuff I’ve posted elsewhere makes up the rest. Six or so months later I Googled my name. My personal site had all but vanished. It’s not supposed to work that way, but it did. Hey, Google PR isn’t a static undertaking; maybe Google handles robots META directives better now. It’d be wise to follow up once you’re re-indexed, though, then again in a month or two.

  3. Jason Says:

    Google results

    Lou-

    I’ve been checking Google since I made the changes, using “site:jclark.org blosxom” or similar queries. At the same time I made that change, I also changed my template to include the post name in the page title on individual posts (permalinks). This has made it easy to track Google’s progress in reindexing my site. It appears that they index me a little each day; more and more of my posts show their post titles on Google each day. I’m still seeing index pages in the search results, but I’m not sure if that’s becoming less common or not. I’ll keep watching it.

    One thing that has been bothering me though is that my blog homepage is now marked as “noindex”. I think I’m going to change this; since I normally use this as my URL when I post on other sites instead of my site homepage, I do want that page indexed. I just tried a Google search for “jason clark” and “jclark”, both of which used to return my blog homepage in the first page of results; now neither search does.

  4. Lou Quillio Says:

    Swarm

    Jason,

    Right, as of about a year ago the Googlebot doesn’t swarm sites, as it used to do approximately monthly. The Google-obsessed at WebmasterWorld.com fairly freaked (there’s a Google employee who posts there, who’s collected acolytes). But the Googlebot still gets to you a bit at a time.

    From what you’ve observed, the problem I had seems to be fixed. “noindex, follow” on the entry page previously had the effect of “noindex, nofollow,” at least for me. So I went back to “index, follow” on both home and individual pages. In my case, that entry page was the domain root and, like you, I didn’t want folks following indexed links to that entry page when the item they wanted may have rolled off the bottom.

    But, you know, I’m happy with the way these directives are working now (“index, follow” everywhere), and I think there are two causes. First, the Googlebot’s visits seem more tuned to the idea that my content isn’t static. Second, I added rel="bookmark" to the item permalinks. My sense is that the latter made all the difference.

    LQ
