Goodbye, Yahoo

For a company that has been in the business as long as Yahoo, they sure have a stupid bot. The Yahoo (née Inktomi) Slurp bot often requests completely strange URIs constructed from various things on my pages. Take this access log entry, for example (split onto multiple lines for clarity):

  - - [27/Jan/2005:19:29:58 -0800]
  "GET /weblog/Miscellany/About/xxx@xxxxxxxx/CoolStuff/Potpourri/Spanish/
        Programming/Microsoftish/ HTTP/1.0"
  200 5798 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;"  

Yes, the URI contained an email address in that ridiculous chain of categories (it wasn’t mine, so I elided it). In the bot’s defense, Blosxom takes the URI in stride and returns a page with no entries. There are plugins to change this behavior, which I may look into. Also, my comment system has a bug that can produce bad links when a bad URI (or no URI) is supplied. However, I don’t think that URI occurs naturally on my blog. Even if it does, I don’t see any other bots requesting things like this.
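If you want to hunt for requests like this in your own logs, here is a rough Python sketch. It assumes Apache's "combined" log format (the entry above appears to follow it, minus the elided client address). The segment limit `MAX_SEGMENTS` and the "@" check are hypothetical heuristics for spotting bogus category chains, not anything Blosxom or Yahoo defines.

```python
import re

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical threshold: tune it to the deepest real URL on your own site.
MAX_SEGMENTS = 6


def parse_line(line):
    """Return the log fields as a dict, or None if the line is not in combined format."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def looks_bogus(path, max_segments=MAX_SEGMENTS):
    """Hypothetical heuristic: too many path segments, or an e-mail-like segment."""
    segments = [s for s in path.split("/") if s]
    return len(segments) > max_segments or any("@" in s for s in segments)


def suspicious_requests(lines, bot="Slurp"):
    """Yield (time, path) for requests from `bot` whose path looks bogus."""
    for line in lines:
        rec = parse_line(line)
        if rec and bot in rec["agent"] and looks_bogus(rec["path"]):
            yield rec["time"], rec["path"]
```

Passing a different `bot` string (say, "Googlebot") is a quick way to check whether other crawlers ever request similar paths.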

Speaking of other bots, so far this month Yahoo Slurp has sucked down over 4 times the bandwidth that Googlebot has used. During this period, Google has delivered over 2600 hits, while Yahoo has delivered 107 (for those keeping score at home, MSN has delivered 23 hits).
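Numbers like these come straight from the access logs. A minimal Python sketch of one way to tally them, again assuming combined log format: it sums the bytes served to each engine's crawler and counts the visits whose referrer points back at each engine (reading "delivered hits" as referred visits). The crawler tokens in `CRAWLERS` and the referrer substrings in `REFERRERS` are illustrative assumptions; adjust them to whatever your own logs actually contain.

```python
import re
from collections import Counter

# Only the fields we need: response size, referrer, and user agent.
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "[^"]*" '
    r'\d{3} (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed substrings for classifying crawlers and search referrers.
CRAWLERS = {"Yahoo": "Slurp", "Google": "Googlebot", "MSN": "msnbot"}
REFERRERS = {"Yahoo": "search.yahoo.", "Google": "www.google.", "MSN": "search.msn."}


def tally(lines):
    """Return (bytes fetched by each engine's crawler, visits referred by each engine)."""
    crawl_bytes, referrals = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        size = 0 if m["size"] == "-" else int(m["size"])  # "-" means no body sent
        for engine, token in CRAWLERS.items():
            if token in m["agent"]:
                crawl_bytes[engine] += size
        for engine, token in REFERRERS.items():
            if token in m["referer"]:
                referrals[engine] += 1
    return crawl_bytes, referrals
```

Plugging in the figures above, 107 / 2600 ≈ 0.041, which is where the roughly 4% return comes from.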

So, 400% of the overhead, 4% of the return. Fortunately, I’ve found a way to tweak these figures by adding a new entry to my robots.txt:

    User-agent: Slurp
    Disallow: /
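To sanity-check a rule like this before uploading it, Python's standard `urllib.robotparser` module can evaluate it offline. A small sketch: the agent names passed to `can_fetch` are short robot tokens chosen for illustration, and whether a real crawler honors the rule is up to the crawler, not the parser.

```python
from urllib import robotparser

# The robots.txt entry from above, inlined so the check runs without a network fetch.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Illustrative robot-name tokens. Agents with no matching record (and no
# "User-agent: *" fallback) are allowed everywhere.
for agent in ("Slurp", "Googlebot", "msnbot"):
    print(agent, rp.can_fetch(agent, "/weblog/"))
```

Slurp should come back blocked while Googlebot and msnbot remain allowed, since the file has no catch-all `User-agent: *` record.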

Toodle-oo, Yahoo.


2 Responses to “Goodbye, Yahoo”

  1. Mike P. Says:

    I’ve seen similar long, wacky URLs being followed by Slurp on some sites as well; I was almost always able to track them back to some wayward link.

    I wonder if slurp is out to lunch, or just more thorough (your link above does return a 200…). Not so important, I suppose, given those bandwidth/visit numbers!

  2. Benjamin Strackany Says:

    Had the same problem with Yahoo, although in my case I actually get more traffic from them than I do from Google. MSN is still third, though, of course. ;/