Archive for January, 2005

Goodbye, Yahoo

For a company that has been in the business as long as Yahoo, they sure have a stupid bot. The Yahoo (née Inktomi) Slurp bot often requests completely strange URIs constructed from various things on my pages. Take this access log entry, for example (split onto multiple lines for clarity):

68.142.249.168 - - [27/Jan/2005:19:29:58 -0800] 
  "GET /weblog/Miscellany/About/xxx@xxxxxxxx/CoolStuff/Potpourri/Spanish/
        Hardware/Toys/Miscellany/WebDev/Browsers/Programming/Rants/Apple/
        Music/XML/XSLT/XML/XSLT/Spanish/WebDev/Blogging/Hardware/Apple/
        Programming/Microsoftish/ HTTP/1.0"
  200 5798 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; 
        http://help.yahoo.com/help/us/ysearch/slurp)"  

Yes, the URI contained an email address in that ridiculous chain of categories (it wasn’t mine, so I elided it). In the bot’s defense, Blosxom takes the URI in stride and returns a page with no entries. There are plugins to change this behavior, which I may look into. Also, my comment system has a bug that can produce bad links when a bad URI (or no URI) is supplied. However, I don’t think that URI occurs naturally on my blog. Even if it does, I don’t see any other bots requesting anything like it.

Speaking of other bots, so far this month Yahoo Slurp has sucked down over 4 times the bandwidth that Googlebot has used. During this period, Google has delivered over 2600 hits, while Yahoo has delivered 107 (for those keeping score at home, MSN has delivered 23 hits).
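The arithmetic behind “400% of the overhead, 4% of the return” is simple enough to spell out. This is a small Python sketch using the approximate figures quoted here; the “over 2600” and “over 4 times” values are treated as exact, so the results are rough.

```python
# Approximate figures from this month's logs; "over" values treated as exact,
# so every ratio below is a rough estimate.
GOOGLE_REFERRALS = 2600
YAHOO_REFERRALS = 107
YAHOO_BANDWIDTH_VS_GOOGLE = 4.0  # Slurp used ~4x Googlebot's bandwidth


def return_ratio(yahoo_hits: int, google_hits: int) -> float:
    """Yahoo's referral hits as a fraction of Google's."""
    return yahoo_hits / google_hits


ratio = return_ratio(YAHOO_REFERRALS, GOOGLE_REFERRALS)
print(f"overhead:   {YAHOO_BANDWIDTH_VS_GOOGLE:.0%} of Googlebot's bandwidth")
print(f"return:     {ratio:.1%} of Google's referral hits")
# Referral hits per unit of crawler bandwidth, relative to Google.
print(f"efficiency: {ratio / YAHOO_BANDWIDTH_VS_GOOGLE:.1%} of Google's")
```

Put another way, per unit of bandwidth Slurp sends back roughly 1% as many visitors as Googlebot does.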

So, 400% of the overhead, 4% of the return. Fortunately, I’ve found a way to tweak these figures, by adding a new entry to my robots.txt:

    User-agent: Slurp
    Disallow: /

Toodle-oo, Yahoo.
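If you want to sanity-check a robots.txt rule like this before relying on it, Python’s standard urllib.robotparser can read it the way a compliant crawler would. This is only a sketch: the /weblog/ path is an example, and it shows how a well-behaved parser interprets the file, not whether Slurp actually obeys it.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt entry from the post.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Example path only; "Disallow: /" covers every path on the site.
for agent in ("Slurp", "Googlebot", "msnbot"):
    verdict = "allowed" if parser.can_fetch(agent, "/weblog/") else "blocked"
    print(f"{agent:<10} {verdict}")
```

Pass the short product token (“Slurp”) rather than the full “Mozilla/5.0 (compatible; Yahoo! Slurp; …)” string: Python’s parser only matches the rule against the part of the user-agent before the first “/”.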

More Power!

To quote Tim Allen, “Auugh Auugh Aughh!” The new memory for the iMac arrived today, in all its 1-gigabyte goodness.

[Screenshot: About This Mac]

The whole NewEgg experience was a good one; I will shop with them again. And I’m glad I went with the gig… the system definitely feels faster.

Update: Bloglines users should now correctly see the image above. Oops.

More Memory for the iMac

I’ve been putting off ordering additional memory for my iMac G5 ever since I got it. The configuration I bought at the Apple Store included 512MB installed as a single DIMM, with one slot left empty. The iMac G5 supports up to 2GB of memory. I knew when I bought it that I planned to add memory, and given what Apple charges for memory, I knew I wouldn’t be buying it from them.

Based on my experience adding memory to my PowerBook, I intended to buy my iMac memory from Crucial. Crucial is a division of Micron, and many sources state that Crucial supplies all of Apple’s memory. Indeed, the memory I bought a couple of years ago for the PowerBook seemed identical to the memory already in the machine, except for the part number (the memory sizes were different). The website has a memory selector to ensure you buy the right module, the price was very good, and overall I was satisfied with the entire experience.

Shortly after I purchased the iMac, I looked on Crucial’s site for memory. The prices have not changed in that time (about a month), and are currently:

  • 256MB – USD 42.99
  • 512MB – USD 79.99
  • 1GB – USD 259.99

Wow! Quite a jump in price from the half-gig module to the full gig. Given that the iMac holds only two DIMMs, and that any future memory upgrade will mean removing an existing module, I’m strongly inclined to buy the 1GB module. However, I really balk at the markup.
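To put a number on that markup, here is a quick Python sketch that normalizes the Crucial prices listed above to dollars per gigabyte. The “premium” compares buying the same gigabyte as one 1GB module versus two 512MB modules; it is a per-gigabyte comparison, not a configuration that fits this iMac.

```python
# Crucial prices quoted above: module size in MB -> price in USD.
CRUCIAL_PRICES = {256: 42.99, 512: 79.99, 1024: 259.99}


def usd_per_gb(size_mb: int, price_usd: float) -> float:
    """Normalize a module's price to USD per gigabyte (1 GB = 1024 MB)."""
    return price_usd * 1024 / size_mb


for size_mb, price in CRUCIAL_PRICES.items():
    per_gb = usd_per_gb(size_mb, price)
    print(f"{size_mb:>5} MB  USD {price:7.2f}  ->  USD {per_gb:7.2f}/GB")

# Extra cost of one 1GB DIMM over the same capacity as two 512MB DIMMs.
premium = CRUCIAL_PRICES[1024] - 2 * CRUCIAL_PRICES[512]
print(f"1GB premium over 2 x 512MB: USD {premium:.2f}")
```

At these prices the 512MB module works out to about USD 160 per gigabyte, while the 1GB module costs about USD 100 more for the same capacity. Of course, two 512MB modules wouldn’t fit alongside the existing DIMM, which is exactly why the 1GB module is tempting.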

So today I’ve hit the web, looking for reputable providers of Mac memory with better prices. I’ve learned quite a bit. For example, using a single DIMM or a mismatched pair of DIMMs results in your memory bus operating at half the full speed; a matched pair is faster (Apple Technote). However, I’ve also learned that in real-world tests, the net speed improvement is 0%.

I also learned about a web resource I hadn’t known about: ResellerRatings.com. This site allows users to rate online shopping experiences, both with numerical ratings and with written reviews. As someone who always checks Amazon’s reviews when shopping for new goodies, I think I’ll be using this site quite a bit in the future.

So did I find a good deal? I’ve looked at a bunch of websites, including Smalldog, Transintl, Otherworld Computing, MacGurus, and MacSolutions. The best price so far (for a 1GB DIMM) has been USD 110.50, at NewEgg. In addition, NewEgg has an excellent rating at ResellerRatings, based on nearly 10,000 reviews. The memory is Patriot Memory from PDP Systems. I’m not familiar with the brand, but the NewEgg site has a number of reviews of the product from iMac G5 owners. So, I ordered one. More to follow when it arrives.

No Follow

Unless you’ve been living under a rock, you’ve probably seen the big Google announcement, entitled “Preventing comment spam”. By adding a rel='nofollow' attribute to a link, you instruct Google (and Yahoo, and MSN Search) not to consider the link in things like PageRank calculations. Blog software producers have jumped on the bandwagon, promising to use this attribute on all links within comments, referer lists, etc.; anywhere a website visitor can create a link. The idea is that this removes the incentive for comment spam: nofollow = no PageRank.

Will it work? Ben Hammersley provides a lucid explanation of the economics of spam (it’s free!) and concludes that nofollow may increase the amount of spam. I tend to agree that spammers will pay it little mind; I think the real value (and Google’s real purpose) is that it may help improve search engine results. Remember when finding things via Google was easy, and you never saw links to commerce sites?

Robert Scoble notes this has another use – it allows a blogger to link directly to someone without giving them Google juice. How often have you seen someone complain in a blog post about a spammer, without linking directly, for just this reason? Phil Ringnalda points out this can be used to selectively control the PageRank you bestow. Phil has added special style rules to his user style sheet (userContent.css in Firefox) to display all nofollow links in flashing lime green, which ensures he knows when a page he is viewing is fiddling with PageRank. I liked the idea so much I copied his style rule into my user stylesheet; so far it’s been enlightening to see who is using it, and for what links.
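Phil’s trick works in the browser via CSS. If you’d rather audit a page from a script, Python’s built-in html.parser can list the same links. This is a sketch; the NofollowFinder class is hypothetical. Like a CSS `~=` attribute selector, it treats rel as a space-separated list of tokens, so rel="external nofollow" counts too.

```python
from html.parser import HTMLParser


class NofollowFinder(HTMLParser):
    """Collect the href of every <a> tag whose rel includes 'nofollow'."""

    def __init__(self) -> None:
        super().__init__()
        self.nofollow_links: list = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us;
        # attribute values may be None (e.g. a bare "rel").
        if tag != "a":
            return
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow_links.append(attr_map.get("href"))


finder = NofollowFinder()
finder.feed('<p><a href="/a" rel="nofollow">a</a> <a href="/b">b</a></p>')
print(finder.nofollow_links)
```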

For now, I’m taking no action on this site – I already treat any comment containing a raw HTML tag as spam.
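A blunt rule like “any comment containing a raw HTML tag is spam” might look something like this in Python. It is a hypothetical sketch, not the actual code behind this site’s comment system; the pattern just looks for a “<” immediately followed by a letter (or “/” and a letter), so ordinary comparisons like “3 < 4” don’t trip it.

```python
import re

# Hypothetical rule: '<', an optional '/', a letter, then anything up to the
# next '>' counts as a raw HTML tag.
RAW_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def looks_like_spam(comment: str) -> bool:
    """Return True if the comment text contains anything resembling an HTML tag."""
    return RAW_TAG.search(comment) is not None


print(looks_like_spam('Great post! <a href="http://spam.example/">pills</a>'))
print(looks_like_spam("Agreed: 3 < 4 and 5 > 2."))
```

The trade-off is that a harmless stray <b> gets flagged too; the upside is that no tag-bearing comment ever reaches the page.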

Fighting Referer Spam with deferer

I’ve been getting swamped by referer spam lately. Most of it is for domains that appear to have had their hosting suspended. I had already seen a correlation between this referer spam and comment spam; what I didn’t know (but should have guessed) is that lots of folks are seeing this. Tim Bray wrote about the problem today, pointing to more info from John Sinteur and Ann Elisabeth. Apparently all of these referer URIs resolve to a single webhost, with an IP of 161.58.59.8.

Ann’s post is one of many on her blog about the subject; she is actively pursuing this and trying to get Verio to pull the miscreant’s hosting. I suggest reading everything on her homepage for lots of good info.

John’s post on the WordPress support blog includes PHP code that sends a 301 Moved Permanently redirection header to any request with a referer URI that resolves to the IP above. Where does he redirect them? Back to the referer URI, of course.

Now, that’s an idea I like. I liked it so much I wrote a Perl version as a Blosxom plugin. It’s called deferer, and is available here. Right now, it’s a one-trick pony, but (when time permits) I intend to expand it into a more comprehensive referer spam solution. I don’t know how effective redirecting these requests is; the spider scripts sending the requests may not follow redirects. Even so, deferer reduces server load (it ends the Blosxom invocation early) and saves bandwidth.

Update: Deferer has been updated to version 0+2i, to fix a bug that caused a 500 Server Error if the referer hostname could not be resolved.
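deferer itself is a Perl Blosxom plugin. Its core idea, resolving the referer’s hostname, comparing it against the spammers’ IP, and bouncing matching requests back to the referer with a 301, can be sketched in Python as follows. The function name deferer_response and its return shape are hypothetical. An unresolvable hostname simply falls through to normal handling rather than raising, which is the case the update above is about.

```python
import socket
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlparse

SPAM_HOST_IP = "161.58.59.8"  # the shared webhost IP reported in the post

Response = Tuple[str, List[Tuple[str, str]]]


def deferer_response(
    referer: Optional[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Optional[Response]:
    """Return a 301 status plus headers that send the request back to its
    referer when the referer's host resolves to SPAM_HOST_IP; otherwise
    return None, meaning "serve the page normally"."""
    if not referer:
        return None
    try:
        host = urlparse(referer).hostname
    except ValueError:  # e.g. a malformed IPv6 literal in the URL
        return None
    if not host:
        return None
    try:
        ip = resolve(host)
    except (OSError, UnicodeError):
        # socket.gaierror is an OSError: the hostname didn't resolve.
        # Don't fail the request; just fall through to normal handling.
        return None
    if ip == SPAM_HOST_IP:
        return ("301 Moved Permanently", [("Location", referer)])
    return None
```

In a CGI or WSGI setting you would run this check before doing any other work, which is where the savings in load and bandwidth come from. Note that socket.gethostbyname returns a single IPv4 address; for a host with several A records, socket.gethostbyname_ex returns the full list to check.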