Archive for December, 2004

Taking out the Trash

Regular readers are aware that I disabled comments on this site about a week ago, after receiving over 2000 spam comments in just a few hours. Alert readers may have also noticed that all comments have been missing since; all postings have shown 0 comments regardless of prior comments. This is because when I disabled comments, I not only wanted to prevent further postings but also to prevent the display of the 2000+ morsels of nastiness. The quickest way to do both given my Blosxom setup was to simply remove the writeback plugin.

I’ve made adjustments to my .htaccess file that will prevent the same type of posting in the future, but I expect the spammers to adapt. The delay in re-enabling comments was due to the need to clean out the spam. Because I use Blosxom and writeback, my comments are stored in the filesystem, in a directory tree that matches the site layout, one file per blog post (i.e., all comments for a given post are in one file). With over 2000 bad comments spread across 224 different files, I wasn’t about to clean things up by hand. Instead, I wrote a perl script to help me do it.

Even though it’s used a lot in Blosxom and various plugins, I’ve never had a firm grasp of perl’s File::Find module, so I decided to use File::Find::Rule instead. (For a nice explanation of this module, see File::Find::Rule in the 2002 Perl Advent Calendar.) The only problem is that the module (and several prerequisites) was not installed on my webserver. Having shell access, I was able to install local copies of the needed modules. Being very busy over the holidays, I only got around to this today.

The script is called scrub, and is available under the GNU General Public License. You can download scrub. Please note that scrub is written as a command line utility; you will need shell access on your webserver to use it. If there is demand (and if no one else does it first), I may develop a CGI version of scrub to run from a web server.

So how does scrub work? Why, Voodoo magic, of course. In fact, the darkest Voodoo of all… regular expressions. Supply a regex, and scrub can list all comment files containing the regex. It can also display the matching comments, and a count of files matched. Most importantly, it can remove offending comments when run with the -scrub option.

scrub is designed to work with comment files created by the writeback plugin. These files contain each comment, along with the name and url of the poster as supplied on the posting form. I’ve modified my copy of writeback to also log the IP address. Any information in the comment file can be matched by the regex… so if you are logging IPs as I am, you can quickly find (and eliminate) all comments from a given IP. scrub overrides perl’s $/ magic variable, which is the input record separator. By setting $/ to "-----\n" (the comment separator in writeback files), scrub can process each comment as a single unit.
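
As a rough illustration of the record-at-a-time idea (a Python sketch, not scrub's actual Perl source), the snippet below splits a writeback-style file on the "-----\n" separator and drops any comment that matches a regex. The helper name scrub_comments and the field layout of the sample comments are assumptions made up for this example.

```python
import re

SEPARATOR = "-----\n"  # comment separator in writeback files, per the post


def scrub_comments(text, pattern):
    """Return (kept_text, removed_count) for one comment file's contents."""
    regex = re.compile(pattern)
    # Each comment ends with the separator, so splitting leaves a trailing
    # empty string; skip empty chunks.
    comments = [c for c in text.split(SEPARATOR) if c]
    kept = [c for c in comments if not regex.search(c)]
    removed = len(comments) - len(kept)
    # Re-attach the separator to every comment that survives.
    return "".join(c + SEPARATOR for c in kept), removed


# Hypothetical file contents; the field layout is illustrative only.
sample = (
    "name: Alice\nurl: http://example.org\ncomment: Nice post!\n-----\n"
    "name: Bob\nurl: http://spam.com/pills\ncomment: Buy now\n-----\n"
)

cleaned, removed = scrub_comments(sample, r"spam\.com")
print(removed)
print(cleaned)
```

Because the whole comment (name, url, body, and any logged IP) is one chunk, a single regex test decides whether the entire comment stays or goes.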

Here are a few examples of scrub usage:

  1. List all files containing 'spam.com', display total:

    scrub -regex 'spam.com' -list -count
    
  2. Remove all comments containing 'spam.com', show progress via filenames:

    scrub -regex 'spam.com' -list -scrub
    
  3. Show all files containing raw html hyperlinks, and the actual comments:

    scrub -regex '<a href' -list -show
    

Example 3 above brings to light a deficiency with scrub: the regexes are always case-sensitive, so '<a href' will not match '<A HREF'. If I post a revision, this will be addressed.
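
One way a revision might address this, sketched here in Python rather than scrub's Perl, is to compile the pattern with a case-insensitive flag (Perl has the equivalent /i modifier). The sample strings are made up for illustration.

```python
import re

# Two made-up comment fragments that differ only in case.
samples = [
    '<a href="http://example.com">x</a>',
    '<A HREF="http://example.com">x</A>',
]

case_sensitive = re.compile(r"<a href")
case_insensitive = re.compile(r"<a href", re.IGNORECASE)

sensitive_hits = [s for s in samples if case_sensitive.search(s)]
insensitive_hits = [s for s in samples if case_insensitive.search(s)]

print(len(sensitive_hits), len(insensitive_hits))
```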

If you find this at all useful, please leave me a comment… they are enabled once again.

Merry Christmas

Here on the East Coast of the US, it’s still Christmas for another 6 minutes or so, so I’m not actually late in saying this :)

Merry Christmas to all, and to all a good night.

An Announcement

To anyone who has viewed my site in the past 24 hours or so: if you have seen any comments on this site which you found offensive, please accept my apologies. I have once again been hit by a determined comment spammer, an order of magnitude worse than anything I have seen before. Over 2000 spams have been posted. I have completely disabled the comment system.

Return of the Camcorder

Here’s something I’m very past due on posting. Last month, I posted about my dead camcorder. The unit, a Canon ZR65MC, has returned from warranty-service land, and all is well. I’m impressed with Canon’s service.

I called the warranty service 800 number on Monday, Nov 22. The representative was very apologetic and helpful. The camera was shipped out the same day, to New Jersey (very fortunate, as I’m just across the Delaware River from Jersey). They returned the camera via FedEx, who tried to deliver it December 9. We missed the FedEx guy a couple of times, but finally received the camera last week. The note in the box confirmed that the CCD was replaced, and that the camcorder was also “cleaned and adjusted.” Very swift turnaround, and I’m quite pleased to have it back in time for Christmas (and in time for a pre-Christmas video-editing project).

While researching the problem initially, I read a number of reports on the web about others having the same problem (dead CCD) right before or right after the end of the warranty period. Sherri sent a letter to Canon, suggesting a recall might be in order. The day after we received the camcorder, I received a call from Canon. Sherri wasn’t home so I took the call. I couldn’t recall the details of the letter, so I didn’t ask about a recall. The woman from Canon asked if we had received the camcorder and if everything was okay. She also gave me her direct number, and asked us to call if we had any further problems with the camera at any point in the future.

This is a great example of customer service done right. After the camera first stopped working, I was ready to swear off Canon products, which would have been a shame, considering that I also own a Canon i960 Photo Inkjet which I love. Two excellent “personal” contacts (my initial call and their followup), plus the rapid turnaround and FedEx shipping of the repaired camera during the holidays, add up to a satisfied customer who will be both a repeat customer and a great source of word-of-mouth.

Howto use a Microsoft Intellimouse with OS X

While I’m not a fan of their operating systems, I definitely like Microsoft’s mice. I’m very partial to the Intellimouse Optical, the beige and silver optical which is omni-handed. I’m not a lefty, I just find every “contoured” mouse I try to be wickedly uncomfortable. I use an Intellimouse on my home and work PCs. In particular, I’m extremely dependent on the scroll wheel, scroll wheel button (to open a new tab in Firefox), and the thumb button (aka “Button 4”), which acts as a back button in Firefox (and in IE).

Tonight, I plugged my Intellimouse into my Mac, and was surprised to find that it did not work as expected. The left and right buttons worked fine, as did the scroll wheel. However, middle-clicking worked in Safari but not in Firefox. The thumb button did not work at all. Also, the scroll wheel seemed to accelerate as I used it, which I found to be very disorienting. I could find no answer to these problems in System Preferences, or Firefox’s preference panel.

After Googling a bit, it turns out that Microsoft provides Intellipoint software for OS X. I didn’t really think about needing additional software since I already knew that the Mac would handle right-clicking and scroll wheels. As loath as I am to install Microsoft software on a Mac, I decided to give it a spin; I’ve had largely good experiences with Intellipoint on Windows, and I really needed to get a mouse set up that works like I expect.

Success. I downloaded and installed Intellipoint 5.1 for OS X*, which of course required a reboot (insert Windows joke here). The installation added a new item to System Preferences, which launches the Intellipoint software. Having this installed seems to fix the Back (thumb) and Forward (5th) buttons automagically. Also, the scrolling settings have an “Enable Accelerated Scrolling” option which was off by default; I no longer see the accelerated scrolling.

Fixing my middle click in Firefox took one extra step. Using the Intellipoint software, I had to change the meaning of middle click from the default “Next Application” to “Click”. Once “Click” is chosen, checkboxes allow you to select modifiers. I set up middle click to equal Cmd+Click, and all is well with the world.

Kudos to Ste Grainer, whose comment on Jeremy Zawodny’s blog outlined the middle click fix, and made me realize I needed some extra software.

P.S. - I’m behind on my blogging, but hey… it’s the holidays. I promise to try and catch up a little; I do have a few good things I need to blog.

*Note: I make no promises about the link above; Microsoft isn’t known for having Cool URIs. If the link doesn’t pan out, just poke around the Microsoft site like I did. One suggestion… I found the drivers on the product page, not in the Support section.

Update: As much as I hate closing comments on a post, this one is just a spam magnet. If you have anything to contribute, drop me a line.

Take a Hint, AllResearch - Go Away

It was a couple of months ago when I had to ban a bot for the first time. At the time, I noticed one IP address (My Most Frequent Visitor) outshining all others in my web stats. A little research showed MMFV to be an outfit called AllResearch.com, which appears to specialize in hoovering down other sites in order to provide such services as trademark tracking, webclipping, and “law enforcement” services.

As bad as that sounds, the only reason I even took note was the bandwidth consumption. Every 60 minutes, they hit my RSS feed, and then pulled down every item listed in the feed. What a horrible perversion of intent. At the time, I banned the IP address, thusly (in .htaccess):

RewriteCond %{REMOTE_ADDR} "^38.144.36.16$"
RewriteRule .* - [F,L]

I then watched, amused, for several days as the 403 errors stacked up, once an hour, from their IP address. Worst. Bot. Ever.

Well, I got around to checking my stats again today, and what do you know? My Most Frequent Visitor just can’t take a hint. He’s back, using IP address 38.144.36.19. And I didn’t find him due to vigilance; he’s just stupid. The exact same usage pattern. While expanding my ban, I even corrected the unescaped dots in my original version:

RewriteCond %{REMOTE_ADDR} "^38\.144\.36\."
RewriteRule .* - [F,L]

I’m not the only one seeing this. If you run a site, take a minute and check your stats or your logs for addresses in the 38.144.36.x range. If you see abuse like I’ve seen, take a minute and ban them. Maybe they will eventually take a hint.
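
To see what escaping the dots changes, here is a small Python sketch (Python's re module treats "." and "\." the same way the mod_rewrite regexes do for this purpose). The two IP addresses come from the post; the junk string is invented for illustration. For well-formed IPv4 addresses the loose pattern is unlikely to misfire in practice, but it does match strings it has no business matching.

```python
import re

loose = re.compile(r"^38.144.36.")      # unescaped: "." matches any character
strict = re.compile(r"^38\.144\.36\.")  # escaped: "." matches only a literal dot

candidates = ["38.144.36.16", "38.144.36.19", "38x144y36z16"]

loose_matches = [c for c in candidates if loose.match(c)]
strict_matches = [c for c in candidates if strict.match(c)]

print(loose_matches)
print(strict_matches)
```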

Hackers and Painters

Via Tim Bray, via Tim Bray, via, well, you know: Go and read Hackers and Painters, an essay by Paul Graham. This is a must-read for anyone who considers themselves a hacker, especially those like me who make a living at it. A few choice quotes:

I tended to just spew out code that was hopelessly broken, and gradually beat it into shape. Debugging, I was taught, was a kind of final pass where you caught typos and oversights. The way I worked, it seemed like programming consisted of debugging.

For a long time I felt bad about this, just as I once felt bad that I didn’t hold my pencil the way they taught me to in elementary school. If I had only looked over at the other makers, the painters or the architects, I would have realized that there was a name for what I was doing: sketching. As far as I can tell, the way they taught me to program in college was all wrong. You should figure out programs as you’re writing them, just as writers and painters and architects do.

And also:

In hacking, like painting, work comes in cycles. Sometimes you get excited about some new project and you want to work sixteen hours a day on it. Other times nothing seems interesting.

Amen, Brother.

Firefox Printing

Earlier today I wanted to print a web page for some offline reading. It was a page from a blog, featuring a left-handed nav/info bar and content on the right. Of course, I only wanted to print the content area. I tried printing, hoping for a print stylesheet that would suppress the navbar. No such luck (note to self: implement a print stylesheet for your own glass house, and put down that stone).

I then tried turning off styles, figuring I could omit the pages of the printout which contained the navbar. I was surprised to find that Firefox prints using the default style sheet when styles are turned off (View|Page Style|No Style) (note to self: check Bugzilla, report if needed).

I considered adding a user stylesheet, hoping that it would cascade with the existing stylesheet. In trying to figure out what my stylesheet should do, I used the DOM inspector. I was disappointed to find the page used a one-row <table> with two <td>s to achieve layout (note to self: you don’t do that- good job). No worries, judicious use of an adjacent sibling CSS selector did the job:

td {display:none} /* don't show <td>s */
td+td {display:block} /* okay, show <td>s that follow another td */

The net effect is that only the first <td> (in each <tr>) is hidden. Perfect for this application. I tested this CSS snippet using the Edit CSS feature of the fantastic Web Developer Toolbar. On a whim, I tried printing, and was pleased to see that Firefox printed using the edited CSS.

Update: I dutifully checked Bugzilla. Bug 260762 looks like a match.

The Lazy Web

Whenever I learn something very cool that’s been around for a while, I wonder if I’m the last one to the party. I’ve seen mention of the lazyweb many times… the idea is that somewhere, someone has probably already solved your problem (whatever it may be), or would like to. To invoke the lazyweb, you post a problem or question to your blog, often mentioning “Lazyweb” in the post, and sit back and wait for enlightenment. You need decent blog visibility in order to get results from the lazyweb. I’ve never really tried it, since I don’t think my readership levels are very high (and I don’t want to prove it to myself).

At least, this is what I always thought. It turns out that someone has built a tool (someone always does, that’s the point of the lazyweb). After posting your lazyweb request, you can send a trackback ping to lazyweb.org (or if your blogging software is configured for auto-trackback, just link to it). Lazyweb.org displays the trackbacks, allowing interested (and helpful) folks to see and respond. Naturally, there’s an RSS feed as well (as seen in the Techish section of my blogroll).

Considering the bloggeratti who have worked on the site, I can’t believe I just learned about this. Makes me wonder what other gems I’m missing.

Site Browsing Stats, November 2004

In honor of the November official launch of Firefox 1.0, and because I happened to be looking at my site stats tonight, here’s the breakdown of the browser usage for this site, as reported by my stats package, awstats. Unknown is anything the package has never heard of, whereas Others are known browsers, too small to list separately. Others includes things like wget and curl, and even 57 hits from a pre-Firefox copy of Firebird.

Browser              Hits      %age
-------              ----    ------
Firefox             34868    37.9 %
Internet Explorer   28008    30.4 %
Safari              10536    11.4 %
Unknown              6080     6.6 %
Mozilla              4103     4.4 %
NetNewsWire          3253     3.5 %
Opera                1744     1.8 %
Netscape             1525     1.6 %
Camino                518     0.5 %
Konqueror             362     0.3 %
Others                861     0.9 %
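
As a sanity check on the table, the Python sketch below recomputes each share from the hit counts. It assumes each percentage is that browser's fraction of the listed total, truncated (not rounded) to one decimal place; that is inferred from the numbers above, not taken from the awstats documentation.

```python
# Hit counts copied from the table above.
hits = {
    "Firefox": 34868,
    "Internet Explorer": 28008,
    "Safari": 10536,
    "Unknown": 6080,
    "Mozilla": 4103,
    "NetNewsWire": 3253,
    "Opera": 1744,
    "Netscape": 1525,
    "Camino": 518,
    "Konqueror": 362,
    "Others": 861,
}

total = sum(hits.values())


def share(count, total):
    """Percentage truncated to one decimal place, using integer math."""
    tenths = count * 1000 // total  # e.g. 379 means 37.9 %
    return f"{tenths // 10}.{tenths % 10}"


shares = {name: share(count, total) for name, count in hits.items()}

for name, pct in shares.items():
    print(f"{name:<18}{hits[name]:>7}{pct:>8} %")
```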

I’m very proud of my readers for using Firefox far more than Internet Explorer. As to the 30.4%, and you know who you are, go on. Try it. All the cool kids are doing it.