Today, I spent some time staring at an old piece of code that I had written at least a year ago. It’s been in testing several times, but never put into production (the project it is tied to has been bumped on several occasions). Today, it was back in testing.
The code is a fairly simple web service, written in Perl. I like Perl. I have no illusions that I’m a fantastic Perl hacker, but I know the language well, through both experience and reading. I’ve read most of the O’Reilly Perl titles, including Programming Perl (“The Camel”), which I’ve read cover to cover at least three times. I still find myself looking things up, usually to refresh my memory about something I can remember reading, or some syntax detail I can’t get right (one of the perils of working in multiple languages). At least I generally know where to look.
So this web service has been tested before. It works in a browser, and it works when called by my test client. It’s been tested with a third-party bit of code. Today, it was tested by Dave, using some custom client code he had written in C#. And it worked… if he told his http library to ignore HTTP protocol errors. If he didn’t, his library complained.
And so I stared at the code for a while. By coincidence, I’d been reading the HTTP Spec over the weekend (yes, I’m a geek), and was pretty sure my response was good: a bare-minimum response, along the lines of:
HTTP/1.1 200 OK
Content-Type: text/plain; charset=utf-8

Single Line Response
I double checked the spec anyway, and kept staring at the code. I was about to start grasping at straws and adding additional entity headers to the response (such as Content-Length), when I finally stared at the code long enough. I saw something like this:
print "HTTP/1.1 $status\n";
print "Content-Type: text/plain; charset=utf-8\n";
Then it hit me: "\n" in Perl is a “magic newline”; it conforms to the newline convention on the system in question. HTTP, on the other hand, requires ASCII CR+LF (Carriage Return + Line Feed, or "\r\n" in C) as a line terminator. Apparently all of the code thrown at the service before today was a bit forgiving. I changed the strings to send CRLF using octal escape sequences ("\015\012"), and everything was fine. I was a bit ticked about the mistake… I knew both the HTTP requirement for CRLF and the Perl treatment of "\n" when I originally wrote the code; it was a dumb mistake. It was also aggravating that it took so long to spot.
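A minimal sketch of that fix, assuming a hypothetical helper named http_response (the real service's code isn't shown here). It joins the status line, header, and body with explicit CRLF bytes instead of "\n":

```perl
use strict;
use warnings;

# CRLF spelled with octal escapes, so it is the same two bytes on every platform.
my $CRLF = "\015\012";

# Hypothetical helper: build a bare-minimum HTTP response as one string.
sub http_response {
    my ($status, $body) = @_;
    return join($CRLF,
        "HTTP/1.1 $status",
        "Content-Type: text/plain; charset=utf-8",
        "",        # empty line separates the headers from the body
        $body,
    );
}

binmode(STDOUT);   # keep Windows text mode from rewriting the line endings
print http_response("200 OK", "Single Line Response");
```

The binmode call matters mostly on Windows, where the default text-mode layer would otherwise translate each LF on output.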
And there my tale should end. But this evening, I started wondering if the octal sequence was the most Perlish way to send a CRLF. I knew that "\r" is fine for the CR, but you can’t use "\n" for the LF - it’s magic in perl, and behaves differently on different platforms. I began to wonder if Perl has a backslash-escape for LF that is always LF. Eventually, I had to check for myself, so I referred to the Quote and Quote-like Operators section of the perlop man page. (Sadly, I knew right where to look, right down to the name of the section. Geek, remember?)
Turns out the manpage specifically recommends the octal form for networking applications (at least I got that right), but then it twists the knife:
If you get in the habit of using "\n" for networking, you may be burned some day.
D’oh.
I’ve been playing with Python a lot lately in my spare time, but I still use mostly Perl at work. One of the handy things about Python is the interactive mode; I like it so much I even cloned it for Perl some time ago. Even without my Perlthon script, you can get a quick approximation in perl using the perl debugger and a command-line script:
perl -de1
(That’s a one, not an el.) The above will invoke perl with the debugger (-d), debugging a very simple script (-e1, which is to say -e '1;'). Once the debugger starts, you can just type perl statements, and can use x <expr> to inspect values.
Whichever way you play with interactive Perl, testing regular expressions can be a pain. It’s not too bad under Python, given Python’s use of match objects:
import re
re.search('regex', 'string').group(0)
The second line runs the regex against the string, and prints the entire match. If your regex doesn’t match at all, you get an error, but that’s fine (and self-explanatory). If your regex doesn’t perform as expected, repeated attempts make it easy to triangulate. If you have Python’s readline module installed, you can just hit UpArrow after the test, tweak the regex, lather, rinse, repeat. I wanted the same flexibility with interactive Perl; it turns out to be trivial:
# perl debugger version
x ('string' =~ /regex/, $&)[-1]
# perlthon version
('string' =~ /regex/, $&)[-1];
The regex match operation will return a list of matched groups, which can be handy at times. For testing a complicated regex, I often just want to see the whole match to be sure I’m getting what I expect. The array notation accomplishes this nicely.
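To see why the slice works, here is a small sketch that wraps the same expression in a hypothetical helper, whole_match (the name is made up for illustration). It is only meant for the case where the regex matches; on a failed match, $& holds whatever an earlier match left behind.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the one-liner: returns the whole match
# when the regex matches.
sub whole_match {
    my ($string, $regex) = @_;
    # In list context a successful match yields its captured groups (or 1 if
    # there are none); $& (the entire match) is appended, and [-1] selects it.
    return ($string =~ /$regex/, $&)[-1];
}

print whole_match('hello world', 'o\s+w'), "\n";        # o w
print whole_match('2004-09-29', '(\d+)-(\d+)'), "\n";   # 2004-09
```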
After spending more time than I care to admit working on a problem I actually solved in May, I decided I’d better blog this before I forget again.
Using ActivePerl, it is possible to run perl CGI scripts under Microsoft IIS. In a default installation, the extension to use is .plx (not .cgi), which is mapped to run under PerlIS.dll, the “Perl for ISAPI” implementation. Works pretty well. The issue I had was with HTTP Authentication. If you want to handle your own authentication in a CGI script, you can check the environment variable HTTP_AUTHORIZATION. For example:
use MIME::Base64;            #provides decode_base64
binmode(STDOUT, ":utf8");    #you know you should
my $have_authinfo = (defined($ENV{HTTP_AUTHORIZATION})
    and (substr($ENV{HTTP_AUTHORIZATION},0,6) eq 'Basic '));
my ($user, $pass) = ('','');
if ($have_authinfo) {
    my $decoded = decode_base64(substr($ENV{HTTP_AUTHORIZATION},6));
    if ($decoded =~ /:/) {
        #limit of 2, so a password containing ':' stays in one piece
        ($user, $pass) = split(/:/, $decoded, 2);
    } else {
        $have_authinfo = 0;
    }
}
if (!$have_authinfo or !Authorize($user, $pass)) {
    print << "EOF";
HTTP/1.1 401 Authorization Required
WWW-Authenticate: Basic realm="Example.com"
Content-Type: text/plain; charset=utf-8

You must supply valid credentials to access this resource
EOF
    close STDOUT;
    exit 0;
}
#Authorized, continue with web page....
sub Authorize {
    my ($user, $pass) = @_;
    #Do something to authenticate, return true/false
}
Of course, I’m relying on Basic authentication, which you should only do if your script will only be available via HTTPS (as is mine). The whole thing is dependent on $ENV{HTTP_AUTHORIZATION}, which by default won’t actually get passed to your script under IIS and PerlIS.
Fortunately, fixing this is simple, if a bit non-evident. In the IIS Management Console, navigate to the folder containing your script, and select the script. Right-click and choose Properties. On the File Security tab of the properties dialog, click the Edit button under “Anonymous Access”. On the next dialog, make sure that “Anonymous Access” is checked and that no other authentication method is checked. By default, Windows Integrated Authentication is selected, which makes IIS snoop around the header, and apparently lose it.
For multiple scripts, put them all in one location (go on, call it cgi-bin), and make the same changes above to the whole folder. New scripts created in this folder should inherit the settings.
I’ve been writing a lot more Perl at work lately, which suits me fine. With my Perl still a bit rusty, however, I found myself dashing off lots of -e one-liners to test various bits. After a while, I wanted a quicker way to test things…an interactive mode such as Visual Basic’s immediate window or Python’s interactive mode.
Perl being perl, this sort of thing is crazy easy. As most perl programmers know, or can quickly figure out, the fastest way to get an interactive perl session is:
perl -ne eval;
which will eval every line of input until input ends (CTRL+D on *nix systems) or you type exit (as exit is a perl builtin which does just that). This command has no explicit output; you’ll need to include your own print calls to see output. Still… quick, dirty, and handy. A slight tweak if you’ll be printing most everything you feed it:
perl -ne 'print eval; print "\n"'
This autoprints everything. Note the use of two print statements; this is on purpose. If instead we use print eval() . "\n", the eval will be called in scalar context. Try printing localtime; if you see a nicely formatted date, scalar context is to blame. If you want such a thing, just print scalar localtime. Best of both worlds. The other choice for combining print statements is print eval, "\n". This calls eval in list context; however, it also passes "\n" to print as part of the list. This means that using:
$,=","; localtime;
to print the localtime array with commas will add a comma to the end of the line as well (before the "\n").
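The context difference is easy to see in a short script. This is just a sketch: it uses gmtime(0) rather than localtime so the result doesn't depend on the clock or time zone, and join stands in for print with $, set.

```perl
use strict;
use warnings;

# The code to evaluate, as if it had been typed at the prompt.
my $code = 'gmtime(0)';

# List context: nine separate fields (sec, min, hour, mday, mon, year, ...).
my @fields = eval $code;

# Scalar context, forced by the concatenation: one formatted date string.
my $stamp = eval($code) . "";

# Passing "\n" in the same list as the fields leaves a trailing separator.
my $joined = join(",", eval($code), "\n");

print scalar(@fields), "\n";   # 9
print "$stamp\n";              # Thu Jan  1 00:00:00 1970
print $joined;                 # 0,0,0,1,0,70,4,0,0,
```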
Never content to leave well enough alone, however, I wanted more. Command history and in-line editing. Multi-line command entry (a la Python). A help function. A quit function that doesn’t (a la Python). You know, toys. The result is Perlthon, an interactive Perl session that works like Python’s interactive mode. It’s easier to show than to tell, so here’s a sample session:
$> perlthon
Perlthon running Perl 5.8.0
Type "help;" for help, "exit;" or press CTRL+D to exit
>>> help;
Perlthon, the Interactive Perl Interpeter v0.1
by Jason Clark <jason@jclark.org>
This is Free Software.
Enter commands for perl to evaluate, a la interactive Python.
Lines without a ; are continued on next input.
By default, the result of each evauation is printed. To disable,
use this: "$AUTOPRINT=0;"
Use "exit;" or CTRL+D to exit.
Prompts are controlled by $PROMPT1 and $PROMPT2, which you can change.
>>> localtime;
59371729810432721
>>> $, = ",";
,
>>> localtime;
10,38,17,29,8,104,3,272,1
>>> scalar(
... localtime
... );
Wed Sep 29 17:38:30 2004
>>> quit;
Use "exit;" or CTRL+D to exit.
1
>>> exit;
$>
Bugs: Oh, you betcha. Weird behavior when Term::ReadLine has to fake it and you’ve changed $,. Also, because Perlthon looks for ; at the end of line to know when it’s time to eval, entering a multiline sub is a pain. You can beat it by ending each line with a comment (#). Of course, this could be considered another bug… semi-coloned lines ending with comments don’t run unless the comment ends with a semicolon. I’d like to have a block mode (if inside {} then ; doesn’t end multiline input), but this is trickier than it looks. Consider:
do {
#comment with a }
foo;
}
But hey, works for now.
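To make the trap concrete, here is a deliberately naive sketch of a brace counter (brace_depth is a made-up name, not part of Perlthon). It counts braces without understanding comments or strings, so the example above fools it:

```perl
use strict;
use warnings;

# Naive depth counter: +1 for each "{", -1 for each "}".
# It ignores comments and strings, which is exactly the weakness.
sub brace_depth {
    my ($code) = @_;
    my $open  = () = $code =~ /\{/g;   # count of opening braces
    my $close = () = $code =~ /\}/g;   # count of closing braces
    return $open - $close;
}

# The first three lines of the example: the do-block is still open.
my $snippet = "do {\n#comment with a }\nfoo;\n";
print brace_depth($snippet), "\n";   # 0, so the input looks complete too early
```

Doing this properly means skipping comments, quoted strings, regexes, and here-docs, which is most of the way to parsing Perl.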
This morning, for the second time in as many weeks, I banged my head against a Perl problem I thought should be simple: I wanted to open a filehandle against a scalar (or perform some other chicanery) so that if I have a scalar $text full of text, I could do this:
while (<$fh>) {
#do something with $_
}
Where the filehandle $fh would refer to the contents of the scalar $text. This seems like an obvious thing someone may want to do. While you could just as easily do something like foreach (split /\n/,$text) { #... }, I had a situation where I might have data in a scalar or in a file (or even STDIN) and I wanted to treat them all the same. I expected this would be covered in the Camel and/or the Cookbook, but I couldn’t find any such thing. I didn’t have much luck on the web either. In the end, I took the low road and faked it with a system call and a pipe, since the script I was working on is infrequently used. Here’s that version:
open(FH, qq[echo "$text" |]) or die "Can't pipe.";
while (<FH>) { #... }
When the same problem came up this morning in a web service I’m working on, I decided to try researching it again, since I really didn’t like the pipe solution. After a lot of digging, I came across the perl module IO::Scalar, which lets you do exactly what I wanted. The IO::Scalar version of the code looks like this:
use IO::Scalar;
my $fh = new IO::Scalar \$text;
while (<$fh>) { #... }
My test app worked nicely on the Unix development server, but then I realized I still had a problem. For some period of time, this webservice will be running on a Windows 2000 web server (don’t ask). IO::Scalar is not part of the standard Perl distro. I’m using ActiveState’s ActivePerl, which makes installing modules via perl -MCPAN -e shell, well, challenging. ActiveState has a nice Perl Package Manager; unfortunately I could find no ready-made package for IO::Scalar, so I was stuck. While stumbling around the ActiveState site looking for inspiration, I found PerlIO::scalar. Now I was on to something.
(A moment for a side note here. I would write far fewer lines of Perl code per hour if not for the fantastic Perldoc.com maintained by Carlos Ramirez. It is an absolutely indispensable resource for me. However, it’s a bit buggy. I can’t get it to let me search the perl 5.8.0 docs, and the 5.8.4 docs appear incomplete (missing standard modules). After finding the info on ActiveState’s site, I figured out how to get to it on Perldoc.com.)
According to the docs for PerlIO::scalar:
PerlIO::scalar only exists to use XSLoader to load C code that provides support for treating a scalar as an “in memory” file.
The docs on the ActiveState site also note that it’s not necessary to use PerlIO::scalar explicitly. This reduces the code to the following:
open $fh, "<", \$text;
while (<$fh>) { #... }
Excellent. Not only is it concise and easy to use, it’s part of the standard distro. I’ve documented this not only for my own future reference, but to add my bit to the world’s largest help database. I just hope I wasn’t being totally obtuse, to later find that this is Camel page 52 material.
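Here is a minimal sketch of the "treat them all the same" idea, assuming a hypothetical helper called open_source that hands back a read handle for a scalar reference, a file name, or STDIN (chosen here by passing "-"):

```perl
use strict;
use warnings;

# Hypothetical helper: return a read handle for a scalar ref, a file name,
# or STDIN when given "-".
sub open_source {
    my ($source) = @_;
    return \*STDIN if !ref($source) && $source eq '-';
    # A scalar reference in place of a file name opens an in-memory file
    # (Perl 5.8 and later).
    open(my $fh, '<', $source) or die "Can't open source: $!";
    return $fh;
}

my $text = "first line\nsecond line\n";
my $fh   = open_source(\$text);
my @lines;
while (my $line = <$fh>) {
    chomp $line;
    push @lines, $line;
}
close $fh;
print scalar(@lines), " lines read\n";   # 2 lines read
```

The loop never needs to know which kind of source it was given.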
This is a bit of pre-emptive blogging, for the next time I forget. Those of you with stronger Perl-fu than I already know these things.
How to check if a module is installed:
No output indicates success:
perl -MMODULENAME -e1
For example:
perl -MHTML::Embperl -e1
How to check the version number of an installed module:
(Assumes the module uses $VERSION, but then, most CPAN modules do):
perl -MMODULENAME -e'print "$MODULENAME::VERSION\n";'
For example:
perl -MHTML::Embperl -e'print "$HTML::Embperl::VERSION\n";'
How to determine where a module is installed:
Lists every dir used by the module, including man pages, etc. Should be entered on one line:
perl -MExtUtils::Installed
-e'$,="\n";print ExtUtils::Installed->new()->directories("MODULENAME")," "'
For example:
perl -MExtUtils::Installed
-e'$,="\n";print ExtUtils::Installed->new()->directories("HTML::Embperl")," "'
This technique relies on ExtUtils::Installed, which is part of the standard Perl distro these days.
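The same checks can be done from inside a script. This is a sketch only, using a hypothetical helper named module_info; it reports the file the module was loaded from (via %INC) rather than every directory the module uses.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (version, path of the loaded .pm file) if the
# module can be loaded, or an empty list if it cannot.
sub module_info {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return unless eval { require $file; 1 };
    return ($module->VERSION, $INC{$file});
}

my ($version, $path) = module_info('Data::Dumper');
print "Data::Dumper $version at $path\n";
```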
At work I spend a lot of time working on one of our Solaris dev servers via xterm. Via many xterms simultaneously, most of the time. Since I run a local X server on my PC under cygwin, I have a shell script that I run locally that connects to the dev box and launches three xterms in pre-determined screen locations, setting DISPLAY along the way.
Over the course of a busy morning, this number can grow. Since I’m still on a Windows PC, however, I do tend to use my task bar to find windows. Having six or more taskbar buttons that all say “xterm” isn’t very helpful. For a while I tried setting my titles to reflect what I’m doing in each xterm, but this was futile. Partially because I often create, destroy, or repurpose xterms on a whim; but largely because I’m lazy.
A while ago, I updated my launch script to label my initial three windows Alpha, Beta, and Gamma. While the names aren’t very descriptive, it does differentiate the windows, and I can usually remember what each window is being used for. When I start launching additional xterms, things can get confusing; I try to remember to add a -title and pick a Greek letter not in use, but I did mention I’m lazy, right? So today, I decided to do something about it.
The result is addterm, one of the more senseless perl scripts I’ve ever bothered with. When run, it creates a new xterm with the title set to the name of the first Greek letter not currently in use. If all 24 Greek letters are in use, an error message is printed and no xterm is launched. This is a feature, not a bug. Close some windows! The version below is my OS X port:
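#!/usr/bin/perl -w
my $user = `whoami`;
chomp $user;    #drop the trailing newline from whoami
my @ps = split("\n", `ps -o command -U $user`);
my @alpha = qw/Alpha Beta Gamma Delta Epsilon
               Zeta Eta Theta Iota Kappa Lambda
               Mu Nu Xi Omicron Pi Rho Sigma
               Tau Upsilon Phi Chi Psi Omega/;
my $k=0;
my %greek = map {$_=>$k++} @alpha;
#The middle of this listing was garbled; the two loops below are a
#reconstruction from the description above.
#Blank out every letter already in use as an xterm title...
for(@ps) {
    my ($title) = /^xterm\s+-title\s+([^\s]+)/ or next;
    $alpha[$greek{$title}] = undef if exists $greek{$title};
}
#...then take the first letter still standing.
my $next;
for(@alpha) {
    if (defined $_) {
        $next = $_;
        last;
    }
}
if (defined $next) {
    open STDERR, '>/dev/null'; #discard xterm's whining
    system("xterm -title $next & ");
} else {
    print STDERR "ERROR: No greek letters free!\n";
}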
This required a port from the original Solaris version because the script uses ps to look for running xterms. The Solaris version uses ps -o args -u $user. The command should list (only) the full command + args for every process for the username $user. If you want to use this on another *nix, just test your ps command first and adjust accordingly. You could also change the Greek letters to another finite set, just remember to update the error message.
Of dubious interest is the fact that I used an array to keep the letters in order and a hash to allow quick indexing into the array. I dislike having to store the letters twice, but this seemed the best solution. I have a vague sense that some kind of tied vars may do this more elegantly, but my perl-fu isn’t quite that strong without cracking the Camel; did I mention I’m lazy? Perhaps tomorrow. Improvements welcomed.
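One possible simplification, sketched with a hypothetical first_free helper (this is not what addterm does): keep the letters in the array only, build a throwaway hash of the titles in use, and walk the array in order.

```perl
use strict;
use warnings;

# Hypothetical helper: given the ordered names and the titles already in use,
# return the first free name, or undef when none is left.
sub first_free {
    my ($names, $in_use) = @_;
    my %used = map { $_ => 1 } @$in_use;   # lookup table of taken titles
    for my $name (@$names) {
        return $name unless $used{$name};
    }
    return undef;
}

my @alpha = qw/Alpha Beta Gamma Delta/;   # shortened list for the example
print first_free(\@alpha, [qw/Alpha Gamma/]), "\n";   # Beta
```

The letters are stored once, and the order still comes from the array.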
I recently needed to convert some RTF stored in a database to html (xhtml)… or at least into xhtml fragments that could be wrapped inside a tag. I only needed to support bold, italic, underline, and paragraphs; fonts, page layout, etc. could just get chucked. The result is rtf2html.pl. Be sure to read the disclaimer at the top.
It’s a quick-and-dirty hack. It’s probably too verbose, and misses common Perl idioms. On the plus side, it works (always a plus). If you’re an experienced perl guru and see anywhere I should have used a standard perl idiom, please drop a comment. I’m not looking for obfuscation-contest entries, just things I’m doing the hard (or verbose) way.
I’ve been working on an idea for a new plugin for Blosxom. Along the way I’ve learned a few things about prior art and code re-use. The idea for the plugin is simple. Next to the date banner above each day’s set of posts (generated by the ‘date’ flavour component), I’d like to add text denoting if the day is a holiday, observance, etc.
Of course, following that age-old Programmer’s virtue of Laziness, I don’t want to have to maintain the list of dates and observances if I don’t want to. This just screams of the need for prior art… I need to find an existing format for calendar-type data, preferably a format with lots of existing data already, well, formatted and ready for consumption. I’m ashamed to say it took a bit of digging around the web before I came upon the perfect thing… which was sitting in my Mac’s dock the whole time.
Apple’s iCal, a free download for OS X 10.2+ users, features the ability to subscribe to calendars other people publish. There are calendars of movie releases, professional sports schedules, holidays and religious observances from around the globe; you name it. iCalShare has hundreds of calendars freely available. Stands to reason the spec is open, right?
Is it ever. Much to my glee, I find that iCal uses iCalendar, also known as RFC 2445, the Internet Calendaring and Scheduling Core Object Specification (catchy, n’est-ce pas?). Not only that, but other clients exist, such as Mozilla Calendar. Life is good.
However, things are about to veer off course a bit. In looking for a Perl module to read the iCal format (mime type text/calendar, or *.ics), I found a few options. Date::ICal seemed perfect at first, until I found that it only handles iCal’s date/time format (e.g. 20030921T235900) and duration format (e.g. P2DT1H30M). It doesn’t actually parse the files, extract events, etc. Net::ICal, and a host of related files, seem to fit that purpose. However, there are some issues here as well:
- Version is 0.15
- Listed as ‘PRE-ALPHA’
- No activity in about 2 years.
- More prerequisite modules than I can count
Brief side note: I tried to make my life easy by using the CPAN module to grab the modules I wanted. It’s supposed to make life easy by handling build process, prerequisites, etc. However, every time I tried to configure and run it, it would beg for files. “Please install Net::FTP quickly!” it would shout. I tried to give it what it wanted, but every time I’d start installing a module, the prerequisite processing would end up trying to build and install perl5.8. Say What?? I’m running 5.6; that’s the latest for OS X from Apple, and Fink doesn’t offer 5.8 either. I’m happy with 5.6 for now. And yet, no matter what I tried, CPAN kept trying to build perl5.8. I followed the prompts for a bit before aborting, it was really going to build it from scratch.
The moral of the side note? I ended up installing each module I needed manually, i.e. perl Makefile.PL, (go get a bunch of prerequisites and install ‘em) make, make test, make install. The whole prerequisite experience made for a very recursive exercise. I eventually came to the conclusion that even if the code worked perfectly, the raft of prerequisite modules made it inappropriate for use in a Blosxom plugin.
The Net::ICal family of modules was the product of a project called Reefknot. The list archive was dead for about 6 months, but I took a shot and mailed the dev list, looking for some info on the project’s status and future. I did get a reply, pointing me to datetime.perl.org for current work on Date/Time handling in perl (including iCal formats), and to Net::vFile and related modules for handling ‘vFile’, the meta-format of iCalendar, vCard, etc.
I spent some time playing with Net::vFile. I didn’t do too poorly; admittedly my weak OO perl skills slowed me down. I eventually got some simple test code almost-working; it appears that iCalendar uses nesting within the vFile format in a way which is not yet fully implemented by vFile.
At this point, I decided to roll my own simple ics file parser. My needs are simple: I just want the start and end dates for ‘events’ as they are called in iCalendar, and the summary (description). Other calendar objects, like todo’s, I can ignore; other event properties, like UID and DTSTAMP, I can likewise ignore. It didn’t take too long to come up with some code to extract a list of holidays from a US Holiday file published by Apple. Well, it did take a while, but only because I’m a numbskull; see the prior post for details. I even tossed back in use of Date::ICal, to parse the date format for me.
Once I could extract the events I wanted from an ics file, I ran into (yet) another snag: RRULEs. An RRULE is a Recurrence Rule. Most of the Holidays in my file were listed with 2002 dates, and with RRULEs describing how to calculate dates in successive years. Date::ICal doesn’t do RRULEs. A couple of sample RRULEs (paired with the SUMMARY of the event):
SUMMARY:Daylight Saving Time Ends
RRULE:FREQ=YEARLY;INTERVAL=1;BYDAY=-1SU;BYMONTH=10
SUMMARY:Halloween
RRULE:FREQ=YEARLY;INTERVAL=1;BYMONTH=10
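Pulling an RRULE apart is the easy half. As a rough sketch, a hypothetical parse_rrule helper can split one into its KEY=VALUE parts; it makes no attempt to work out which dates the rule produces.

```perl
use strict;
use warnings;

# Hypothetical helper: split "RRULE:KEY=VALUE;KEY=VALUE" into a hash ref.
# It only separates the parts; it does not compute any dates.
sub parse_rrule {
    my ($line) = @_;
    $line =~ s/^RRULE://;
    my %rule = map { split /=/, $_, 2 } split /;/, $line;
    return \%rule;
}

my $rule = parse_rrule('RRULE:FREQ=YEARLY;INTERVAL=1;BYDAY=-1SU;BYMONTH=10');
print "$rule->{FREQ} $rule->{BYDAY} $rule->{BYMONTH}\n";   # YEARLY -1SU 10
```

Turning BYDAY=-1SU plus BYMONTH=10 into "the last Sunday of October" for a given year is where the real work starts.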
These things aren’t rocket science, but there’s enough variation that I’d prefer to use a library (read: code I don’t have to write). I poked around datetime.perl.org, and found that DateTime::iCal will not only read ical-formatted date strings, it will also handle RRULEs, creating a DateTime::Set. Of course, I’ll also need the original DateTime module. Each of these has a few sub-modules. I’m nervous now; again, this is for a Blosxom plugin and so should have minimal dependencies. Throwing caution to the wind, I grab all three downloads and begin to install. My first try, DateTime, stops me with no less than 4 dependencies that I don’t have. Two of these are part of the DateTime family, but there is no bundle available yet.
So now I’m back on familiar turf. There are modules to do what I need, but the number of prerequisite modules is becoming prohibitive. But what are prerequisite modules? The use of prior art.
I’m all for code reuse. Really I am. But it seems that in Perl these days, deciding to use just one module that doesn’t ship with Perl can mean a landslide of prerequisites. For use in a hosted environment such as many of us use for Blosxom, this can be a real problem.
For now, I’m undecided. I hate to keep reinventing wheels; but on the other hand I hate to keep installing high-performance racing wheels with custom rims on a Yugo. The DateTime family of modules looks very well thought out, and very well implemented, especially given its youth. Hopefully, an install bundle will come along soon. But, just as with the Net::ICal family of modules (and all of its prerequisites), there’s a whole lot of functionality I just don’t need. I feel like I’ve gone through a lot of options just to implement a small bit of functionality. Maybe this is the curse of prior art… specs tend to be big, even when what you need is only a small part of one.
I’ve categorized this as Programming::Perl but I could just as easily have put it under Mistakes::Beginner::Really Stupid.
I was experimenting with a bit of Perl code to process a text file. The whole thing was only maybe 25 lines or so. Every time I’d run the code, the only output I’d get was a single line:
Abort.
I scratched my head and perused my code for quite a while. I didn’t see any place I was explicitly causing this, and it seemed like the most cryptic error I’d seen. A cursory search of perldoc.com and Perl Monks failed to shed any light. I assumed I must be abusing my I/O in some fashion (I am still a rookie in Perldom).
My perl debugging knowledge is even more limited than my Perl knowledge. At work I’ve used a TK-based debugger (something like ptkdbg, can’t remember), which is GUI-fied and fairly straightforward, but which I don’t seem to have on my Powerbook (note to self: find out what that is, and find a copy for home). Falling back on that most time-honored of debugging methods, the manual trace (read: liberal (ab)use of print statements), I found that my I/O was fine. The abort occurred at the very end, when I tried to output (via Data::Dumper) the data structure I’d built from the file. Those of you with more perl-fu than I will see the problem immediately:
dump(\%ev);
I didn’t see it, so I searched perldoc.com for dump(). And then I learned. What I should have done was this:
print Dumper(\%ev);
This would have pretty-printed my hash full of hashes in all its nested glory. The call to dump(), on the other hand, asks Perl to immediately dump core and abort. Perl obliged.
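For reference, a small self-contained sketch of the working call. The contents of %ev here are made-up sample data, not the real structure from the script.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable key order, easier to read and compare

# Made-up sample data standing in for the hash built from the file.
my %ev = (
    halloween => { summary => 'Halloween', month => 10 },
);

# Dumper returns a string; nothing appears until it is printed.
my $out = Dumper(\%ev);
print $out;
```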
At least now I know what “Abort.” in my perl output means.