Wisdom of the Documentation

Today, I spent some time staring at an old piece of code that I had written at least a year ago. It’s been in testing several times, but never put into production (the project it is tied to has been bumped on several occasions). Today, it was back in testing.

The code is a fairly simple web service, written in Perl. I like Perl. I have no illusions that I’m a fantastic Perl hacker, but I know the language well, through both experience and reading. I’ve read most of the O’Reilly Perl titles, including Programming Perl (“The Camel”), which I’ve read cover to cover at least three times. I still find myself looking things up, usually to refresh my memory about something I can remember reading, or some syntax detail I can’t get right (one of the perils of working in multiple languages). At least I generally know where to look.

So this web service has been tested before. It works in a browser, and it works when called by my test client. It’s been tested with a third-party bit of code. Today, it was tested by Dave, using some custom client code he had written in C#. And it worked… if he told his http library to ignore HTTP protocol errors. If he didn’t, his library complained.

And so I stared at the code for a while. By coincidence, I’d been reading the HTTP spec over the weekend (yes, I’m a geek), and was pretty sure my response was good: a bare-minimum response, along the lines of:

HTTP/1.1 200 OK
Content-Type: text/plain; charset=utf-8

Single Line Response

I double checked the spec anyway, and kept staring at the code. I was about to start grasping at straws and adding additional entity headers to the response (such as Content-Length), when I finally stared at the code long enough. I saw something like this:

print "HTTP/1.1 $status\n";
print "Content-Type: text/plain; charset=utf-8\n";

Then it hit me: "\n" in Perl is a “magic newline” that conforms to the newline convention of the system in question. HTTP, on the other hand, requires ASCII CR+LF (Carriage Return + Line Feed, or "\r\n" in C) as a line terminator. Apparently all of the code thrown at the service before today was a bit forgiving. I changed the strings to send CRLF using octal escape sequences ("\015\012"), and everything was fine. I was a bit ticked about the mistake… I knew both the HTTP requirement for CRLF and the Perl treatment of "\n" when I originally wrote the code; it was a dumb mistake. It was also aggravating that it took so long to spot.
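For the record, the corrected output looks roughly like the sketch below. The build_response helper and the $CRLF variable are names made up for this illustration, not the actual service code; the binmode call is an extra precaution I am assuming is wanted, so that a text-mode layer on the filehandle cannot rewrite the line endings.

```perl
use strict;
use warnings;

# HTTP line terminator: CR (octal 015) followed by LF (octal 012).
my $CRLF = "\015\012";

# Hypothetical helper: build a bare-minimum plain-text response.
sub build_response {
    my ($status, $body) = @_;
    return "HTTP/1.1 ${status}${CRLF}"
         . "Content-Type: text/plain; charset=utf-8${CRLF}"
         . $CRLF    # the empty line that ends the header block
         . $body;
}

# Write raw bytes so no I/O layer translates the line endings.
binmode STDOUT;
print build_response('200 OK', 'Single Line Response');
```

Keeping the terminator in one variable means the status line, every header, and the blank separator line all end the same way.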

And there my tale should end. But this evening, I started wondering if the octal sequence was the most Perlish way to send a CRLF. I knew that "\r" is fine for the CR, but you can’t use "\n" for the LF – it’s magic in Perl, and behaves differently on different platforms. I began to wonder if Perl has a backslash-escape for LF that is always LF. Eventually, I had to check for myself, so I referred to the Quote and Quote-like Operators section of the perlop man page. (Sadly, I knew right where to look, right down to the name of the section. Geek, remember?)

Turns out the manpage specifically recommends the octal form for networking applications (at least I got that right), but then it twists the knife:

If you get in the habit of using "\n" for networking, you may be burned some day.

D’oh.
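As an aside on the “most Perlish” question: the core Socket module can export ready-made $CR, $LF and $CRLF variables through its :crlf tag, which saves typing the octal escapes by hand. A minimal sketch, assuming the same bare-minimum response as above; header_line is a hypothetical helper, not part of the real service.

```perl
use strict;
use warnings;
use Socket qw(:crlf);    # exports $CR, $LF and $CRLF

# Hypothetical helper: format one header line with a CRLF terminator.
sub header_line {
    my ($name, $value) = @_;
    return "${name}: ${value}${CRLF}";
}

# Write raw bytes so no I/O layer translates the line endings.
binmode STDOUT;
print "HTTP/1.1 200 OK${CRLF}";
print header_line('Content-Type', 'text/plain; charset=utf-8');
print $CRLF;             # the empty line that ends the header block
print 'Single Line Response';
```

Whether this reads better than "\015\012" is a matter of taste; both put the same two bytes on the wire.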


6 Responses to “Wisdom of the Documentation”

  1. Mark Says:

    Familiar story – “cat -v” is your friend

    I’ve run into this problem once or twice before too. I now have a fairly standard check that I do if I suspect the line ending problem which is run the output through cat -v on a *nix-y system (Cygwin, OS X, Linux, etc). If your script is called foo you can do this with foo | cat -v. If you see ^M at the end of each line you know that your script is putting in \rs.

  2. Jason Says:

    Thanks Mark, that’s a great tip… and that’s exactly why I love *nix. So much useful goodness already baked right in. I’ll be using that one again for sure.

  3. Yeager Says:

    I’ll be using that tip in the future too. This is just as much my fault though, because Chris and I ran into that problem recently with some of the interfaces we’ve been doing… silly of me not to think of it then.

  4. chornbe Says:

    Yeah, Yeager and I were tinkering with, I believe, some text-translation stuff going across platforms. Working in C# on Windows, you would think that the well-recognized “\n” character would do its magic, then when the data file was FTP’d in ASCII mode, the lines would be translated. As it turns out, for whatever reason, depending on exactly where and how you use “\n”, it may or may not properly replace it with CR or CRLF. Maybe.

    We had to drop in a small text fixer-upper that pre-processed the text to replace the literal “\n” in the text with the value of System.Environment.NewLine. Then, everything worked as expected.

    Note: this was only a problem writing to a TextWriter object threaded onto a FileStream object using System.Text.Encoding.Default. Explicitly setting the encoding type is a sure-fire way to get incompatible files – go figure.

    As much as I absolutely LOVE C# and .NET, sometimes I wonder… I just wonder…

  5. Kelly Cline Says:

    I have a C# program that sends a string to a WebService. The string contains \r\n in a number of places. When in debug, I can see that the \r\n are there, but a breakpoint at the very entrance to the WebService method shows its incoming string to have \n (followed by a space). If the method returns a string with \r\n, I get the same behavior back in the Client (no \r). Is there a WebService setting to control that behavior?

  6. chornbe Says:

    @Kelly Cline

    Kelly, that’s something that’s platform-dependent. Typically… and I stress “typically”… the client and server handle string conversion (to/from Unicode) and line-ending settings based on whatever encoding is set, or when not explicitly set, based on the default for the platform, OS, etc.

    If you’re specifically coding up line-endings and returns in your strings, you might consider replacing them with System.Environment.NewLine and letting the system deal with the specifics. If you’re receiving various types of line-endings and carriage return combinations, you could specifically search them out, replace them and insert your own System.Environment.NewLine values where applicable.

    If I misunderstood your question, feel free to shoot me an email telling me so and get some specific data examples to me; I’d be happy to try to help.

    chris at chornbe dot com

    Hope this helped.