Magic with wget

As mentioned previously, I’m in the process of switching hosts. Last night I uploaded my entire site to the new host and set everything up. Along the way, I decided to refine my internal directory structure a bit. Everything is working fine on the new host, and once I get the e-mail accounts squared away I’ll be ready to make the DNS switch.

In the meantime, I have a synchronization problem. When I uploaded my site to the new host, I used a site backup (tarball) from the old host. This made it easy to preserve directory structures as well as timestamps. Because of the way Blosxom works, the file datestamps on my posts are very important… Blosxom uses them for the post date/time. After the upload (and after I moved a few things around), both the old and new sites were in sync, with the same posts in the same categories and the same timestamps. From that point on, any new post to the old site is out of sync with the new site. Posting to both locations is no good; the timestamps will be off, and it breaks the programmer’s first virtue (laziness). I can’t use scp because my old host doesn’t offer shell access.

I went looking for a way to transfer a file from the old host to the new host, preserving the timestamp, and preferably making it easy to keep things in the right directory. I’ll have to run this after each post until the old site goes away. I looked at cURL first, but it didn’t quite do all I needed, so I turned to wget. Magic ensued.

The setup: from the shell account on my new host, grab the file from my old host. The file can only be retrieved from the old host via FTP. As an example, I wanted to sync my prior post on the new iPods. It’s in the category Apple and the “stub” title is newpods. On the old host, the file is in /blosxom/Apple/newpods.txt. On the new host, it needs to go into ~/jclark.org/blosxom/content/Apple/newpods.txt. I didn’t want to specify the directory in both the source and destination. The solution:

cd ~/jclark.org/blosxom/content
wget -N -x -nH --cut-dirs=1 ftp://jclark.org/blosxom/Apple/newpods.txt 

The first command just puts me in the base directory for my posts on the new server. The magic is in the wget command. -N turns on timestamping, so the fetched file keeps the remote file’s timestamp. -x forces wget to create local directories to match the remote ones (this is the default for recursive fetches). Normally the directories created would start with the host name (e.g., jclark.org), but -nH removes the host name. Finally, --cut-dirs=1 removes one directory from the front of the path, so the remote file blosxom/Apple/newpods.txt becomes Apple/newpods.txt locally. Combined with the initial cd, this lets me handle the changes I made to my directory structure. After I publish this post (on the old host), I’ll run the same command from the new host, plugging in the new file/path.
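Since I’ll be doing this after every post until the old site goes away, a tiny wrapper script saves some typing. This is just a sketch of the same command (sync-post.sh is a name I made up here); it takes the post’s path relative to /blosxom on the old host:

#!/bin/sh
# sync-post.sh (hypothetical name): fetch one post from the old host,
# preserving its timestamp and its place under content/.
# Usage: ./sync-post.sh Apple/newpods.txt
#
# Path mapping for the flags below:
#   remote file:           ftp://jclark.org/blosxom/Apple/newpods.txt
#   -x alone would create: jclark.org/blosxom/Apple/newpods.txt
#   adding -nH:            blosxom/Apple/newpods.txt
#   adding --cut-dirs=1:   Apple/newpods.txt
cd ~/jclark.org/blosxom/content || exit 1
wget -N -x -nH --cut-dirs=1 "ftp://jclark.org/blosxom/$1"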

One detail of note: the above wget command will try to log in anonymously and give up if that fails. You can specify the user and password on the command line, but that’s a bad idea on a shared host (think ps -aux, although my host protects against this). And if you specify the user without the password, wget doesn’t prompt you for one. The way around this is an old UNIX standby, the .netrc file.
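wget reads FTP credentials from ~/.netrc automatically. An entry for the old host looks something like this (the login and password are placeholders, obviously):

machine jclark.org
login myusername
password mypassword

Since the password sits in a plain file, keep it readable only by you:

chmod 600 ~/.netrc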


One Response to “Magic with wget”

  1. polis Says:

    Thanks, it helped me a lot with the “-nH” issue.