rsync is some powerful voodoo. It stands for Remote Synchronization. The idea is that if you have some files in one place and you want them replicated in another place, rsync will do this efficiently, transferring only the differences between the two sides. I usually use rsync over an SSH connection (handled automatically by rsync) but there are other ways too. The SSH connection ensures security during transport using known and trusted technology.
If rsync isn’t sufficient for your needs,
this catalog of data
moving technology is very nice. (Note the asciidoc presentation,
obviously a pro!)
rsync -P -v -a -e ssh /path/dirtocopy [/path/moretocopy] xed.ucsd.edu:~/dest
Note that /path/dirtocopy creates a dirtocopy on the remote system while /path/dirtocopy/ just copies the contents.
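The trailing-slash rule is easy to check locally. The following is a minimal sketch using throwaway directories (the names under `$work` are made up for the demonstration, and it assumes rsync is installed):

```shell
#!/bin/sh
# Compare the two source forms using throwaway local directories (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/dirtocopy" "$work/dest_noslash" "$work/dest_slash"
echo "hello" > "$work/dirtocopy/file.txt"

# No trailing slash: the directory itself is created inside the destination.
rsync -a "$work/dirtocopy" "$work/dest_noslash/"

# Trailing slash: only the directory's contents are copied.
rsync -a "$work/dirtocopy/" "$work/dest_slash/"

# List what landed where.
find "$work/dest_noslash" "$work/dest_slash" -type f
```

The first destination should end up with dirtocopy/file.txt, the second with just file.txt.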
Sometimes you don’t want to saturate your network. To limit transfers to 15kB/sec:
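A minimal sketch of this, assuming rsync's standard `--bwlimit` option is the intended tool (by default its value is read in units of 1024 bytes per second). The temporary directories are stand-ins so the example runs locally; the remote form in the comment reuses the host from the earlier example.

```shell
#!/bin/sh
# Local stand-in directories (hypothetical). The remote form would look like:
#   rsync -P -v -a --bwlimit=15 -e ssh /path/dirtocopy xed.ucsd.edu:~/dest
src=$(mktemp -d)
dst=$(mktemp -d)
echo "sample data" > "$src/sample.txt"

# --bwlimit=15 caps the transfer rate at roughly 15 KiB per second.
rsync -a --bwlimit=15 "$src/" "$dst/"
ls "$dst"
```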
What if you want to make a backup to a machine where you have full sudo access but no direct connection as root (e.g. Ubuntu or a Mac)? Tell rsync to run its remote side under sudo:
sudo rsync --rsync-path="sudo rsync" -aP /home/ firstname.lastname@example.org:raven-backup/
Automatic Secure Transfers With rsync and SSH Keys
One of the nice things about rsync is that when using SSH keys you can design a setup that will very securely do unattended file transfers. This is useful for nightly backups or offloading logs or video captures.
The first step in setting this up is getting your SSH keys all sorted
out. There is some flexibility/complexity regarding the particular
strategy, but I’m going to describe a situation where a backup server
pulls data off of a main repository somewhere. There will be two
hosts, main and back. The idea will be to leave main alone
except for when it’s time to get data and do as much of the setup and
work as possible on back. That said,
main still has to be prepared to
receive and comply with the request that
back will be making.
Establish Key Pair
The first step is to set up an SSH key pair so that
back can log in to
main. This may be a security problem in general, but later once
things are working, we’ll restrict what this key pair can do so that
it will only apply to
back performing this particular backup task.
On back do the following, entering a name (I used fs-puller_rsa)
and just hitting enter at the passphrase prompt:
[back][~/.ssh]# cd ~/.ssh
[back][~/.ssh]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): fs-puller_rsa
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in fs-puller_rsa.
Your public key has been saved in fs-puller_rsa.pub.
The key fingerprint is:
97:eb:40:97:2b:7a:a9:4d:7a:99:91:88:61:c6:b9:2f root@back
The key's randomart image is:
+--[ RSA 2048]----+
| .++.o. |
| oo.o. |
| . + .. |
| . o o.. |
| + o.E . |
| o =. . |
| +.. |
| *. |
| o.. |
+-----------------+
Now you have an unencrypted key pair on back. Don’t let that private key wander off into an insecure situation!
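If you would rather script the key generation than answer prompts, something like the following should work. It is only a sketch: it writes a throwaway key named demo-puller_rsa (a made-up name) into a temporary directory instead of ~/.ssh, and it assumes OpenSSH's ssh-keygen is installed.

```shell
#!/bin/sh
# Generate a throwaway key pair in a temporary directory (hypothetical name
# demo-puller_rsa) to show the non-interactive form and the resulting files.
keydir=$(mktemp -d)

# -q quiets the banner, -N '' sets an empty passphrase, -f names the output file.
ssh-keygen -q -t rsa -b 2048 -N '' -f "$keydir/demo-puller_rsa"

# The private key must be readable only by its owner; the .pub half is the one you hand out.
chmod 600 "$keydir/demo-puller_rsa"
ls -l "$keydir"
```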
Installing Public Key
Now you have to put the public key on the target machine. There are ways to do this using cut and paste, but this should be unambiguous:
[back][~]# cat ~/.ssh/fs-puller_rsa.pub | ssh main "cat >> ~/.ssh/authorized_keys"
This appended your newly created key (the public part) to the authorized keys file.
Test to make sure that the key pair works. You need to specify the key you want to use explicitly like so:
[back][~]# ssh -i ~/.ssh/fs-puller_rsa main
Last login: Mon Oct 10 11:23:36 PDT 2011 from back on ssh
[main][~]#
Cool. Now we have a way to make easy SSH connections. Next is making that connection a lot less easy for everything but the backup mission.
Restricting SSH Key To Limited Functionality
The hardest thing about this process is knowing exactly what command
you want the main host to run. In this example, it’s some kind of
rsync but that’s not good enough. We need to know exactly what the
command is. The way to find this out is to create a small script that
simply dumps out the exact command as submitted by the SSH client
(back in this case). The way we intercept our client’s command is by
using the command= key directive. By putting this phrase, followed
by the command you want to run, in front of the key in authorized_keys,
you can control what the key pair does when it makes a connection. It
turns out that it doesn’t matter what the client asked for. If a client
makes a connection using a key pair with a command= directive, that
command will be run. To answer our question, we create this temporary
script and make it the command that is run:
#!/bin/sh
# Put this in front of the key of interest in .ssh/authorized_keys:
# command="/path/test_ssh_cmd" ssh-rsa AAAAB3.....x4sBbn62w6sISw== xed@xedshost
echo "$SSH_ORIGINAL_COMMAND" >> sshcmdlog
exec $SSH_ORIGINAL_COMMAND
Notice that all this script does is take the variable
$SSH_ORIGINAL_COMMAND and append it to a file called
sshcmdlog. Then it runs the submitted command.
Here’s a test command:
:-> [back][~/.ssh]# ssh -i ~/.ssh/fs-puller_rsa main cal 1 2012
    January 2012
Su Mo Tu We Th Fr Sa
 1  2  3  4  5  6  7
 8  9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31
Seems that it ran that normally on
back. But over on
main here is the setup and the result:
[main][~]# cat ~/test_ssh_cmd
echo "$SSH_ORIGINAL_COMMAND" >> ~/sshcmdlog
exec $SSH_ORIGINAL_COMMAND
[main][~]# tail -n1 ~/.ssh/authorized_keys
command="~/test_ssh_cmd" ssh-rsa AAAA....etc....AhySEWf9 root@back
[main][~]# cat ~/sshcmdlog
cal 1 2012
Notice that we were able to capture the exact command sent by the client as the server saw it. In this case, the command was simple and I could have just guessed that directly, but you’ll see that with rsync commands it can get tricky and it’s best not to spend all day guessing what it’s receiving.
Now I run a test of the real thing on back:
[back][~]# rsync --rsh='ssh -i /root/.ssh/fs-puller_rsa' -aP --del main:/files/users/xed /raid/users/
Check what turned up in sshcmdlog on main:
[main][~]# cat sshcmdlog
cal 1 2012
rsync --server --sender -vlogDtpre.iLsf . /files/users/xed
Now you can see why this is so hard to get right by guessing. I didn’t submit the command in this form, but this is how the server received it. This exact string is what you have to put in the key’s entry on the SSH server machine so it will only honor these jobs.
This is the final form of the command directive in the SSH key in the
authorized_keys file on main:
:-> [main][~]# tail -n1 ~/.ssh/authorized_keys
command="rsync --server --sender -vlogDtpre.iLsf . /files/users/xed" ssh-rsa AAAAB3W...ETC...SEWf9 root@back
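As a variation (not what the setup above does), the forced command could be a small wrapper that compares the request against the one allowed string and refuses anything else, which makes mismatches visible instead of silently running the backup command. The sketch below shows only the comparison logic; the function name check_request is made up, and the allowed string is copied from the sshcmdlog output above, so it will differ if your rsync version or flags differ.

```shell
#!/bin/sh
# The one request this key should be allowed to make, as captured in sshcmdlog.
ALLOWED='rsync --server --sender -vlogDtpre.iLsf . /files/users/xed'

# Hypothetical helper: print "allow" or "deny" for a requested command string.
check_request() {
    # Exact string comparison; anything that differs at all is denied.
    if [ "$1" = "$ALLOWED" ]; then
        echo allow
    else
        echo deny
    fi
}

# Demonstration with sample requests. A real wrapper would instead test
# "$SSH_ORIGINAL_COMMAND" and then either exec it or exit non-zero.
check_request 'rsync --server --sender -vlogDtpre.iLsf . /files/users/xed'
check_request 'cal 1 2012'
```

The first call prints allow and the second prints deny.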
|In the example, I used the rsync flag
Restricting SSH Key To Limited Hosts
Now this unencrypted key pair really can’t do anything but make a backup. But just to limit mischief, we can further restrict the operation of that key to specific hosts. This means that if somehow credentials are stolen, a backup won’t be made to some unknown IP number. Here’s how to implement that:
:-> [main][~]# tail -n1 ~/.ssh/authorized_keys
from="back",command="rsync --server --sender -vlogDtpre.iLsf . /files/users/xed" ssh-rsa AAAAB...ETC...hySEWf9 root@back
Note that our host here is
back but it can be an IP number or even
wild card values (so I’ve heard).
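For what it's worth, the sshd documentation describes from= as a comma-separated list of patterns in which * and ? act as wildcards. Here is a small sketch that assembles such entries; the helper name restricted_entry, the 192.0.2.x addresses, and example.org are placeholders invented for illustration, and the key material stays elided as in the text.

```shell
#!/bin/sh
# Hypothetical helper: build an authorized_keys entry restricted by source
# host pattern and forced command.
restricted_entry() {
    pattern=$1
    forced=$2
    pubkey=$3
    printf 'from="%s",command="%s" %s\n' "$pattern" "$forced" "$pubkey"
}

KEY='ssh-rsa AAAAB...ETC...hySEWf9 root@back'
CMD='rsync --server --sender -vlogDtpre.iLsf . /files/users/xed'

# A host name, a single address, and a list mixing wildcard patterns.
restricted_entry 'back' "$CMD" "$KEY"
restricted_entry '192.0.2.10' "$CMD" "$KEY"
restricted_entry '*.example.org,192.0.2.*' "$CMD" "$KEY"
```

Each printed line has the same shape as the entry shown above and could be pasted into authorized_keys on main once the real key is substituted.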
Now that you have set things up so your back up server can
directly make a legitimate rsync backup job using SSH, you can now
automate that in a cron job. Just run
crontab -e and put the rsync
command in a cron job (in my example situation, this is done on
back, the requesting client). This will do my example every night at
one in the morning:
0 1 * * * rsync --rsh='ssh -i /root/.ssh/fs-puller_rsa' -aP --del main:/files/users/xed /raid/users/
(see cron help)
Wait, Actually I Can’t Use Keys
I’ve had the very odd situation where there is a key installed and you don’t want to use it but it wants to insist. Here’s the answer.
sudo rsync \
    --rsh='ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no' \
    -rP email@example.com:/local/xed/data/dw .
Alternatives to rsync
The SSH method, using tar and
ssh to transfer a filesystem:
tar -cjf - mp3/ | ssh -C -o "CompressionLevel=9" xed.ucsd.edu tar -C /target -xjf -
(see tar help)
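To see what the pipeline does without needing a remote host, here is a local sketch of the same producer/consumer pair with the ssh hop left out. The directories and the file name are made up, and gzip (-z) is swapped in for the bzip2 (-j) used above only because it is more widely installed.

```shell
#!/bin/sh
# Local stand-in for the tar-over-ssh pipeline: same two tar processes,
# connected by a plain pipe instead of an ssh connection.
src=$(mktemp -d)
target=$(mktemp -d)
mkdir "$src/mp3"
echo "not really music" > "$src/mp3/track01.txt"

# Producer writes an archive to stdout; consumer unpacks stdin into $target.
(cd "$src" && tar -czf - mp3/) | tar -C "$target" -xzf -

ls "$target/mp3"
```

Putting `ssh host` in front of the second tar turns this back into the remote version.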
Here’s a way to use
lftp to get something, with
lftp prompting
the user for a password.
$ lftp -e "get public_html/x.ico;exit" firstname.lastname@example.org
lftp -e "get location/index.html;exit" -u xed http://example.xed.ch
lftp can also do a mirror.
Want to have a topology nightmare?
Unison might be just
what you need. Unison allows changes on either of two file systems
which can then be reconciled. Imagine that you have a laptop and a
desktop, L and D respectively. You synchronize your file systems
somehow so they are the same. Then you delete
L:temp and add
L:new. Over on
D you create
D:different_new and delete some other file.
When you run Unison, in theory, it will delete
D:temp, copy new over to D, create
L:different_new, and remove from L the file you deleted on D.
That’s already a wee bit
confusing for me, but now imagine that you get a second laptop,
and you sync to that from
L. Ok. Now what if you try syncing the second laptop with
D? You’ve got a graph loop. You need to keep a star topology. Fun!
You want snapshots? You’re jealous of those fruity computer people
going on about "Time Machine"? Well that functionality has been around
for a long time in the free software world. Check out
rdiff-backup which can take
bandwidth efficient snapshots of the differences in a file system
allowing you to reconstruct its state at any point at which a snapshot
was made. If you run this in a cron job every night you will be able
to go back to a file system state from any day. This is very handy for
certain kinds of backups. The downside is that if you deal with large
temporary files (video editing, let’s say), rdiff-backup doesn’t
really delete those but just makes a note of the fact that they were
deleted. This can be circumvented a bit, but generally rdiff-backup
takes up more room than a straight mirror.