Rsync: How it works and how to use it

By | March 10, 2016

Rsync is a command line utility for transferring files, or entire directories, from one computer to another. It comes with most Linux and Unix-like operating systems and is very flexible. You can update files and folders (recursively) on your own machine or on the machine you are connecting to. Thanks to some clever algorithms and design choices, directories can be synchronized quickly by transferring only the files you specify (new or old, larger or smaller, etc.), or even just the parts of a file that are different. From Wikipedia:

The recipient splits its copy of the file into chunks and computes two checksums for each chunk: the MD5 hash, and a weaker but easier to compute ‘rolling checksum’. It sends these checksums to the sender.

The sender quickly computes the rolling checksum for each chunk in its version of the file; if they differ, it must be sent. If they’re the same, the sender uses the more computationally expensive MD5 hash to verify the chunks are the same.

The sender then sends the recipient those parts of its file that did not match, along with information on where to merge these blocks into the recipient’s version. This makes the copies identical. There is an unlikely probability that differences between chunks in the sender and recipient are not detected, and thus remain uncorrected. With 128 bits from MD5 plus 32 bits from the rolling checksum, the probability is on the order of 2^-(128+32) = 2^-160.
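To make the “rolling” part of that description more concrete, here is a minimal sketch in bash. It is not rsync’s actual code (rsync is written in C and differs in its details); the function names weak_sum and roll, the 16-bit modulus, and the ASCII-only input are assumptions made for illustration. The point it shows is that once the weak checksum of one window is known, the checksum of the window one byte further along can be derived in constant time, without rereading the whole window.

```shell
#!/usr/bin/env bash
# Sketch of an rsync-style weak "rolling" checksum.
# Assumptions: ASCII input, 16-bit halves, hypothetical helper names.
LC_ALL=C
M=65536

# Checksum a whole window from scratch:
# a = sum of the byte values, b = sum of the running values of a.
weak_sum() {
  local s=$1 a=0 b=0 i c
  for ((i = 0; i < ${#s}; i++)); do
    printf -v c '%d' "'${s:i:1}"   # numeric value of one character
    a=$(( (a + c) % M ))
    b=$(( (b + a) % M ))
  done
  echo "$a $b"
}

# Slide the window one byte to the right in constant time.
# Arguments: current a, current b, window length n,
# the byte leaving on the left (old), the byte entering on the right (new).
roll() {
  local a=$1 b=$2 n=$3 old=$4 new=$5
  a=$(( ((a - old + new) % M + M) % M ))
  b=$(( ((b - n * old + a) % M + M) % M ))
  echo "$a $b"
}
```

For example, taking the pair printed by weak_sum abcd and passing it to roll with a window length of 4, an outgoing byte of 97 (“a”) and an incoming byte of 101 (“e”) gives the same pair as weak_sum bcde. Only when this cheap check matches does the more expensive strong hash need to be compared.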

Essentially, if you add a few rows to a large data file and want to sync the updated file, you only need to transfer a fraction of the file’s total size. And if the connection is interrupted for whatever reason, rerunning the same command picks up where it left off, because files that already arrived intact are skipped (add --partial, or -P, to also keep a half-transferred file so it can be resumed). It’s a shame there isn’t a Windows version of Rsync that lives up to the performance standard of this highly versatile utility. Here’s an example command of the most basic usage:

  • rsync /path/to/folder user@

That is meant to take the files in “folder” and sync them to the “path” directory of the target machine. The “user” on the target machine must have write privileges, and it will prompt you for the user’s password. One caveat: rsync skips directories unless it is told to recurse, so on its own this form only works when the source names plain files. Now, you can take it a step further and make it transfer all subfolders and files (-r), skip files that already exist on the target machine (--ignore-existing), display transfer progress and transfer rate (-P), and do a “dry run” first that lists the files without actually transferring anything (-n) by typing:

  • rsync -rP --ignore-existing -n /path/to/folder user@

To go through with the transfer, issue the same command again, except without the “-n” flag. This usage is perfect if you have folders within folders that you would like to transfer as well. It is also useful if you want to keep the transfer as short and sweet as possible because it skips files that the remote location already has. There are many other ways to specify which files to transfer, and which ones to skip:

  • rsync --include='*.csv' --exclude='*' /path/to/folder user@

This will transfer all files that end with “.csv” and ignore everything else. Flags can be combined like this to suit your specific needs. Note that if you want to “--include” subdirectories in a recursive case, you’ll need to add a “--include='/relative/path/to/folder'” flag for each location. You can also have rsync delete each file from the source once it has been transferred, which is good if you don’t want to hang on to bulky files after you have backed them up. Do it by adding “--remove-source-files”. Don’t confuse this with “--delete”, which instead removes files from the target that no longer exist in the source. It would be prudent to do a dry run first (“--dry-run”, the long form of “-n”) before you make any irreversible changes. Is the data sensitive? No problem, run the sync over SSH just by adding those three letters:

  • rsync -e ssh /path/to/folder user@
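As a rough sketch of how the flags covered so far fit together, the function below previews and then runs a recursive, CSV-only sync. Everything named here is an assumption for illustration: sync_csv is a hypothetical helper, and the source and target are local directories so it can be tried safely before pointing it at a remote machine. It also uses --include='*/' as one common alternative to listing each subfolder by hand, plus --prune-empty-dirs so folders with no matching files are not created, and adds -v in preview mode so the dry run prints the file list.

```shell
#!/usr/bin/env bash
# Sketch only: sync_csv and its arguments are hypothetical names, and the
# paths are local directories rather than a remote target.
sync_csv() {
  local mode=$1 src=$2 dst=$3
  local opts=(
    -r                  # recurse into subfolders
    --ignore-existing   # skip files the target already has
    --prune-empty-dirs  # skip folders that end up with no matching files
    --include='*/'      # let rsync descend into every subfolder
    --include='*.csv'   # keep CSV files
    --exclude='*'       # drop everything else
  )
  # In "preview" mode, add -n (dry run) and -v so the file list is printed.
  if [ "$mode" = "preview" ]; then
    opts+=(-n -v)
  fi
  # The trailing slash on the source means "the contents of this folder".
  rsync "${opts[@]}" "$src/" "$dst/"
}
```

Calling sync_csv preview followed by the two directories shows what would be copied without touching the target; calling it again with any other first argument, such as run, performs the transfer. The order of the filter flags matters, because rsync applies the first pattern that matches.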

To top it off, you can have it write a log file by adding --log-file='log.txt', which documents what was done and when. The best part of being able to do something like this with Rsync is that it can be easily automated with cron. Basically, you can create a .sh or “shell” file, make it executable, and add it to your crontab so the cron daemon runs it at regular intervals. I covered the basic process in this post. It is simple and robust. Reliable and easy to learn. Free and open source.
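A minimal sketch of such a shell file might look like the following. The file name backup.sh, the function name run_backup, the example paths in the crontab comment, and the 02:00 schedule are all assumptions; substitute your own source, target and log location.

```shell
#!/usr/bin/env bash
# backup.sh (hypothetical name): one rsync run with a log file.
run_backup() {
  local src=$1 dest=$2 log=$3
  # -r recurses into subfolders; --log-file appends a timestamped
  # record of what this run did to the given file.
  rsync -r --log-file="$log" "$src" "$dest"
}

# When the script is called with three arguments (as cron would call it),
# run one backup.
if [ "$#" -eq 3 ]; then
  run_backup "$1" "$2" "$3"
fi

# Example crontab entry (add it with `crontab -e`), running every night at 02:00:
# 0 2 * * * /home/user/bin/backup.sh /home/user/data/ /home/user/backup/ /home/user/rsync-backup.log
```

Make the file executable with chmod +x backup.sh before adding it to the crontab. Because cron runs without a terminal, a remote target would also need non-interactive authentication, such as an SSH key, rather than a password prompt.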

*Please forgive the two line appearance of example commands, which are much easier to read as one line. This is a common thing due to space limitations on web pages and magazines. If this is confusing to you, just remember that each “element” in a command is separated by one space: [rsync] [options] [source] [target]
