Simple versioned TimeMachine-like backup using rsync


Over many years, I have dealt with scripts that do backup versioning, i.e., maintain multiple backups. Due to their flexibility, they have been complex to understand and configure. Here is a simple rsync-based tool with a different focus: the experienced systems administrator who wants to keep his system’s complexity down.

Backup in action

It consists of a simple script, which you can call rsync-backup.sh and store wherever you like, e.g., in /usr/local/bin. I will use these names and paths in the examples.

#!/bin/sh
# Usage: rsync-backup.sh <src> <dst> <label>
if [ "$#" -ne 3 ]; then
    echo "$0: Expected 3 arguments, received $#: $@" >&2
    exit 1
fi
if [ -d "$2/__prev/" ]; then
    # Incremental backup: unchanged files become hard links to the previous one
    rsync -a --delete --link-dest="$2/__prev/" "$1" "$2/$3"
else
    # First backup: plain copy (no --delete here, to limit damage from wrong parameters)
    rsync -a                                   "$1" "$2/$3"
fi
# Remember this backup as the reference point for the next run
rm -f "$2/__prev"
ln -s "$3" "$2/__prev"

During normal operation, it boils down to three simple statements:

  1. rsync with --link-dest: Copying the contents of <src> to <dst>/<label>, reusing the files from the previous backup with hard links. (The rsync without --link-dest does not use --delete, to reduce the risk of accidentally deleting files when called with wrong parameters.)
  2. rm and ln: Remember this backup location for the next incremental backup.

Voilà – it doesn’t get much easier than that!
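To convince yourself that this works, you can run the script twice against an unchanged source and compare inode numbers; a quick sketch (the user and file below are hypothetical):

/usr/local/bin/rsync-backup.sh /home /data/backup first
/usr/local/bin/rsync-backup.sh /home /data/backup second
# Identical inode numbers mean the file exists on disk only once:
ls -i /data/backup/first/home/alice/.profile \
      /data/backup/second/home/alice/.profile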

Of course, there is something missing: the actual backup policy. It is delegated to cron, which I consider an advantage. Thanks to this separation of duties, many policies can be implemented very easily and composed in a modular way:

Daily backups for every day of the week

You might know this from automysqlbackup or autopostgresqlbackup: A backup is created every day and overwritten after 7 days. This is achieved by adding the following file to /etc/cron.daily/:

#!/bin/sh
/usr/local/bin/rsync-backup.sh /home /data/backup `date +%A`

All your users’ files are copied daily to /data/backup, into a directory named after the current day, which is overwritten a week later.
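After a week of runs, the backup directory might look like this (assuming an English locale, since date +%A prints the locale’s weekday name):

ls /data/backup
# Friday  Monday  Saturday  Sunday  Thursday  Tuesday  Wednesday  __prev
readlink /data/backup/__prev
# Friday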

Daily backups for a month

This is just as easy; put the following, under a descriptive file name, into /etc/cron.daily/:

#!/bin/sh
/usr/local/bin/rsync-backup.sh /home /data/backup `date +Day-%d`

Hourly backups for the current day

Here, I follow a slightly different approach: to reduce clutter, all hourly backups go into a subdirectory today/ (which you have to create beforehand, see below). A similar approach can also be used for the daily backups above by changing the date format to +thismonth/%d. This one goes into /etc/cron.hourly/:

#!/bin/sh
/usr/local/bin/rsync-backup.sh /home /data/backup `date +today/%H`
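The script does not create these subdirectories itself, so create them once before the first run (using the paths from the examples above; add thismonth/ only if you use that variant):

mkdir -p /data/backup/today /data/backup/thismonth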

Never-overwritten monthly backups

If you want to keep an archive of monthly backups forever (i.e., as long as disk space lasts), this can be put into /etc/cron.monthly/:

#!/bin/sh
/usr/local/bin/rsync-backup.sh /home /data/backup `date +%Y-%m-%d`

Of course, you will want to keep an eye on disk space usage and, if necessary, decide to trim the backups, change your backup configuration, or purchase additional disk space. This should always be an administrator’s decision. Just letting an automated process prune whatever it considers “old” is not an option, IMHO.
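A minimal way to keep that eye on things, without any automatic pruning, is a small report script, e.g. in /etc/cron.weekly/ (just a sketch using the paths from the examples; cron mails the output to the administrator):

#!/bin/sh
# Report disk usage of the backup area.
# Within a single du invocation, hard-linked files are counted only once,
# so the per-directory numbers are not additive across backups.
df -h /data/backup
du -sh /data/backup/*/ | sort -rh | head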

Tuning a little more

If you combine several of these, multiple backups will occur at the same moment. For example, during the night of the first day of the month, a monthly, one or two daily, and an hourly backup may all run within the same hour.

This might seem extremely wasteful at first, but as the system employs hard links, not a single file is actually copied (unless some files actually changed in the meantime). Even so, it remains somewhat wasteful, because the file tree has to be walked and directories as well as hard links have to be created.
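You can observe both effects in a quick experiment (a sketch; the labels are arbitrary):

time /usr/local/bin/rsync-backup.sh /home /data/backup run1
time /usr/local/bin/rsync-backup.sh /home /data/backup run2
# The second run copies no file data, yet still spends time walking the tree
# and creating directories and hard links; the combined size barely exceeds
# that of a single copy:
du -shc /data/backup/run1 /data/backup/run2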

To reduce the number of immediately adjacent backups with different lifetimes, it might be good enough to create only the backup with the longest lifetime. In an hourly–daily–monthly scheme, this might go into /etc/cron.d/ (using the full path to rsync-backup.sh if it is not in your $PATH):

# First day of month -> persistent
8 23   1    * * root rsync-backup.sh /home /backup `date +\%Y-\%m-\%d`
# Other days of month -> recycled next month
8 23   2-31 * * root rsync-backup.sh /home /backup `date +thismonth/\%d`
# Other hours of day -> recycled next day
8 0-22 *    * * root rsync-backup.sh /home /backup `date +today/\%H`

Please note the extra backslashes before the percent signs: cron would otherwise change unescaped percent signs into newlines. Also note the user field (root here), which files in /etc/cron.d/ require.

All operations start 8 minutes past the hour (first field); feel free to move this to a time when your system is not under load.

Every hour from 00:08 to 22:08, the hourly backup is run.

At 23:08 on the first day of the month, the persistent backup is run; on the other days, the one which will be recycled next month. A common default setup on Linux systems is to have the daily and monthly cron jobs run in the early morning. I prefer running backups shortly before midnight, as the daily backup will then be named after the day whose modified files it contains.


13 responses to “Simple versioned TimeMachine-like backup using rsync”

  1. Hi, thanks for your post!

    You have a possible race condition at the end of your script (which would create a __prev link in your backups), with the “rm” and the “ln“… You should think about using “ln” and “mv” (because “ln” is not atomic):

    ln -sfd "$3" "$2/__prev.new" && mv "$2/__prev.new" "$2/__prev"

    (The “ln” creates the new link and the “mv” then replaces the old one atomically.)

  2. @vaab: You are right, this could be made cleaner. But then probably the “.new” would need to be replaced by “.$$” to make the intermediate link unique between two instances of the backup script terminating at the same time.
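    A sketch of how the end of the script could then look (with GNU mv for -T; untested):

    ln -s "$3" "$2/__prev.$$"          # $$ makes the temporary name unique per process
    mv -T "$2/__prev.$$" "$2/__prev"   # rename() replaces the old symlink atomically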

    I do not consider the race condition critical, as two backups should not be running so close together that they terminate at the same time (if they do, you probably have other problems with your machine, including performance).

    Even if two concurrent backups terminate at the same time, the worst that can happen seems to be that the link points to the one which terminated slightly earlier. This might lose some file space due to unnecessary copies the next time, but it should not impact correctness, should it?

    BTW: Do you know what happens with rsync if the --link-dest directory is replaced while the backup is in process? This might have more impact on concurrent backups…

  3. Hi Marcel,

    Your script works great… However, I am trying to pull the backed up data from a remote source on my network using ssh in the Cron command:

    8 23 1 * * rsync-backup.sh root@192.168.1.100:/home /backup `date +\%Y-\%m-\%d`

    and I get the following error:

    Permission denied, please try again.
    Permission denied, please try again.
    Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
    rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
    rsync error: unexplained error (code 255) at io.c(600) [receiver=3.0.6]

    My Key setup works fine, because I can otherwise use Rsync without any problem via the command line.

    Do you have any idea how I could solve this issue? How could I ssh into the remote source without adding any argument that would conflict with rsync-backup.sh?

    Should I add something to rsync-backup.sh?

    Thanks

    • John,

      I like your idea to use a remote source; very elegant!

      From the error message, it looks like the ssh invoked by rsync is requesting a password from standard input. Therefore, I guess it is not using your ssh key.

      My guess is that the ssh key is only in ~john/.ssh, not in ~root/.ssh. After a sudo -s, HOME is typically still set to ~john.

      Does the key-based ssh login still work when run from sudo -sH?

      Besides copying your keys with e.g. cp ~john/.ssh/id_* ~root/.ssh/, you could also include a line HOME=/home/john in your crontab (tilde expansion does not work here).
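      Put together, the top of your crontab could then look like this (a sketch based on your entry above):

      HOME=/home/john
      8 23 1 * * rsync-backup.sh root@192.168.1.100:/home /backup `date +\%Y-\%m-\%d`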

      Good luck!
      -Marcel

      • Your guess was correct. I was using ssh but my key was in /.ssh so it could not be accessed by the cron job. I painstakingly set up host-based authentication so as to allow passphraseless SSH connections.

        Now it works. Thanks again for all your sharing.

  4. Hi Marcel,

    Is there a simple way to hide the __prev link?

    My network (Samba) users see it on their Windows machines and this is a bit confusing.

    I cannot figure out how to do this and whether what I am trying to do may mess up your system.

    Thanks

    John

  5. Actually, I have a much bigger problem than the above issue.

    Instead of hard-linking files across the multiple generations, the system makes full copies and my 3 TB disk is almost full.

    I inserted the following in /var/spool/cron/root

    10 23 1 * * /usr/local/bin/rsync-backup.sh /media/HHH /media/Time_Machine `date +\%Y-\%m-\%d` && chmod -R 1755 /media/Time_Machine #First day of month -> persistent / HHH
    10 23 2-31 * * /usr/local/bin/rsync-backup.sh /media/HHH /media/Time_Machine `date +This_Month/\%d` && chmod -R 1755 /media/Time_Machine #Other days of month -> recycled next month / HHH
    10 8-19 * * 1-5 /usr/local/bin/rsync-backup.sh /media/HHH /media/Time_Machine `date +Today/\%H` && chmod -R 1755 /media/Time_Machine #Other hours of day -> recycled next day / HHH

    The script is exactly as entered in your post. Would you by any chance have an idea of what could be going on?

    Thanks

  6. Hi,
    Thank you for the script.
    In testing I see this error
    “--link-dest arg does not exist: backups//__prev”

    when running “./rsync.sh data/ backups/ `date +%T`”

    However if I check the link (ls -la) on the __prev file it is correct.

    Any ideas? Is this something I should be worried about?
    Thank you

    • Barbz,

      could you please try the following? (I will try to reproduce it in the next few days)

      1. Run the command without the slashes, i.e. ./rsync.sh data backups `date +%T`
      2. Use a date parameter which does not contain colons (e.g. date +%H instead of date +%T)
      3. If neither works, run the script in verbose mode: sh -vx ./rsync.sh data backups `date +%T` and send me the output

      BTW: You’re not running this on Windows (because of the colons)?

      -Marcel

      • Hi Marcel,

        Not on Windows; it was an issue with not using the full path.

        Would you know how this could be adapted for pushing to a remote ssh server?

        Pushing the files isn't an issue; however, I'm struggling to get the symbolic link component to work.

        That said, I'll have a go at pulling them instead; however, I would prefer to push due to port forwarding requirements.

        Thank you
