Over many years, I have dealt with scripts that do backup versioning, i.e., maintain multiple backups. Due to their flexibility, they have been complex to understand and configure. Here is a simple rsync-based tool with a different focus: The experienced systems administrator who wants to keep his system’s complexity down.
Backup in action
It consists of a simple script, which you can call rsync-backup.sh and store wherever you like, e.g., in /usr/local/bin. I will use these names and paths in the examples.
    #!/bin/sh
    # Usage: rsync-backup.sh <src> <dst> <label>
    if [ "$#" -ne 3 ]; then
        echo "$0: Expected 3 arguments, received $#: $@" >&2
        exit 1
    fi
    if [ -d "$2/__prev/" ]; then
        rsync -a --delete --link-dest="$2/__prev/" "$1" "$2/$3"
    else
        rsync -a "$1" "$2/$3"
    fi
    rm -f "$2/__prev"
    ln -s "$3" "$2/__prev"
During normal operation, it boils down to three simple statements:
rsync --link-dest: Copies the contents of <src> to <dst>/<label>, reusing the files from the previous backup with hard links. (The rsync call without --link-dest does not use --delete, to reduce the risk of accidentally deleting files when called with wrong parameters.)
rm: Removes the old __prev symlink.
ln: Remembers this backup location for the next incremental backup.
Voilà – it doesn’t get much easier than that!
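To make the hard-link idea more tangible, here is a minimal sketch that imitates by hand what rsync --link-dest does for a file that has not changed. It does not call rsync at all; the directory names Monday and Tuesday and the file notes.txt are made up for illustration, and everything happens in a throwaway temporary directory.

```shell
#!/bin/sh
# Sketch: two "backups" sharing one copy of an unchanged file via a hard link.
# All names below are illustrative; nothing here touches real backups.
demo=$(mktemp -d)
mkdir "$demo/Monday" "$demo/Tuesday"
echo "unchanged content" > "$demo/Monday/notes.txt"

# For an unchanged file, rsync --link-dest creates a second directory entry
# for the same data instead of copying it. Done by hand, that is:
ln "$demo/Monday/notes.txt" "$demo/Tuesday/notes.txt"

# Both names refer to the same inode, so the data is stored only once.
inode_mon=$(ls -i "$demo/Monday/notes.txt" | awk '{print $1}')
inode_tue=$(ls -i "$demo/Tuesday/notes.txt" | awk '{print $1}')
echo "Monday inode: $inode_mon, Tuesday inode: $inode_tue"

# Removing one backup leaves the other intact; the data is freed only
# when the last name pointing to it is gone.
rm -r "$demo/Monday"
remaining=$(cat "$demo/Tuesday/notes.txt")
echo "Still readable after removing Monday: $remaining"

rm -rf "$demo"
```

This is also why any single backup directory can be deleted on its own without damaging the others.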
Of course, there is something missing: The actual backup policy. It is separated into cron, which I consider an advantage. Using this separation of duties, many policies can be implemented very easily and composed in a modular way:
Create daily backups for every weekday
    #!/bin/sh
    /usr/local/bin/rsync-backup.sh /home /data/backup `date +%A`
All your users’ files are copied daily to /data/backup, into a directory named after the current day of the week, which is overwritten weekly.
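One detail worth keeping in mind: the output of date +%A depends on the locale, so the directory names would change if the system language does. A small sketch of one way to pin the names down, assuming English day names are acceptable:

```shell
#!/bin/sh
# Sketch: make the weekday label independent of the system locale.
# LC_ALL=C forces the English day names (Monday ... Sunday), so the set of
# backup directories stays the same even if the locale is changed later.
label=$(LC_ALL=C date +%A)
echo "Today's backup would go to /data/backup/$label"
```

The same prefix could be used in the cron script itself, right before the date call.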
Daily backups for a month
Sure, this is easy as well, by putting this with a descriptive name into
    #!/bin/sh
    /usr/local/bin/rsync-backup.sh /home /data/backup `date +Day-%d`
Hourly backups for the current day
Here, I follow a slightly different approach. To remove clutter, I put all files in a directory today/ (which you have to create beforehand). Of course, a similar approach can also be followed for the daily backups above by changing the date format to +thismonth/%d. Of course, this goes to
    #!/bin/sh
    /usr/local/bin/rsync-backup.sh /home /data/backup `date +today/%H`
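Since nested labels such as today/13 need their parent directory to exist, a wrapper could create it on the fly instead of relying on a manual step. The following is only a sketch: ensure_label_parent is a hypothetical helper, not part of rsync-backup.sh, and the example runs against a temporary directory rather than /data/backup.

```shell
#!/bin/sh
# Sketch: create the parent directory of a nested label such as today/13.
# ensure_label_parent is a hypothetical helper, not part of rsync-backup.sh.
ensure_label_parent() {
    dst=$1
    label=$2
    # dirname strips the last path component: "$dst/today/13" -> "$dst/today"
    mkdir -p "$(dirname "$dst/$label")"
}

# Example with a throwaway directory instead of /data/backup:
backup_root=$(mktemp -d)
ensure_label_parent "$backup_root" "today/13"
ls "$backup_root"
```

Only the parent is created here; the final component (13 in this example) is left for rsync to create.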
Never-overwritten monthly backups
If you want to keep an archive of monthly backups forever (i.e., as long as disk space lasts), this can be put into
    #!/bin/sh
    /usr/local/bin/rsync-backup.sh /home /data/backup `date +%Y-%m-%d`
Of course, you will want to make sure that you keep an eye on disk space usage and (if necessary) make a decision to trim the backups, change your backup configuration or purchase additional disk space. This should always be an administrator decision. Just letting an automated process prune whatever it considers “old” is not an option, IMHO.
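Keeping an eye on disk space can itself be automated without automating the pruning. The sketch below only reports and warns; usage_percent is a hypothetical helper, the BACKUP_DIR variable and the threshold of 90 percent are arbitrary choices for illustration, and the decision of what to delete stays with the administrator.

```shell
#!/bin/sh
# Sketch: report how full the file system holding the backups is.
# usage_percent is a hypothetical helper; the threshold of 90 is an arbitrary example.
usage_percent() {
    # df -P prints one POSIX-format line per file system;
    # field 5 is the capacity, e.g. "42%". Strip the percent sign.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Set BACKUP_DIR to e.g. /data/backup; defaults to / for a quick try.
backup_dir=${BACKUP_DIR:-/}
used=$(usage_percent "$backup_dir")
if [ "$used" -ge 90 ]; then
    echo "WARNING: $backup_dir is ${used}% full, time to decide what to trim" >&2
else
    echo "$backup_dir is ${used}% full"
fi
```

Run from cron, the warning on standard error would typically end up in the administrator's mail, which fits the idea that a human makes the call.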
Tuning a little more
If you combine several of these policies, several backups will run at the same moment. E.g., in the night of the first day of the month, a monthly, one or two daily, and an hourly backup may all run in the same hour.
This might seem extremely wasteful at first, but as the system employs hard links only, not a single file is actually copied (unless some files actually changed in the meantime). Even though it might not be extremely wasteful, it still remains wasteful, because the file tree has to be walked and directories as well as hard links created.
To reduce the number of immediately adjacent backups with different lifetimes, it might be good enough to create only the backup with the longest lifetime. In an hourly–daily–monthly scheme, this might go into /etc/cron.d/ (use the full path to rsync-backup.sh if it is not in cron's command search path):
    # Files in /etc/cron.d need a user field (here: root) before the command
    # First day of month -> persistent
    8 23 1 * * root rsync-backup.sh /home /backup `date +\%Y-\%m-\%d`
    # Other days of month -> recycled next month
    8 23 2-31 * * root rsync-backup.sh /home /backup `date +thismonth/\%d`
    # Other hours of day -> recycled next day
    8 0-22 * * * root rsync-backup.sh /home /backup `date +today/\%H`
Please note the extra backslashes before the percent signs, as cron will change unescaped percent signs to newlines.
All operations will start 8 minutes past the hour (first field); feel free to move this to a time when your system is not loaded.
Every hour from 00:08 to 22:08, the hourly backup is run.
At 23:08 on the first day of the month, the persistent backup is run; on other days, the one which will be recycled next month is run instead. A common default setup on Linux systems is to have the daily and monthly cron jobs run in the early morning. I prefer running backups shortly before midnight, as the daily backup will then be named after the day whose modified files it contains.
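The selection logic spread over the three cron entries can also be written down as a single function, which makes it easy to check by hand. This is merely a sketch of the same rules under the schedule described above; choose_label is a hypothetical helper, and the dates in the examples are arbitrary.

```shell
#!/bin/sh
# Sketch: pick the label with the longest lifetime for a given point in time.
# choose_label is a hypothetical helper mirroring the three cron entries.
# Arguments: day of month (01-31), hour (00-23), full date (YYYY-MM-DD).
choose_label() {
    day=$1
    hour=$2
    fulldate=$3
    if [ "$hour" != "23" ]; then
        echo "today/$hour"          # hours 00-22: recycled next day
    elif [ "$day" = "01" ]; then
        echo "$fulldate"            # 23:xx on the 1st: persistent
    else
        echo "thismonth/$day"       # 23:xx on other days: recycled next month
    fi
}

# Label for the current time:
choose_label "$(date +%d)" "$(date +%H)" "$(date +%Y-%m-%d)"

# Labels for a few fixed times:
choose_label 01 23 2024-05-01   # prints 2024-05-01
choose_label 17 23 2024-05-17   # prints thismonth/17
choose_label 17 09 2024-05-17   # prints today/09
```

A wrapper built on such a function could be called from a single hourly cron entry, at the cost of moving part of the policy out of cron and into the script.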