Over many years, I have dealt with scripts that do backup versioning, i.e., maintain multiple backups. Due to their flexibility, they have been complex to understand and configure. Here is a simple rsync-based tool with a different focus: The experienced systems administrator who wants to keep his system’s complexity down.
Backup in action
It consists of a simple script, which you can call rsync-backup.sh and store wherever you like, e.g., in /usr/local/sbin. I will use these names and paths in the examples.
#!/bin/sh
# Usage: rsync-backup.sh <src> <dst> <label>
if [ "$#" -ne 3 ]; then
    echo "$0: Expected 3 arguments, received $#: $@" >&2
    exit 1
fi
if [ -d "$2/__prev/" ]; then
    rsync -a --delete --link-dest="$2/__prev/" "$1" "$2/$3"
else
    rsync -a "$1" "$2/$3"
fi
rm -f "$2/__prev"
ln -s "$3" "$2/__prev"
During normal operation, it boils down to three simple statements:
rsync with --link-dest: Copies the contents of <src> to <dst>/<label>, reusing the files from the previous backup with hard links. [1]
rm and ln: Remember this backup location for the next incremental backup.
[1] The rsync call without --link-dest does not use --delete, to reduce the risk of accidentally deleting files when called with wrong parameters.
Voilà – it doesn’t get much easier than that!
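To see what "reusing the files with hard links" means on disk, here is a minimal sketch that is not part of the script above. It uses plain ln in a throwaway directory as a stand-in for what rsync does with --link-dest when a file is unchanged; the helper names inode_of and shares_storage are made up for this illustration.

```shell
#!/bin/sh
# Print the inode number of a file (POSIX `ls -i`).
inode_of() {
    ls -i "$1" | awk '{ print $1 }'
}

# Succeed if both paths are hard links to the same file,
# i.e. the content is stored only once.
shares_storage() {
    [ "$(inode_of "$1")" = "$(inode_of "$2")" ]
}

# Demo in a temporary directory: "Monday" and "Tuesday" play the role
# of two backup labels, and `ln` imitates the hard link rsync would create.
demo=$(mktemp -d)
mkdir "$demo/Monday" "$demo/Tuesday"
echo "unchanged" > "$demo/Monday/notes.txt"
ln "$demo/Monday/notes.txt" "$demo/Tuesday/notes.txt"
if shares_storage "$demo/Monday/notes.txt" "$demo/Tuesday/notes.txt"; then
    echo "stored once, visible in both backups"
fi
rm -rf "$demo"
```

The same check can be pointed at two real backup directories to confirm that an unchanged file was linked rather than copied.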
Of course, there is something missing: The actual backup policy. It is separated into cron, which I consider an advantage. Using this separation of duties, many policies can be implemented very easily and composed in a modular way:
Create daily backups for every weekday
You might know this from automysqlbackup or autopostgresqlbackup: A backup is created every day and overwritten after 7 days. This is achieved by adding the following file to /etc/cron.daily/:
#!/bin/sh
/usr/local/bin/rsync-backup.sh /home /data/backup `date +%A`
All your users’ files are copied daily to /data/backup, into a directory named after the current weekday, which is overwritten one week later.
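The weekly recycling follows directly from the label: date +%A can only produce seven different names. A small sketch to illustrate this, assuming GNU date (for -d "@seconds") and using a fixed reference day, Monday 2024-01-01 UTC; the function name weekday_label is made up for this example. Note that %A is locale-dependent, so the sketch pins LC_ALL=C; the names your cron job produces depend on the locale it runs in.

```shell
#!/bin/sh
# Weekday label for the day N days after Monday 2024-01-01 00:00 UTC
# (epoch 1704067200). Requires GNU date for -d "@seconds".
weekday_label() {
    LC_ALL=C date -u -d "@$((1704067200 + $1 * 86400))" +%A
}

# Day 7 yields the same label as day 0, so that backup gets overwritten.
for n in 0 1 6 7; do
    echo "day $n -> $(weekday_label "$n")"
done
```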
Daily backups for a month
This is easy as well: put the following, under a descriptive name, into /etc/cron.daily/:
#!/bin/sh
/usr/local/bin/rsync-backup.sh /home /data/backup `date +Day-%d`
Hourly backups for the current day
Here, I follow a slightly different approach. To remove clutter, I put all files in a directory today/ (which you have to create beforehand). A similar approach can also be followed for the daily backups above by changing the date format to +thismonth/%d. This goes into /etc/cron.hourly/:
#!/bin/sh
/usr/local/bin/rsync-backup.sh /home /data/backup `date +today/%H`
Never-overwritten monthly backups
If you want to keep an archive of monthly backups forever (i.e., as long as disk space lasts), this can be put into /etc/cron.monthly/:
#!/bin/sh
/usr/local/bin/rsync-backup.sh /home /data/backup `date +%Y-%m-%d`
Of course, you will want to make sure that you keep an eye on disk space usage and (if necessary) make a decision to trim the backups, change your backup configuration or purchase additional disk space. This should always be an administrator decision. Just letting an automated process prune whatever it considers “old” is not an option, IMHO.
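For keeping an eye on usage, a minimal sketch (the function name backup_usage is made up here): within a single invocation, du counts a hard-linked file only once, so listing all backups in one call shows roughly how much each one adds on top of those listed before it.

```shell
#!/bin/sh
# Print the disk usage in KiB of every entry in a backup directory.
# All entries are passed to ONE du call, so a file shared via hard
# links is charged only to the first backup in which du encounters it.
backup_usage() {
    du -sk "$1"/*
}

# Example call, using the path from the examples above:
#   backup_usage /data/backup
```

Running du separately for each backup would instead report the full apparent size every time, which overstates the space actually used.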
Tuning a little more
If you combine several of these, multiple backups will occur at the same moment. E.g., in the night of the first day of the month, a monthly, one or two daily, and an hourly backup may all run in the same hour.
This might seem extremely wasteful at first, but as the system employs hard links only, not a single file is actually copied (unless some files actually changed in the meantime). Even though it might not be extremely wasteful, it still remains wasteful, because the file tree has to be walked and directories as well as hard links created.
To reduce the number of immediately adjacent backups with different lifetimes, it might be good enough to create only the backup with the longest lifetime. In an hourly–daily–monthly scheme, this might go into /etc/cron.d/ (use the full path to rsync-backup.sh if it is not in your $PATH):
# Files in /etc/cron.d need a user field after the time fields (here: root)
# First day of month -> persistent
8 23 1 * * root rsync-backup.sh /home /backup `date +\%Y-\%m-\%d`
# Other days of month -> recycled next month
8 23 2-31 * * root rsync-backup.sh /home /backup `date +thismonth/\%d`
# Other hours of day -> recycled next day
8 0-22 * * * root rsync-backup.sh /home /backup `date +today/\%H`
Please note the extra backslashes before the percent signs, as cron will change unescaped percent signs to newlines.
All operations will start 8 minutes past the hour (first field); feel free to move this to a time when your system is not loaded.
Every hour from 00:08 to 22:08, the hourly backup is run.
At 23:08 on the first day of the month, the persistent backup is run; on other days, the one which will be recycled. A common default setup on Linux systems is to have the daily and monthly cron jobs run in the early morning. I prefer running backups shortly before midnight, as the daily backup will be named after the day whose modified files it contains.
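The "longest lifetime wins" rule can also be written down as a small function, so that a single hourly cron entry could replace the three above. This is only a sketch of an alternative, not something the script or the crontab above require; the name choose_label and its two-argument interface are made up for this example.

```shell
#!/bin/sh
# Pick the label with the longest lifetime.
#   $1: date as printed by `date +%Y-%m-%d`
#   $2: hour as printed by `date +%H` (two digits)
choose_label() {
    today=$1
    hour=$2
    day=${today##*-}              # day of month, e.g. "01"
    if [ "$hour" != 23 ]; then
        echo "today/$hour"        # recycled next day
    elif [ "$day" = 01 ]; then
        echo "$today"             # persistent monthly backup
    else
        echo "thismonth/$day"     # recycled next month
    fi
}

# Possible use from a wrapper script called hourly by cron:
#   set -- $(date '+%Y-%m-%d %H')
#   rsync-backup.sh /home /backup "$(choose_label "$1" "$2")"
```

Since the date format lives in a script rather than in the crontab, the percent signs need no escaping there.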
13 responses to “Simple versioned TimeMachine-like backup using rsync”
Hi, thanks for your post !
You have a possible race condition at the end of your script (which would create a __prev link in your backups), with the “rm” and the “ln”… You should think about using “ln” and “mv” (because “ln” is not atomic):
ln -sfd "$3" "$2/__prev.new" && mv "$2/__prev.new" "$2/__prev"
(The “ln” and the “mv”
@vaab: You are right, this could be made cleaner. But then probably the “.new” would need to be replaced by “.$$” to make the intermediate link unique between two instances of the backup script terminating at the same time.
I do not consider the race condition critical, as two backups should not be running so close, that they will terminate at the same time (you probably have other problems with your machine, then, including performance).
Even if two concurrent backups terminate at the same time, it seems that the worst that can happen is that the link points to the one which terminated slightly earlier. This might waste some disk space due to unnecessary copies the next time, but should not impact the correctness, should it?
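A minimal sketch of the link-then-rename idea from this exchange, using the process ID as the unique suffix. It assumes GNU mv: without -T, mv would follow the existing __prev link and move the new link into the previous backup directory instead of replacing the link. The function name update_prev is made up for this example.

```shell
#!/bin/sh
# Replace <dst>/__prev in one rename step.
#   $1: backup root (<dst>), $2: label of the backup just completed
update_prev() {
    dst=$1
    label=$2
    tmp="$dst/__prev.$$"          # unique per running script instance
    # Create the new link under a temporary name, then rename it over
    # the old one. -T (GNU mv) treats the target as the link itself.
    ln -s "$label" "$tmp" && mv -T "$tmp" "$dst/__prev"
}
```

In the script, this would take the place of the final rm and ln lines, called as update_prev "$2" "$3".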
BTW: Do you know what happens with rsync if the --link-dest directory is replaced while the backup is in process? This might have more impact on concurrent backups…
Hi Marcel,
Your script works great… However, I am trying to pull the backed up data from a remote source on my network using ssh in the Cron command:
and I get the following error:
My Key setup works fine, because I can otherwise use Rsync without any problem via the command line.
Do you have any idea how I could solve this issue? How could I ssh into the remote source without adding any argument that would conflict with rsync-backup.sh?
Should I add something to rsync-backup.sh?
Thanks
John,
I like your idea to use a remote source; very elegant!
From the error message, it looks like the ssh invoked by rsync is requesting a password from standard input. Therefore, I guess it is not using your ssh key.
My guess is that the ssh key is only in ~john/.ssh, not in ~root/.ssh. After a sudo -s, HOME is typically still set to ~john.
Does the key-based ssh login still work when run from sudo -sH?
Besides copying your keys with e.g. cp ~john/.ssh/id_* ~root/.ssh/, you could also include a line HOME=/home/john in your crontab (tilde expansion does not work here).
Good luck!
-Marcel
Your guess was correct. I was using ssh but my key was in /.ssh so it could not be accessed by the cron job. I painstakingly set up host-based authentication so as to allow passphraseless SSH connections.
Now it works. Thanks again for all your sharing.
Hi Marcel,
Is there a simple way to hide the __prev link?
My network (Samba) users see it in their windows machine and this is a bit confusing.
I cannot figure out how to do this and whether what I am trying to do may mess up your system.
Thanks
John
John,
adding the following line to smb.conf in the appropriate section should do the job:
hide files = /__prev/
Actually, I have a much bigger problem than the above issue.
Instead of creating links in the multiple generations, the system does full copies and my 3 TB disk is almost full.
I inserted the following in /var/spool/cron/root
10 23 1 * * /usr/local/bin/rsync-backup.sh /media/HHH /media/Time_Machine `date +\%Y-\%m-\%d` && chmod -R 1755 /media/Time_Machine #First day of month -> persistent / HHH
10 23 2-31 * * /usr/local/bin/rsync-backup.sh /media/HHH /media/Time_Machine `date +This_Month/\%d` && chmod -R 1755 /media/Time_Machine #Other days of month -> recycled next month / HHH
10 8-19 * * 1-5 /usr/local/bin/rsync-backup.sh /media/HHH /media/Time_Machine `date +Today/\%H` && chmod -R 1755 /media/Time_Machine #Other hours of day -> recycled next day / HHH
The script is exactly as entered in your post. Would you by any chance have an idea of what could be going on?
Thanks
John,
you are changing the files in the repository, so they no longer match the rsync comparison test.
Changing the file modes in the source directory is not an option? (I am not aware of an rsync option to disable this comparison.)
You are right, how could I have missed this.
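The effect discussed in this exchange can be reproduced without rsync. Hard links share one inode, so a chmod on any backup changes the mode seen through every linked copy, and the files then differ from the source. A minimal sketch in a throwaway directory (the helper mode_of is made up for this example, and mode 755 stands in for the 1755 used in the crontab above):

```shell
#!/bin/sh
# Print the permission string of a path, e.g. "-rw-r--r--".
mode_of() {
    ls -ld "$1" | cut -c1-10
}

# "old" and "new" play the role of the same file in two backups.
demo=$(mktemp -d)
echo data > "$demo/old"
chmod 644 "$demo/old"
ln "$demo/old" "$demo/new"

# Changing the mode through one name ...
chmod 755 "$demo/new"

# ... changes it for both, because there is only one inode.
echo "old: $(mode_of "$demo/old")"
echo "new: $(mode_of "$demo/new")"
rm -rf "$demo"
```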
Hi,
Thank you for the script.
In testing I see this error
“--link-dest arg does not exist: backups//__prev”
when running “./rsync.sh data/ backups/ `date +%T`”
However if I check the link (ls -la) on the __prev file it is correct.
Any ideas? Is this something I should be worried about?
Thank you
Barbz,
could you please try the following? (I will try to reproduce it in the next few days)
./rsync.sh data backups `date +%T`
(date +%H instead of date +%T)
sh -vx ./rsync.sh data backups `date +%T` and send me the output
BTW: You’re not running this on Windows (because of the colons)?
-Marcel
Hi Marcel,
Not on Windows; it was an issue with not using the full path.
Would you know how this could be adapted for pushing to a remote ssh server?
Pushing the files isn’t an issue; however, I’m struggling to get the symbolic link component to work.
That said, I’ll have a go with pulling them instead; however, I would prefer to push due to port forwarding requirements.
Thank you
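One possible shape of a push variant, as an untested sketch rather than a definitive answer: the existence check and the link update are run on the remote side through ssh, and --link-dest is given as an absolute path, which the receiving rsync resolves on the remote machine. The function name push_backup, the RUN dry-run switch and the host name in the usage line are made up for this example. It assumes key-based ssh login, GNU ln on the remote host (for -n), an absolute <dst>, and paths and labels without spaces or shell metacharacters, since ssh passes them through the remote shell.

```shell
#!/bin/sh
# Hypothetical push variant of rsync-backup.sh.
#   push_backup <src> <host> <dst> <label>
# <dst> must be an absolute path on <host>.
# Set RUN=echo to print the commands instead of executing them.
push_backup() {
    src=$1
    host=$2
    dst=$3
    label=$4
    # Does a previous backup exist on the remote side?
    if ${RUN:-} ssh "$host" test -d "$dst/__prev/"; then
        ${RUN:-} rsync -a --delete --link-dest="$dst/__prev/" \
            "$src" "$host:$dst/$label"
    else
        ${RUN:-} rsync -a "$src" "$host:$dst/$label"
    fi
    # Remember this backup; -n replaces the link instead of following it.
    ${RUN:-} ssh "$host" ln -sfn "$label" "$dst/__prev"
}
```

A dry run such as RUN=echo push_backup /home backup.example.com /data/backup Monday prints the three commands without contacting any host, which makes it easy to check the paths first.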