Duplicity and the Rackspace CloudFiles API for backups
I've been playing with Rackspace Cloud lately and so far I've been quite impressed. The price is competitive, the network seems stable and performance is no issue. Plus: persistent storage in the cloud, out of the box! Awesome.
The support has been good. I had a routing issue on privatenet interfaces between two servers, which I was certain was at either the network or hypervisor layer and not my firewall. The engineers and I ran through the usual tests until we concluded it was a hypervisor routing issue with this particular guest, which a reboot of the guest fixed.
One of the most impressive features I've seen is twofold:
a) The ability to run scheduled snapshots of your guest with up to three free images stored 'with server' (i.e., if you delete your actual server, you'll lose the images too. But hey, this is free, and perfect to restore from if you just performed that dreaded rm -rf / . It's also perfect for me when I want to reset my dev server to a vanilla Aegir installation for quick testing). I can schedule daily snapshots and a weekly snapshot, with one spare slot for ad-hoc snapshots.
b) As an alternative to a), you can store the images on CloudFiles, which is separate from your server disk image itself and can be retrieved even if you delete your entire server. This is billed at the CloudFiles rate of something like US 15c / GB / month.
I am not storing my server images on CloudFiles, and I've always kind of ignored this aspect of 'cloud' services that various businesses provide, as I felt I never had a real use for it. Then I realised that there was an API, and that I could schedule filesystem-level backups from my servers themselves to push to CloudFiles at very low cost, automatically.
Previously, I had a server that predominantly existed to store backups of the other servers. This server was costing me a good $35 a month or more. With only a few GB of data I need to back up, I could reduce this to something like $2 via CloudFiles if you include the small charges for inbound/outbound bandwidth and GET/PUT requests.
So I installed the python-cloudfiles API:
cd /usr/local/src
git clone git://github.com/rackspace/python-cloudfiles
cd python-cloudfiles
python setup.py install
I decided to use Duplicity, which is an awesome backup tool for Linux closely related to rsync and rdiff-backup. It uses librsync to perform full and incremental backups of data and is very bandwidth efficient.
Rather than just mirror the files elsewhere, it uses GPG to encrypt and sign arrays of tarballs of the data as .gpg files. This means while I am putting them up on CloudFiles, and have my containers set to 'private', if anyone ever gets in there, they'll need my passphrase to decrypt the archives.
I added the lenny-backports to my sources.list to fetch a newer version of Duplicity which works with CloudFiles and stores the archives in 25MB chunks rather than 5MB.
apt-get -t lenny-backports install duplicity
After creating a container in CloudFiles and fetching your API key from the control panel if you don't already have it, you can create a simple script like so:
#!/bin/bash
# Backup to CloudFiles
CLOUD_CONTAINER="myserver_backup"
export CLOUDFILES_USERNAME=mig5
export CLOUDFILES_APIKEY=1234567890abcdefghijklmnopqrstuvwxyz
export PASSPHRASE=uh-huh
options="--exclude-other-filesystems --exclude /tmp --exclude /dev --exclude /proc"
duplicity $options / cf+http://${CLOUD_CONTAINER}
This will back up the entire filesystem of a basic system without any complicated filesystem setup, skipping some of the volatile or pointless areas of your server that shouldn't get backed up.
I panicked about the 'http' reference, not wanting my passphrase or anything sensitive getting sent over unencrypted channels. Fortunately, despite the URL scheme, duplicity uses HTTPS by default if it can, so it's all good. And the passphrase is only used in the pipe to gpg on the local system prior to pushing to the cloud. It is passed to gpg through a file descriptor, so it's not exposed in your running processes anyway.
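Since the script above keeps the API key and passphrase in plain text, one option is to move them into a file only root can read. The following is a minimal sketch, not something from my actual setup: the path /root/.backup_credentials and the function name load_credentials are hypothetical, and the permission check assumes GNU stat.

```shell
#!/bin/bash
# Sketch: load the CloudFiles credentials and GPG passphrase from a
# root-only file instead of hardcoding them in the backup script.
# The default path below is a hypothetical choice.

load_credentials() {
    local file="${1:-/root/.backup_credentials}"
    local mode

    [ -f "$file" ] || { echo "missing credentials file: $file" >&2; return 1; }

    # Refuse files that group or others can access (assumes GNU stat).
    mode=$(stat -c '%a' "$file")
    case "$mode" in
        600|400) ;;
        *) echo "refusing $file: mode $mode, expected 600" >&2; return 1 ;;
    esac

    # The file holds plain shell assignments, e.g. CLOUDFILES_USERNAME=mig5
    . "$file"
    export CLOUDFILES_USERNAME CLOUDFILES_APIKEY PASSPHRASE
}
```

In the backup script, the three export lines would then be replaced by a call such as load_credentials || exit 1.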
Get cronning!
0 1 * * * root /usr/local/bin/backup_to_cloud | mail -s "Daily backup: `hostname` `date`" you@example.com
From a Linode with about 1.5GB of data, I pushed the entire system to CloudFiles in about 11 minutes.
Duplicity has an awesome number of features and options you can pass it. 'duplicity full' and 'duplicity incremental' are both explicit actions, but when neither is given, as above, duplicity infers the action: it runs a full backup the first time, and incrementals on later runs once it finds a previous full backup.
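If you would rather choose the action yourself, a rough sketch like the one below can wrap the three cases. The function names backup_args and run_backup and the 7D interval are my own illustrative assumptions; the 'full' and 'incremental' actions and the --full-if-older-than option are part of duplicity, but check that your version supports them.

```shell
#!/bin/bash
# Sketch: pick the duplicity action explicitly instead of letting it be inferred.
CLOUD_CONTAINER="${CLOUD_CONTAINER:-myserver_backup}"
EXCLUDES=(--exclude-other-filesystems --exclude /tmp --exclude /dev --exclude /proc)

# backup_args <mode>: print the duplicity arguments for the given mode.
#   full         force a new full backup
#   incremental  only back up changes since the last run
#   auto         let duplicity decide, but start a new full chain every 7 days
backup_args() {
    case "$1" in
        full|incremental) echo "$1" "${EXCLUDES[@]}" ;;
        auto)             echo --full-if-older-than 7D "${EXCLUDES[@]}" ;;
        *)                echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# run_backup <mode>: back up / to the CloudFiles container.
run_backup() {
    local args
    args=$(backup_args "$1") || return 1
    # Word splitting is intentional: none of the arguments contain spaces.
    duplicity $args / "cf+http://${CLOUD_CONTAINER}"
}
```

For example, run_backup auto from the nightly cron job keeps incrementals short-lived, while run_backup full can be run by hand before a risky change.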
But, I'm a sysadmin, and I've learnt the hard way that it isn't enough to be *running* backups. It is always necessary to actually test that they will help you in a disaster.
Enter two duplicity commands: 'duplicity verify' and 'duplicity restore'.
Both actions work by switching the order of the source and destination in your execution. For instance, instead of this:
duplicity / cf+http://${CLOUD_CONTAINER}
Running the following will imply a 'restore' (without you having to even specify 'restore'):
duplicity cf+http://${CLOUD_CONTAINER} /path/to/restore/dir/
Verification is the same command as restore but with 'verify' as an explicit statement:
duplicity verify cf+http://${CLOUD_CONTAINER} /path/to/restore/dir/
Edit: it should be noted that you can't restore to a directory that already exists, like / . It seems that with duplicity you have to restore to a new subdirectory, or see if --force gets you anywhere.
Verify is similar to 'diff' in a basic way: rather than actually restoring any files, it compares the state of the remote archives against the current filesystem and outputs what is different.
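To make verification part of a routine rather than something you remember to do, a small wrapper can turn the result into a one-line report. This is only a sketch: verify_status and verify_backup are hypothetical names, and it assumes duplicity exits non-zero when the verify fails or finds differences, which is worth confirming on your version.

```shell
#!/bin/bash
# Sketch: run 'duplicity verify' and summarise the outcome in one line.
CLOUD_CONTAINER="${CLOUD_CONTAINER:-myserver_backup}"

# verify_status <exit status>: print a short report for a verify run.
verify_status() {
    if [ "$1" -eq 0 ]; then
        echo "verify OK"
    else
        echo "verify FAILED (exit status $1)"
    fi
}

# verify_backup <local path>: compare the remote archives against <local path>.
verify_backup() {
    duplicity verify "cf+http://${CLOUD_CONTAINER}" "${1:-/}"
    verify_status "$?"
}
```

Calling verify_backup / from a weekly cron job and mailing the output gives an early warning if the archives ever stop matching the server.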
More awesome syntax is available for restorations with duplicity, such as the ability to restore only a single file with --file-to-restore, and being able to restore from a specific point in time (three days ago: -t 3D ).
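Combining the two options might look like the sketch below. The helper names relative_path and restore_file are hypothetical; the one detail to watch is that --file-to-restore expects a path relative to the root of the backup, so the leading slash has to go. PASSPHRASE and the CloudFiles variables still need to be exported as in the backup script.

```shell
#!/bin/bash
# Sketch: restore a single file as it was at some point in the past.
CLOUD_CONTAINER="${CLOUD_CONTAINER:-myserver_backup}"

# --file-to-restore takes a path relative to the backup root,
# so /etc/passwd becomes etc/passwd.
relative_path() {
    echo "${1#/}"
}

# restore_file <absolute path> <age, e.g. 3D> <destination>
restore_file() {
    duplicity -t "$2" --file-to-restore "$(relative_path "$1")" \
        "cf+http://${CLOUD_CONTAINER}" "$3"
}
```

For instance, restore_file /etc/passwd 3D /tmp/passwd.3days would fetch the copy of /etc/passwd from three days ago into a new file, leaving the live one untouched.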
Along with the compression applied to the archives, this has provided me with an extremely efficient, reliable, secure and *cheap* backup solution for my servers (not just my cloud servers but my Linodes as well).




Comments
Peter L
Fri, 16/04/2010 - 01:33
Great read…..the information is poignant very interesting please keep them coming !!
Alex
Wed, 13/04/2011 - 05:46
Impressive article. Do you think that this is also possible on a Windows OS? I'm looking for a solution for making a backup of my Windows workstations to Rackspace. It would be highly appreciated if you have a suggestion.
best regards,
Alex.
mig5
Wed, 13/04/2011 - 07:31
Have a look at Duplicati, which is a Duplicity GUI that works on Windows.