Open Source Software Projects
I've got several Open Source software projects I'm either working on or plan to work on sometime in the future. When they reach some level of functionality, I'll post them on Freshmeat. Until then, I'll describe them here and post any useless, unstable versions. :-)
The As Yet Unnamed Backup Program
I currently have seven physical computers and at least four virtual computers (see VMware) that I wish to back up. Add to that about four other machines out on the Internet that have data I want backed up. I figure the total amount of disk space I have that may need backing up is just under 350 gigabytes. I've used programs such as Arkeia from Knox Software. I liked Arkeia, but found it lacking on a few issues:
- It's not Open Source.
- While they do have a home license, it is limited to two machines (a policy that doesn't appear to have any teeth in the software). I priced a four-machine solution and the response was ridiculous!
- It only backs up to tape. The tape I have is a 4 gigabyte (8G compressed) QIC drive. The tapes are around $30 each. Even with 2-to-1 compression I'm looking at 44 tapes for $1320 to back everything up. Also, tapes go bad after a while.
- Two different times something went wrong during a backup. I had to restart the server as well as all the host programs. The database got confused and insisted I was still running a backup. I had to completely blow away Arkeia, reinstall it, and reconfigure everything to get it working again. I'm sure it was some simple thing that was keeping it from working, but since it's not Open Source, I couldn't dig into it to find out.
I looked at Amanda, but found that it also seemed to back up to tape eventually, and its support for backing up Windows(TM) systems was limited to SMB mounts. I did like Amanda's concept of storing data in a large staging area before dumping it to tape.
So, unless I've missed something that's out there, I'll have to roll my own. Here are the requirements for my system:
- Open Source, possibly GNU license
- Multi-level client-server model much like Arkeia. Each machine to be backed up will have a daemon running on it that will send data back to the central server. The central server will accept incoming data and prepare it for final archiving (see the redundancy reduction part below). Other processes will handle the temporary storage areas on (possibly) different machines.
- There will be client programs (GUI and maybe CLI) that talk to the central server to select files to back up, add filter rules, monitor status, etc.
- Satellite servers will act as central servers on remote systems on the Internet or in other remote locations. The Satellite servers will store data on something such as a Jaz or Zip disk, or on the hard drive of a laptop. Once these archives are brought back to the central server, this data will be incorporated into the backup.
- The final storage will remove redundant files. By comparing the digital fingerprints of the files backed up (MD5 checksum), the system will keep only one copy of the actual data in the archive. This keeps multiple copies of files on the same or different computers from bloating up the backup.
- Metadata such as the host the file came from, filename, access and modification time, ownership, permissions, etc. will be stored separately from the data. This will allow the redundancy reduction to work even if the files have different names.
- Must be able to back up Windows(TM) systems.
- Must be able to restore hard links properly on Unix filesystems.
- Files in the final backup will be compressed to save space.
- Any drive areas used as staging areas or final backup areas can be assigned a maximum amount of space to use and a minimum amount of free space to leave behind, each given either as an absolute amount or as a percentage of the total. Server programs will gracefully handle running out of space.
- Must have a baseline or snapshot mechanism, so a baseline backup of a particular system can be extracted from the backup and stored to tape, removable disk, CD-R(W), or writeable DVD. Future incremental backups can be done against these snapshots.
- Since hard drives are dirt cheap, the primary storage area of the backups will be hard drives which are probably mounted in removable drive racks. Jaz discs could be used, but since Iomega hasn't seen fit to drop the price of the cartridges, they make a hellishly expensive storage option.
- The various programs should be secure and allow authentication to make sure the daemons on the various hosts do not hand over files to spoofed servers. SSH tunnels can be used for connections across untrusted networks.
- All data and metadata must be recoverable from safely stored archives even if all the servers involved are lost due to a catastrophe such as fire.
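A minimal Python sketch of the redundancy-reduction idea, under my own assumptions rather than any settled design: every name here (`FileRecord`, `store_file`, `read_blob`, a store directory holding one zlib-compressed blob per MD5 digest) is hypothetical. Identical contents are stored once no matter what the file is called or which host it came from, while the metadata lives in a separate record. The record also keeps each file's device and inode numbers, which is one possible way to meet the hard-link requirement: records sharing a `(dev, ino)` pair were links to the same file.

```python
import hashlib
import os
import socket
import zlib
from dataclasses import dataclass

CHUNK = 1 << 20  # read files 1 MiB at a time


@dataclass
class FileRecord:
    """Metadata stored apart from file contents (hypothetical schema)."""
    host: str
    path: str
    digest: str  # MD5 hex digest of the uncompressed contents
    size: int
    mode: int
    uid: int
    gid: int
    atime: float
    mtime: float
    dev: int  # (dev, ino) pairs identify hard links to regroup on restore
    ino: int


def md5_of_file(path):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def store_file(path, store_dir, host=None):
    """Add one file to a content-addressed store and return its metadata.

    Compressed data goes to <store_dir>/<digest> only if no blob with that
    digest exists yet, so identical files share one stored copy.
    """
    st = os.stat(path)  # stat before reading, so the recorded atime is the pre-backup value
    digest = md5_of_file(path)
    blob_path = os.path.join(store_dir, digest)
    if not os.path.exists(blob_path):
        tmp_path = blob_path + ".tmp"
        comp = zlib.compressobj()
        with open(path, "rb") as src, open(tmp_path, "wb") as dst:
            for chunk in iter(lambda: src.read(CHUNK), b""):
                dst.write(comp.compress(chunk))
            dst.write(comp.flush())
        os.replace(tmp_path, blob_path)  # rename so no half-written blob is visible
    return FileRecord(
        host=host or socket.gethostname(),
        path=os.path.abspath(path),
        digest=digest,
        size=st.st_size,
        mode=st.st_mode,
        uid=st.st_uid,
        gid=st.st_gid,
        atime=st.st_atime,
        mtime=st.st_mtime,
        dev=st.st_dev,
        ino=st.st_ino,
    )


def read_blob(digest, store_dir):
    """Return the original (decompressed) contents for a digest."""
    with open(os.path.join(store_dir, digest), "rb") as f:
        return zlib.decompress(f.read())
```

Backing up two files with the same contents but different names yields two `FileRecord`s with the same `digest` and only one blob on disk. MD5 is used because the requirement names it; switching to `hashlib.sha256` would change a single line.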
So, does this beast seem ambitious enough? Wanna help me write it? I've recently taken up learning Python and have decided I really like the language. I will probably write it all in Python unless performance becomes an issue.
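The space-limit requirement for staging and final backup areas could look roughly like this in Python. This is a sketch only: the function names and the convention of writing a percentage as a string such as `"10%"` are hypothetical choices of mine; `shutil.disk_usage` is the standard-library call that reports a volume's total and free bytes.

```python
import shutil


def parse_limit(limit, total):
    """Convert a limit given as bytes (int) or as a percentage string like "10%" into bytes."""
    if isinstance(limit, str) and limit.endswith("%"):
        return int(total * float(limit[:-1]) / 100)
    return int(limit)


def writable_bytes(total, free, used_by_backup, max_use=None, min_free=None):
    """How many more bytes the backup may write to a volume.

    max_use caps the space the backup itself occupies; min_free is the space
    that must stay free for everything else. Either may be None (no limit),
    an absolute byte count, or a percentage of the volume's total size.
    """
    allowed = free
    if max_use is not None:
        allowed = min(allowed, parse_limit(max_use, total) - used_by_backup)
    if min_free is not None:
        allowed = min(allowed, free - parse_limit(min_free, total))
    return max(allowed, 0)  # never report a negative budget


def writable_bytes_for_path(path, used_by_backup, max_use=None, min_free=None):
    """Same check against a real mount point, using shutil.disk_usage."""
    usage = shutil.disk_usage(path)
    return writable_bytes(usage.total, usage.free, used_by_backup, max_use, min_free)
```

A server would call this before accepting each chunk of data and, when the budget reaches zero, switch to another drive area or pause the incoming stream instead of failing partway through a write.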
The Also Unnamed Home Automation System
I'm going to skimp on the description of this for now. It will be similar to MisterHouse, but I would like to write it in Python. I will write up my reasons for wanting to roll my own at a later time.
