Keeping Your Life in Subversion
by Joey Hess
01/06/2005
I keep my life in a Subversion repository. For the past five years, I've checked every file I've created and worked on, every email I've sent or received, and every config file I've tweaked into revision control. Five years ago, when I started doing this using CVS, people thought I was nuts to use revision control in this way. Today it's still not a common practice, but thanks to my earlier article "CVS homedir" (Linux Journal, issue 101), I know I'm not alone. In this article I will describe how my new home directory setup is working now that I've switched from CVS to Subversion.
Subversion is a revision-control system. Like the earlier and much cruftier CVS, its purpose is to manage chunks of code, such as free software programs with multiple developers, or in-house software projects involving several employees. Unlike CVS, Subversion handles directories and file renaming reasonably, which is more than sufficient reason to switch to it if you're already using CVS. It also fixes most of CVS's other misfeatures. Subversion still has its warts, though, such as an inability to store symbolic links and some file permissions, and its need for twice as much disk space as you'd expect thanks to the copies of everything in those .svn directories. These problems can be quite annoying when you're keeping your whole home directory in svn. Why bother?
Benefits of Revision-Controlled Directories
I see three main benefits of keeping my entire home directory in svn:
- home directory replication
- history
- distributed backups
The first reason originally drove me to using revision control for my
whole home directory. It's still the greatest benefit today. I have many
accounts on Unix machines scattered around my house, the country, and the
planet, and I have an abiding desire for every single one of these disparate
accounts to work and look exactly the same. I don't care if the machine I'm
logging in to is in Japan or the Netherlands, or a California colocation
center, or my home office; I don't care if it's a PC clone, or a Mac, or an
S/390 virtual machine; if it's not set up the same as all the others, if I
cannot concentrate on the important differences instead of being distracted by
the unimportant ones, then I will be less productive. The final
ingredient for configuration insanity is that I constantly tweak my setup. As
soon as I make an improvement, I want it to be available on every one of my
accounts, everywhere. Without Subversion, keeping all these accounts in sync
would be well-nigh impossible. With Subversion it's as easy as typing svn
up now and then.
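The "svn up now and then" routine can be wrapped in a small helper so that it also runs unattended from cron. What follows is only a sketch under assumptions: the function name sync_home is made up for illustration, and the account's home directory is assumed to be an svn working copy already.

```sh
#!/bin/sh
# Hypothetical helper (not from the article): bring a home-directory
# working copy up to date. Defaults to $HOME when no argument is given.
sync_home() {
    wc_dir="${1:-$HOME}"
    # --quiet keeps output minimal, which suits cron;
    # --non-interactive stops svn from prompting for credentials.
    svn update --quiet --non-interactive "$wc_dir"
}
```

Calling sync_home with no argument updates $HOME. Passing a path updates just that working copy.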
It seems that the next big change in how we use computers might be the
introduction of filesystems that store every old version of every file. With
the explosion in size of cheap hard disks, there seems to be no reason not to
keep a complete record of your computing life--and several research projects
are working on it. Meanwhile, I've done just that for five years, using first
CVS and now svn. It's amazing to check out my home directory as it looked on
New Year's Day 1999 and play around in it. It's neat to be able to look at the
entire revision history of my .procmailrc and watch as I moved mail
around, dealt with a growing spam problem, and joined and left many mailing
lists. It's handy to be able to run svn diff on my kernel config
file to see how make xconfig changed it. I can recover files that
I've deleted, or delete files because they're not relevant right now, and know
I've not really lost them at all. Amazingly, my Subversion repository is only 4GB in size even with all this historical data.
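As a rough illustration of that kind of time travel, the helpers below wrap three standard svn operations: a checkout pinned to a date, a per-file log, and a diff between two revisions. The function names, the URL, and the revision numbers in any call are placeholders, not details of the author's repository.

```sh
#!/bin/sh
# Hypothetical helpers for browsing history.

# Check out the tree as it looked on a given date (YYYY-MM-DD) into dest.
checkout_as_of() {
    as_of="$1"; url="$2"; dest="$3"
    # svn accepts a date in curly braces wherever a revision is expected.
    svn checkout -r "{$as_of}" "$url" "$dest"
}

# Show the full log for one file, then the change between two revisions.
file_history() {
    file="$1"; old="$2"; new="$3"
    svn log "$file"
    svn diff -r "$old:$new" "$file"
}
```

For example, checkout_as_of 1999-01-01 URL old-home would recreate the tree as of that date in a directory called old-home, provided the repository history reaches back that far.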
I have not lost a file since 1999, and I don't intend to ever again. Take
one crucial file, like my resume or sent-mail archive. I have a copy of that
file on my desktop computer in the .svn directory. There's another
copy on my home directory on my laptop, and yet another copy in the Subversion
repository on my server thousands of miles away. People tell me that the best
backups take no effort--so you actually do them--and are widely scattered
among many machines and a lot of area so a local disaster won't knock them out; additionally, they are tested on a regular basis to make sure the backup works. I'm
doing all of these things, as a mere side effect of keeping it all in
Subversion. To complete the picture, I only have to take very careful backups
of my Subversion repository itself. The automated distributed backups via svn
keep me sleeping quietly at night. I know that no matter what I do, my life
will still be there, safe and secure in svn.
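One common way to back up the repository itself is a compressed dump made with svnadmin. The sketch below assumes it runs on the machine that hosts the repository. The function name, the file naming scheme, and the paths are illustrative only and say nothing about how the author's own backups are made.

```sh
#!/bin/sh
# Hypothetical backup helper: write a dated, gzipped dump of one repository
# into outdir and print the resulting file name.
dump_repo() {
    repo="$1"; outdir="$2"
    stamp=$(date +%Y%m%d)
    out="$outdir/$(basename "$repo")-$stamp.svndump"
    # svnadmin dump writes a portable dump stream to stdout;
    # --quiet silences the per-revision progress messages.
    if ! svnadmin dump --quiet "$repo" > "$out"; then
        rm -f "$out"    # do not leave a truncated dump behind
        return 1
    fi
    gzip -f "$out" && echo "$out.gz"
}
```

A dump made this way can be restored into a freshly created repository with svnadmin load, so it is worth loading one into a scratch repository now and then to confirm the backup actually works.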
At this point I should fess up to my dirty little secret: not everything is
in svn after all. My full home directory with all the trimmings often runs to
dozens of gigabytes. Much of that is collections of music files and
documentation, which I have not yet dared to check into svn and which I rsync
between computers. As disk sizes continue to grow, it looks more and more
likely that I will take the plunge soon and check these large file libraries
into svn too. Then too I have the occasional file, such as a disk image for a
virtual machine, that is too large and too much bother to check into svn. I don't
keep incoming mailboxes in svn, because that would lead to a merging nightmare; instead I use offlineimap to synchronize them between several
computers. A cron job does check in the mail archives. A few other missing
corners are my web browser cache, which I would love to have a history of,
and my temporary directory, which I'd rather not.
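A cron job that checks in a directory such as a mail archive might look roughly like the following. This is a sketch, not the author's actual job: the function name and the commit message are assumptions, and the directory is assumed to be part of a working copy already.

```sh
#!/bin/sh
# Hypothetical cron job body: commit whatever changed under one directory.
commit_archive() {
    dir="$1"
    # With --force, "svn add" recurses into an already-versioned directory
    # and schedules only the files svn does not yet know about.
    svn add --force --quiet "$dir" || return 1
    svn commit --quiet --non-interactive -m "automatic check-in of $dir" "$dir"
}
```

Because new files are added before the commit, mail folders created since the last run are picked up without any manual svn add.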
I have made some progress recently in moving more into svn. I've
managed to check the /etc directories of several machines into svn.
While this is of questionable value as a way to replicate those machines, and
it doesn't include some files such as /etc/shadow, it's useful to be
able to check old versions of config files. I've also come up with a way to
check crontabs into svn. This is a great improvement, because I can edit and
view any machine's cron jobs from anywhere and have all the history and backup
benefits of svn. I'm sure that my use of svn will only increase as I find ways
to use it in the odd little corners that remain. Yesterday I even found myself
checking baby photos into svn for my family's web site.
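The article does not say how the crontabs get into svn. One possible approach, shown here purely as an assumption, is to keep one file per host in a versioned directory and copy it to and from the live crontab. The directory name ~/.crontabs and both function names are invented for this sketch.

```sh
#!/bin/sh
# Hypothetical layout: one crontab file per host under a versioned directory.
CRONTAB_DIR="${CRONTAB_DIR:-$HOME/.crontabs}"

# Save the live crontab into the working copy so it can be committed.
save_crontab() {
    crontab -l > "$CRONTAB_DIR/$(hostname)"
}

# Install the versioned crontab for this host, if one exists.
install_crontab() {
    f="$CRONTAB_DIR/$(hostname)"
    if [ -f "$f" ]; then
        crontab "$f"
    fi
}
```

After save_crontab, a normal svn commit records the change. After an svn update on the target machine, install_crontab puts the new version into effect.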
Directory Organization
I speak of my svn repository, but I actually have several repositories.
First, the public one holds most of the nonprivate parts of my home directory
and lots of software projects. You can even browse the contents of this
directory on the Web at svn.kitenet.net, or check it out
anonymously from svn://svn.kitenet.net/joey/. Next I have a
private repository that holds such things as my email archives, and I have
several other small, special-purpose repositories. I also work on other projects
themselves kept in svn on other servers. A full checkout of my home directory
will include parts from all of these repositories; the
svn:externals feature of svn lets me knit them all together into a
whole that I can check out or update with a single command.
I've always managed my home directory with an iron hand. Keeping files in revision control has only exacerbated this tendency. Let's look at the top level:
joey@dragon:~> ls
Maildir/ bin/ doc/ html/ lib/ mail/ src/ tmp/
That really is everything, except for 100-plus dot files. Most people use their home directory as a cluttered scratch space for files they're working on. Subversion works better for this than CVS does, because it lets you easily rename and move files and directories. My tightly controlled home directory is partly personal preference and partly a leftover from my days as a CVS user. Keeping a home directory in Subversion does encourage some neatness, as svn will complain about files that it does not know about. This encourages keeping things organized, or at least out of the way in a temporary directory.
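The files svn "complains about" are the ones svn status marks with a question mark. A small filter, sketched here with a made-up function name, turns that report into a plain list of stray paths.

```sh
#!/bin/sh
# Print the paths that "svn status" marks with "?" (not under version control).
# Reads status output on stdin, so it can be tried on canned input.
# Assumption: paths do not begin with a space.
unversioned() {
    sed -n 's/^?  *//p'
}
```

Typical use would be svn status ~ | unversioned, after which each listed file can be added, moved to a temporary directory, or deleted.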
Because my home directory is publicly available online, I have to take care to keep private files private. One tricky area is private dot files. These need to be in my home directory, but I can't keep them in the public repository. To manage this, I keep all the private dot files in ~/.hide, which I store in an entirely different, private Subversion repository.
To make things work, there are symlinks in my home directory that point to the private dot files in ~/.hide. For this I have an svnfix program that creates those symlinks, fixes some permissions and other symlinks, and even updates my crontab from svn. I have to remember to run this program from time to time, or put a call to it in my crontab, because there is no way to add a client-side hook in Subversion, or CVS for that matter.
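The svnfix program itself is not shown in the article. The sketch below is a hypothetical stand-in for its linking step only; the function name and the decision to skip existing real files are assumptions. It expects absolute paths, and the -n option to ln is a GNU/BSD extension rather than POSIX.

```sh
#!/bin/sh
# Hypothetical stand-in for the symlinking step (not the author's svnfix).
# Links every dot file found directly inside hide_dir into home_dir.
link_private_dotfiles() {
    hide_dir="$1"; home_dir="$2"
    for src in "$hide_dir"/.[!.]*; do
        [ -e "$src" ] || continue          # glob matched nothing
        name=$(basename "$src")
        [ "$name" = ".svn" ] && continue   # skip svn's own metadata directory
        dest="$home_dir/$name"
        # Leave real files alone; only create or refresh symlinks.
        if [ -e "$dest" ] && [ ! -L "$dest" ]; then
            echo "skipping $dest: exists and is not a symlink" >&2
            continue
        fi
        # -s symbolic, -f replace an existing link, -n treat a link to a
        # directory as the link itself instead of descending into it
        ln -sfn "$src" "$dest"
    done
}
```

Running link_private_dotfiles "$HOME/.hide" "$HOME" after each update keeps the links current. Fixing permissions would be a separate step.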
External Directories
My ~/.hide directory is just one of several Subversion repositories
that Subversion's useful svn:externals feature pulls into my home
directory. My ~/src subdirectory, which holds various code projects
I'm working on, is an even better example, as some of its contents come from
repositories shared with others.
joey@dragon:~> ls src
Words2Nums/ debconf/ filters/ packages/ sleepd/
alien/ debhelper/ flashybrid/ pdmenu/ tasksel/
apt-src/ debian-cd/ kernel/ sarge/ ticker/
base-config/ debian-edu/ misc/ secure-testing/ unreleased/
d-i/ dpkg-repack/ mooix/ skolelinux/ wmbattery/
joey@dragon:~> svn propget svn:externals src
mooix svn+ssh://svn.mooix.net/home/svn/mooix/trunk
debhelper svn+ssh://kitenet.net/home/svn/debhelper/trunk
tasksel svn+ssh://svn.debian.org/svn/tasksel/trunk
d-i svn+ssh://svn.debian.org/svn/d-i/trunk
base-config svn+ssh://svn.debian.org/svn/base-config/trunk
debconf svn+ssh://svn.debian.org/svn/debconf/trunk/src/debconf
secure-testing svn+ssh://svn.debian.org/svn/secure-testing
After I use the svn propedit command to add external
repositories, svn pulls them in and they become subdirectories in my home
directory. They mostly behave as if they were part of the same larger
repository. This is a great feature, with uses beyond including directories
from other repositories. On many of the machines I use, I don't need my entire
home directory checkout, and so my home directory is more minimal.
joey@elephant:~> ls
bin/ tmp/
I use this machine for occasional development. I don't fully trust the
machine, so I don't want to put private files there. I have a branch of my
home directory that (using svn:externals) includes only the basics
and is perfectly usable for everything I normally do on that machine. Using
svn:externals like this to pull in optional directories keeps the
part of my home directory that I have to branch (and merge) small.
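Besides editing the property interactively with svn propedit, the same definitions can be applied from a file, which is easier to script when setting up a trimmed-down branch. The helper below is a sketch: its name is invented, and the definitions file is assumed to hold one "subdirectory URL" pair per line in the same form as the propget output earlier in this section.

```sh
#!/bin/sh
# Hypothetical helper: set svn:externals on a directory from a definitions file.
set_externals() {
    dir="$1"; defs_file="$2"
    # -F reads the property value from a file, which avoids quoting a
    # multi-line value on the command line.
    svn propset svn:externals -F "$defs_file" "$dir"
}
```

The property change takes effect for other checkouts once it is committed. A subsequent svn update then fetches each external into its subdirectory.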
New Machines
When I want to check out my home directory to a new account, I run one of these commands:
% svn co svn+ssh://joey@svn.kitenet.net/svn/joey/trunk/home-base .
% svn co svn+ssh://joey@svn.kitenet.net/svn/joey/trunk/home-full .
The first is the minimal version of my home directory; the other is the whole thing. The dots at the end of the command lines make svn check it out directly into my home directory.
Conclusions
I switched from CVS to svn over a painful couple of months in the winter of 2003. CVS had many misfeatures that made keeping a home directory in it annoying, and I'm glad that I don't have to worry anymore about picking file and directory names (svn can easily rename them; CVS couldn't); that svn can handle binary files well and efficiently (unlike CVS); that svn is quite a bit faster at updating large home directories than CVS is; that managing branches is so much easier with svn that I actually have some branches of my home directory; and that those annoying "CVS" directories that once cluttered up every corner of my home directory have gone away. The transition from CVS to svn would be easier today, as the conversion software has improved, but such a large conversion between revision-control systems is bound to be a slow and painstaking process.
It's interesting to consider that the longevity of my home directory's history has no ties to the useful lifetime of a given revision-control system, or even the lifetime of a given computer platform. Converting repositories of past revision-control systems seems likely to be something new systems will continue to support. If one day I switch to arch, or a distant future relative of arch and svn, I fully expect to take my history with me when I do.
Before I go, I want to thank the hundreds of readers who responded to my original article about keeping my life in CVS. Thanks for your encouragement, for your ideas, and for letting me know I wasn't as crazy as I thought. Yes, as you can see, I finally have switched to Subversion! Now I'm off to commit this file....
Joey Hess has worked for nine years as a Debian developer.
-
Checking out cvs in a subversion repository
2005-03-10 07:19:33 contact@mann.fr
Would you do something like this?
cvs co :pserver:joeyh@example.com:/var/lib/cvs cvsproj
svn add cvsproj
svn ci -m "Hey I'm adding a checked-out CVS dohicky !"
How do you handle cvs externals ???
-
Risk of Changing Connectivity
2005-02-13 15:34:26 alastairrankine
I've also been tinkering with subversion to store my home directory. One of the issues I've come across is that getting a repository set up that you can access from everywhere, is appropriately secure, AND is maintained (i.e., backed up, powered up, etc.) is not a simple task.
It's something to keep in mind when you're tempted to keep your life in subversion.
-
Try "unison"
2005-01-18 12:06:37 wob
Hi,
on my machines, I have been using unison to keep my home directory in sync over the last 3 years or so. unison is based on rsync for individual file transfers, but also keeps track of renames to minimize the amount of data to transfer over the net.
So far, it has never let me down, and has proved an excellent tool - one unison command and any two of my machines are in sync again. Since I sync at least once a day, the different homes never diverge.
Furthermore, there is no server. I imagine if your SVN server goes belly up, which is bound to happen at the most inconvenient time, you will not be able to keep your homes synced (using only svn) until you have re-installed it.
Finally, unison doesn't keep two copies of every file. Of course, you cannot "go back in time" with it like with svn, but I've never missed that feature so far.
-
Backups
2005-01-14 09:36:13 kroger
Great article!
The first cvs article helped me put my home in svn and this article has the extra tips I needed. Thanks!
I'd like to know how you backup your svn repository and what media (e.g. CD, DVD) you use.
-
/etc under revision control
2005-01-13 20:01:29 genehack
Do yourself a huge favor and look into CfEngine. You can keep your master repository under revision control, automatically distribute changes, and you don't have to worry about hacking around the things you don't want to put in the repository. It will also handle multiple machines more easily.
-
Trust my "life" to berkeley db? No way!
2005-01-12 21:22:01 evgen
You do realize that you are trusting berkeley db to maintain your "life", right? I cannot think of a better definition of digital russian roulette...
[Having just spent three days recovering a subversion repository that had a corrupted db due to bad RAM, you are one "svn: DB_INCOMPLETE: Cache flush was unable to complete" message away from going postal.]
-
What about file integrity checking?
2005-01-10 06:38:23 chris_marshall
I use CVS to synchronize my home directories across several machines, as you do, although I only use it for synchronization, not history. Also like you, I manage larger collections of files (my music and pictures) via rsync.
The directory I manage this way is where I keep my most important text files. I call it sys-config because I started it as a way of storing the config files
I periodically wipe out my CVS repository and re-import sys-config because of how hard it is to rename files and directories and move them around. This doesn't bother me since I am not interested in keeping track of history for sys-config.
I am curious how you handle the threat of one of your hard disks rotting out from under you, especially for files you haven't looked at in years. How do you know they aren't decaying?
You need to automate the generation and checking of md5sums to fight that. I have toyed around with some scripts but not settled on one approach yet. You need a way to separate the files that are changing a lot from the ones that don't or you will generate a lot of false alarms (md5sum checks will fail for files that you edited and didn't recalculate the first time).
-
What about file integrity checking?
2005-01-12 12:44:12 joeyh
This is a very good question.
First, note that subversion uses a db database with only a few files, not one file per file in the repository, unless you're using the new FSFS backend, with which I am not completely familiar.
It would be nice if subversion kept an md5sum or something of what should be in the repository database files so it could detect small changes due to disk problems. Then you could detect repository breakage immediately instead of continuing to use a broken repository. Unfortunately, it does not do this.
What it does do is keep a md5 checksum of each revision of each file in the repository. It checks these checksums at various points during updates and so on, so if a file in your working copy gets corrupted it will detect that. And it will detect if a disk problem corrupts the content of a file stored in the repository. For example:
svn: Checksum mismatch on rep '1':
expected: a70149cb192b21fe371f05ca73e65416
actual: a2627e0896be2165de90823eaa240a56
So I don't consider svn's checksum coverage to be complete, but I am sure that I'm getting out the same files that I checked in. To protect against general repository corruption that is not caught by subversion's internal checks, you need to do backups of your repository, preferably using something like svnadmin dump or hot-backup.
Personally, I do offsite incremental backups of my svn repository using duplicity; these end up gpg encrypted and sha1-summed, and are written to reliable media.
-
When to commit
2005-01-10 05:38:05 Manuzhai
So do you have any rules for when you commit the latest version of, say, your documents?
I got into Subversion a few months ago for some software development projects, and it got me to think about stuffing more of my things in a repository, but your article really got me to do so. Thanks for that!
-
When to commit
2005-01-12 12:46:04 joeyh
I generally commit something as soon as I realize I'd not want to type it in again. Although it really varies by project; projects that have more than one person working on them benefit from more occasional commits that are tested and working, so if I need to make intermediate commits when working on such a project I'll typically spin off a temporary branch to work in.
-
Old garbage?
2005-01-10 01:02:13 Eigir
Ten years ago, I would probably have tried to achieve the same as you have. Back then, I saved every email I received, every email and news posting I did and every little file that passed through my home directory (I also did quite a lot of programming at that time).
Eight years ago, I lost it all. Not due to a disk crash or anything like it. Nope, user error all the way. I was devastated, and unable to do any work for the next couple of days. But life goes on, and I had to say to myself that it's only bits and bytes, and nothing important.
Today, I could not care less. What I have learned from that experience (and a few similar ones) is that you rarely search through old stuff to find important information. Every once in a while there is something you should have taken care of, that you deleted, but nothing that is really really important (we often make stuff out to be more important than it really is).
These days I regularly clean out my home directory, deleting old emails (both received and sent), scripts I no longer need and other similar stuff. Every time I change jobs, I completely wipe out my home directory. That sounds like a nightmare to you I guess? :-)
-
Subversion cannot handle ~100 MB files
2005-01-07 14:39:55 sanchonevesgraca
You expect to commit more and more of your home directory to Subversion repositories. Currently, Subversion cannot handle files with size of the order of 100 MB in a realistic time interval (files of this size are seen in image processing projects). Checking out a few large files locally takes about 10 minutes but over HTTP from the Subversion server to a client machine takes hours.
-
Subversion cannot handle ~100 MB files
2005-01-07 15:52:28 joeyh
As a computer programmer, I don't generally edit large files. While I do have several gigabytes of data in svn, it's in hundreds of thousands of small files, with a large file being on the order of 5 or 10 MB. With this workload I find svn to be reasonably fast, even if I'm doing an update of my entire home directory. There is some overhead compared to rsync, but I can live with that.
Anyway, I did some benchmarking here with 200 MB files containing random data, and it takes me 11 minutes to check such a file in locally to a repository checked out with file://, 10 minutes to check such a file out over svn+ssh:// to a remote machine over 802.11b wireless, 9 minutes to scp the same file over the same link, and 7 minutes to check it out locally. These don't seem to match your numbers, but then you're using the http:// protocol which is probably much less efficient and anyway the network operations appeared bandwidth/disk-bound to me.
Hope this helps, but you can probably get better information on subversion scalability with larger files on one of the subversion mailing lists.
-
Subversion cannot handle ~100 MB files
2005-01-08 06:59:16 sanchonevesgraca
Large files are an issue also for programmers: consider the scenario of checking out large project dependencies such as libraries. The HTTP protocol is widely used by Subversion projects, since it allows connection to repositories through firewalls. So the limitation I pointed out is not a minor problem. The long checkout times I observed occurred over HTTP in internet and intranet connectivity so the limitation was not because of the network bandwidth nor can it be explained by the HTTP protocol. I also noticed that the svn client would, after about ten minutes, take all the computer processor capability and the transfer speed would be quite small (~5 KBps). I do not know the internals of the Subversion checkout implementation, but these observations suggest that the design does not scale up to this file size. Others have encountered this limitation (see http://svn.haxx.se/dev/archive-2004-07/0801.shtml).
-
symbolic links
2005-01-07 05:48:34 timfreeman
Great article.
One comment, quoting from
http://svnbook.red-bean.com/en/1.1/svn-book.html
New in Subversion 1.1
Symbolic link versioning
Unix users can now create symbolic links and place them under version control with the svn add command. See svn add and the section called svn:special.
-
symbolic links
2005-01-07 12:08:25 joeyh
I'm looking forward to using the new symlinks features in 1.1, once all the systems I use subversion on have updated their client to the 1.1 version that's needed to use the feature. Now if subversion could only add support for storing full file permissions in svn, I could probably do away with my svnfix hack entirely.
-
tla
2005-01-06 21:05:33 ciclouseau
Have you considered Arch, and if so, why did you stick with Subversion?
-
tla
2005-01-10 02:41:31 jmtd
Same question again, replacing 'Arch' with 'darcs', then 'monotone'...
-
tla
2005-01-07 12:10:45 joeyh
In fact I have some friends who do the same thing with arch. Subversion was an easier transition for me as a cvs user; the full pros and cons are too long to list in this space. The most interesting thing to me is something I touch on in the article -- if I do decide a better system than subversion has come along, it looks very likely that I'll be able to convert my whole repository history to it.
-
tla
2005-10-02 21:14:00 akkartik
I recently started using darcs for something like this. Instead of a single huge repository, though, I maintain multiple ones to taste. This design decision is based on darcs/tla being organized by changeset rather than file-revision. It just seems ugly to clutter up individual patches with a bunch of unrelated stuff.
While I think the decision was forced by my choice of vc software, there are tradeoffs to be considered if I were to be freed from this constraint.
Also, while off-the-shelf vc software does pretty well, there are some issues to consider. The big one I think about is archival. One's home directory repo will tend to grow linearly over time. Is there some way one can archive old snapshots on a reliable medium and then remove them from the repo?
The thing that gives me the most pause is the need to svn add, svn move, and so on rather than just moving things around as I normally do. It seems like this could be dealt with through some tool (if such existed) that detects moves, copies, and so forth in your workspace and keeps svn in the loop. Do you have any thoughts about this?