Git binaries: from hard links to symbolic links


When compiling git and replicating the installation elsewhere, I realized that most (but not all) of git's binaries are hard links to a common binary. That means an installation that weighs 17MB can end up weighing 250MB once it is copied by a tool that does not preserve hard links. Not nice for a DVCS that claims to avoid bloat.

This happens because few replication utilities preserve hard links (rsync does, with its -H option), and even when the tool does, the target filesystem might not support them. So what's left to do? Actually, it is simple enough: when compiling git, tell it to use symbolic links instead of hard links.
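To make the problem concrete, here is a minimal sketch that reproduces it on a small scale. The file names are stand-ins for git's builtins, and the file-by-file cp loop is only my approximation of what a replication tool without hard-link support does, not the behavior of any specific tool.

```shell
#!/bin/sh
# Sketch: a hard-link-unaware copy turns N links into N full files.
set -e
work=$(mktemp -d)
mkdir "$work/src" "$work/dst"

# One 1 MiB "binary" plus three hard links to it (hypothetical names).
dd if=/dev/urandom of="$work/src/git" bs=1024 count=1024 2>/dev/null
for name in git-add git-commit git-log; do
    ln "$work/src/git" "$work/src/$name"
done

# Copy file by file, the way a tool without hard-link support would.
for f in "$work/src"/*; do
    cp "$f" "$work/dst/"
done

# Count the regular files that still share an inode with another name.
src_links=$(find "$work/src" -type f -links +1 | wc -l | tr -d ' ')
dst_links=$(find "$work/dst" -type f -links +1 | wc -l | tr -d ' ')
echo "hard-linked files: source=$src_links copy=$dst_links"

# Disk usage of each tree, in KB (exact values depend on the filesystem).
du -sk "$work/src" "$work/dst"
```

In the source tree the four names share one inode; in the copy each name is an independent file, so the copy should take roughly four times the space.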

Go look into the Makefile and replace the following lines (each "- line" is replaced by the "+ line" below it):
- $(QUIET_BUILT_IN)$(RM) $@ && ln git$X $@
+ $(QUIET_BUILT_IN)$(RM) $@ && ln -s git$X $@
- ln -f '$(DESTDIR_SQ)$(bindir_SQ)/git$X' \
+ ln -sf '$(DESTDIR_SQ)$(bindir_SQ)/git$X' \
- $(foreach p,$(BUILT_INS), $(RM) '$(DESTDIR_SQ)$(gitexecdir_SQ)/$p' && ln '$(DESTDIR_SQ)$(gitexecdir_SQ)/git$X' '$(DESTDIR_SQ)$(gitexecdir_SQ)/$p' ;)
+ $(foreach p,$(BUILT_INS), $(RM) '$(DESTDIR_SQ)$(gitexecdir_SQ)/$p' && ln -s 'git$X' '$(DESTDIR_SQ)$(gitexecdir_SQ)/$p' ;)

If your Makefile shows slightly different lines, you get the idea: look for "ln" in the Makefile and add the "-s" option. The compilation/installation will then produce symbolic links. Note that the last change also includes a small optimization: each link points to git in the same directory (a relative link) instead of using an absolute path.
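A quick way to check the result is to count symbolic links and hard-linked files in the installation directory. The sketch below is only an illustration: report_links is a helper name I made up, and the demo directory mimics what the modified Makefile should produce rather than running a real git build.

```shell
#!/bin/sh
# Sketch: report how many symlinks and hard-linked files a directory holds.
set -e

# report_links is a hypothetical helper, not part of git.
report_links() {
    dir=$1
    symlinks=$(find "$dir" -type l | wc -l | tr -d ' ')
    hardlinked=$(find "$dir" -type f -links +1 | wc -l | tr -d ' ')
    echo "symlinks=$symlinks hardlinked=$hardlinked"
}

# Demo tree imitating the expected layout: one binary, relative symlinks.
demo=$(mktemp -d)
printf 'fake git binary\n' > "$demo/git"
for name in git-add git-commit git-log; do
    ln -s git "$demo/$name"
done

result=$(report_links "$demo")
echo "$result"
```

Pointing report_links at the real gitexecdir after installation should show a hardlinked count of 0 if every "ln" in the Makefile was caught.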

So far it works OK, though I am clearly not a Git guru.

Comment #1

Isn't it so that hard links link several directory entries to one single inode? In which case it doesn't matter how many links you have, the data is not replicated. ls -l will show the file size several times (once for each directory entry), but du shows the space that is really used. For example:

$ dd if=/dev/zero of=aaa bs=1M count=10
$ ln aaa bbb
$ ls -l
total 20512
-rw-r--r-- 2 mweber mweber 10485760 2008-06-25 10:44 aaa
-rw-r--r-- 2 mweber mweber 10485760 2008-06-25 10:44 bbb
$ du -s .
10260

Or did I get something wrong?

Comment #2

Yup, I know that. The keyword in the article is replicating. The git binaries were put on my hosting, and my hosting is backed up through unison. While the installation took only 17MB on my hosting, backing it up ended up with something like 88 times the same binary with a different name (because unison can't do hard links) in the backup. Hence the use of symbolic links to lighten up the burden of the replication :-P

Plus, "stat" is a better way to see whether hard links exist for a file (look at the "Links:" entry).
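For scripting, the link count can be read directly instead of eyeballing the "Links:" entry. This is a small sketch under assumptions: link_count is a name I made up, it tries the GNU stat format option first and falls back to the BSD one, and the demo files are placeholders.

```shell
#!/bin/sh
# Sketch: read a file's hard-link count with stat.
set -e

# link_count is a hypothetical helper.
# Assumption: GNU stat (-c %h) or BSD stat (-f %l) is available.
link_count() {
    stat -c %h "$1" 2>/dev/null || stat -f %l "$1"
}

demo=$(mktemp -d)
printf 'fake git binary\n' > "$demo/git"
ln "$demo/git" "$demo/git-add"    # hard link: same inode, count goes up
ln -s git "$demo/git-log"         # symlink: does not touch the count

hard_count=$(link_count "$demo/git")
echo "git has $hard_count links"
```

A count above 1 means the file has other names somewhere on the same filesystem; symbolic links never change it.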
