Lost in the Git Orcus (Part 2)

UPDATE: There is a new post about a much improved version of this script: Lost in the Git Orcus (Part 3)

Find lost files deep deep in the catacombs of Git

Last year I posted a script to find lost files deep in the .git/objects folder.
But the way I chose was not the smartest:
It scanned all tree objects for a given file name (or pattern) and searched for commits pointing to this tree.
But since only the root tree has a commit pointing to it the script could only find files in the root folder.

That was dumb from my side. So I fixed it and made it much simpler:

  • It searches for trees or blobs containing the given pattern (it could also search for commits or tags if you wanted to)
  • It it finds a blob it just logs it out
  • If it finds a tree it creates a commit pointing to it via 'git commit-tree' and a branch pointing to this tree
Searching for the contents of blobs may be necessary. E.g. if the file you are searching for never made it into a commit, but was part of the index (and therefore could be somewhere in the object store if the gc has not already removed it).

The script

The script has 3 parameters:
  1. The object type to search in. Can be 'tree', 'blob', 'tag' or 'commit'
  2. The pattern to search for
  3. 'y' if you want to create commits and branches (only when type='tree')
The script is super simple, has no bells and zero whistles. So feel free to --amend it.
You can also find it on Github (in its newest version)

Possible improvements

Galore!
Just some examples:
  • Make it more convenient: Use options, print a help text, ...
  • Search also in the pack file
  • To make it faster: Search at first in the reflog of the HEAD and the branches
  • ...
I will implement the second bullet above soon. Stay tuned!
#!/opt/local/bin/perl

my $type = $ARGV[0]; # 'tree' or 'blob'?
my $pattern = $ARGV[1]; # filename to search for as regex
my $createBranches = $ARGV[2]; # 'y' if you want to create branches (only when search for 'tree')

##################################################################
# find all objects (blobs, trees, commits, tags) in .git/objects
##################################################################

my @AllFiles = `find .git/objects/`;
my @AllSha1 = ();

for my $object (@AllFiles) {
	if($object =~ /([0-9a-f][0-9a-f])\/([0-9a-f]{38})/) {
		my $sha1 = $1 . $2;
		push(@AllSha1, $sha1);
	}
}

##################################################################
# find all trees or blobs containing the pattern
##################################################################

my $ctr = 0;
for my $foundSha1 (@AllSha1) {
	chomp $foundSha1;
	my $t = `git cat-file -t $foundSha1`;
	chomp $t;

	if($t eq $type) {

		my @Lines = `git cat-file -p $foundSha1`;
		for my $line (@Lines) {
			if($line =~ /$pattern/) {
				$ctr++;
				printf "found $type: %s\n", $foundSha1;
				if($type eq 'tree' && $createBranches eq 'y') {
					my $branchName = sprintf "found/%03s", $ctr;
					printf "created branch: %s\n", $branchName;
					`git commit-tree $foundSha1 -m "found $pattern" | xargs -I{} git branch $branchName {}`;
				}
				break;
			}
		}
	}
}