
I ran the following command:

git add .

Since there are many files (> 10 TB), it was taking a long time to add them. Halfway through, I accidentally deleted some files (which I need to recover). Suppose I now press "Ctrl + C" in the terminal (interrupting Git).

What happens?

  1. There will be a partial add in Git (and I can recover some files using git checkout?)
  2. No files will have been added to Git at all.

Thanks,

nilesh

1 Answer


In my experimentation, an interrupted git add leaves nothing in the index (you can confirm this by using git status), but it does keep any blobs it created in its internal structures, so there may be some files that can be recovered. Have a look at https://git-scm.com/book/en/v2/Git-Internals-Git-Objects for information on extracting files from Git's internal blobs.

I recovered one file this way:

> find .git/objects -type f
...
.git/objects/ac/a6e96aaf9492a2ee8f9ef51f0197ad56436fd4
...

> git cat-file -p aca6e96aaf9492a2ee8f9ef51f0197ad56436fd4 > file1

Note that the blob ID is the directory name ac plus the blob file name a6e96... to make aca6e96....

This way Git gave me the contents of one file. This is not going to be fun to use, though, because you get the file contents without the file name. Unfortunately, the file name would have been stored in the index, and in more durable structures if you had had a chance to do the commit, but that information would not be available yet for blobs created during an interrupted git add.
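To try this out safely, here is a minimal sketch in a throwaway repository. It uses `git hash-object -w` to write a blob without touching the index, which I am assuming is a fair stand-in for the state left by an interrupted git add (it is not the same code path). The file name `notes.txt` and its contents are made up for illustration.

```shell
#!/bin/sh
set -e

# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical file standing in for one of the lost files.
printf 'important data\n' > notes.txt

# Write the blob into .git/objects WITHOUT updating the index.
# Assumption: this imitates what an interrupted `git add` leaves behind.
id=$(git hash-object -w notes.txt)

# Lose the working copy.
rm notes.txt

# The index knows nothing about the file...
tracked=$(git ls-files)

# ...but the blob can still be read back by its ID.
recovered=$(git cat-file -p "$id")

# The loose object lives at .git/objects/<first 2 chars>/<remaining chars>.
dir=$(printf '%s' "$id" | cut -c1-2)
rest=$(printf '%s' "$id" | cut -c3-)
ls ".git/objects/$dir/$rest"
echo "recovered contents: $recovered"
```

The contents come back, but as noted above, nothing in the repository records that they once belonged to `notes.txt`.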

Here's a script that will list all your blobs in one file with separators, which might make your life a bit easier:

File list-blobs.pl:

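#!/usr/bin/perl
use strict;
use warnings;

$| = 1;  # unbuffer STDOUT so each BLOB header appears before the git output that follows it
open(my $objects, '-|', 'find', '.git/objects', '-type', 'f') or die "Cannot run find: $!";
while (my $path = <$objects>) {
   chomp $path;
   # Skip pack files and anything else that is not a loose object; ID = directory name + file name.
   next unless $path =~ m{/([0-9a-f]{2})/([0-9a-f]{38,})$};
   print "BLOB $1$2\n";
   system('git', 'cat-file', '-p', "$1$2");
}
close($objects);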

Run

chmod +x list-blobs.pl
./list-blobs.pl | less

and you will see what objects Git has actually stored in blobs before you interrupted git add.
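As a possible alternative to walking .git/objects by hand, the sketch below asks Git itself to enumerate every object with its type and size, then writes only the blobs into a directory, one file per blob ID. It assumes Git 2.6 or later (for `--batch-all-objects`). The function name `dump_blobs` and the directory name used in the usage note are placeholders invented for this example. It lists all blobs in the repository, not only those left over from the interrupted git add.

```shell
#!/bin/sh
# dump_blobs OUTDIR: write every blob in the current repository to OUTDIR/<id>.
# (dump_blobs is a hypothetical helper name, not a Git command.)
dump_blobs() {
    outdir=$1
    mkdir -p "$outdir"
    # Each line of --batch-check output has the form "<id> <type> <size>".
    git cat-file --batch-all-objects --batch-check |
    while read -r id type size; do
        # Skip trees, commits and tags; only blobs hold file contents.
        [ "$type" = "blob" ] || continue
        git cat-file -p "$id" > "$outdir/$id"
        echo "$id ($size bytes)"
    done
}
```

Run it from the top of the repository, for example `dump_blobs recovered-blobs`, and then inspect the files in that directory to work out which contents belong to which lost file.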

Easier yet: use https://github.com/ethomson/git-recover shared by its author Edward Thomson in the comments below.

joanis
  • Actually, I have deleted the files locally, so I was wondering if it's possible to recover them from Git. `git status` shows nothing added yet, as `git add` is still running, and I suspect it will take a long time because the files total > 10 TB. – nilesh Jan 17 '19 at 13:57
  • If `git status` does not show them and the files are not there locally, Git cannot help you recover these files anymore. It's good with committed files, and if `git add` had succeeded, it would still have a copy, but in your case I'm afraid you're out of luck. – joanis Jan 17 '19 at 14:09
  • Well, what I said is not quite right: I tried your case, and the .git directory gets bigger when I start and then interrupt a `git add .` operation. So the blobs have been created and stored in `.git`. Can they be recovered from there? I think we need a deeper Git expert to answer that, but maybe. – joanis Jan 17 '19 at 14:13
  • Thanks, yes, the case you mentioned is correct. Can you try `git checkout` on a specific file after interrupting `git add`? That will give the answer, I guess. – nilesh Jan 17 '19 at 14:17
  • I'm spending too much time on this because it's interesting... Just added some info on examining the blobs stored in `.git/objects`. – joanis Jan 17 '19 at 14:35
  • You can use https://github.com/ethomson/git-recover to identify blobs that were added to the object database but did not get added to the index or committed. – Edward Thomson Jan 17 '19 at 14:37
  • Thanks a lot. Before interrupting, I tried `git cat-file -p c564c73ea1ee4d3f377ad71934fe8f61ac0677` (and a few other objects), but it gives an error: `fatal: Not a valid object name c564c73ea1ee4d3f377ad71934fe8f61ac0677`. Is that because `git add` is still running? – nilesh Jan 17 '19 at 14:39
  • I had that problem when I first tried using `cat-file`, and in my case it was because you have to take the two-character directory name and add it to the beginning of the blob file name. So `.git/objects/ac/1234...` is blob `ac1234...`. Indeed, your string has only 38 characters but blob IDs have 40, so you have the same problem. – joanis Jan 17 '19 at 14:41
  • @Edward Thomson: thanks for the pointer to git-recover, this is a very nice tool! – joanis Jan 17 '19 at 14:43
  • @EdwardThomson thanks, I tried `git-recover` with the command `cd _tmp && git-recover --full`, but it was unable to recover anything: ```notice: HEAD points to an unborn branch (master) notice: No default references git-recover: no recoverable orphaned blobs.``` – nilesh Jan 17 '19 at 16:51
  • Neat, thanks. It looks like I'll have to do some work to make it work with an unborn HEAD branch. – Edward Thomson Jan 17 '19 at 17:02
  • This is correct: `git add` starts by creating a new index file (named `index.lock` to indicate to other Git commands that someone intends to replace the old index file, i.e., the old index is now locked against updates). Then it populates that new index file by copying the old one and, one file at a time, updating each added blob. When it is all done it uses a file-system-atomic `rename` operation to discard the old index in favor of the new one and release the lock, all at the same time. If you interrupt the add, it simply deletes the `index.lock`. – torek Jan 17 '19 at 19:19
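
The lock-then-rename pattern described in the last comment can be imitated in a few lines of shell. This is only a sketch of the general technique, not Git's actual implementation. The function name `update_index_file` and the file name `demo-index` in the usage note are invented for the example.

```shell
#!/bin/sh
# update_index_file FILE ENTRY: append ENTRY to FILE by way of FILE.lock
# and a final rename. (Hypothetical helper, not part of Git.)
update_index_file() {
    file=$1
    entry=$2
    lock="$file.lock"

    # Take the lock: with noclobber set, the redirection fails if the
    # lock file already exists, i.e. someone else is updating FILE.
    if ! ( set -o noclobber; : > "$lock" ) 2>/dev/null; then
        echo "another process holds $lock" >&2
        return 1
    fi

    # If interrupted, just delete the lock; the old FILE is untouched.
    trap 'rm -f "$lock"; exit 130' INT TERM

    # Build the new contents in the lock file: old entries plus the new one.
    if [ -f "$file" ]; then
        cat "$file" >> "$lock"
    fi
    printf '%s\n' "$entry" >> "$lock"

    # Commit: the rename replaces the old file and releases the lock in one step.
    mv "$lock" "$file"
    trap - INT TERM
}
```

For example, `update_index_file demo-index blob-a` either leaves `demo-index` fully updated or, if interrupted before the `mv`, leaves it exactly as it was, which matches the all-or-nothing behaviour seen with `git status` after an interrupted `git add`.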