
I have git repository A that uses B as a submodule.

B's history has been rewritten after an LFS migration, but I would love it if A could still have its entire history functional. After the LFS migration, I do have a mapping OldSHA1 > NewSHA1 for submodule B, and now I just want to rewrite OldSHA1 gitlinks to NewSHA1 in repo A.

I have tried to run a filter-repo command on the repo A with a full OldSHA1==>NewSHA1 mapping as parameter but it doesn't seem to pick up gitlinks.

I also tried filter-branch as detailed in the thread "Repository with submodules after rewriting history of submodule", which seems to be after the exact thing I am trying to accomplish. I tried it with a single OldSHA1=>NewSHA1 mapping, and here's the command I am trying to run:

git filter-branch --commit-filter '
  if [ "$GIT_COMMIT" = <OLDSHA1> ];
  then
    cd <SUBMODULE_ABSOLUTE_PATH>;
    git checkout <NEWSHA1>;
    cd ..;
    git add -u;
    git commit -m "updated gitlink";
  else
    git commit-tree "$@";
  fi' HEAD 

But I keep getting the following error:

fatal: reference is not a tree: <NEWSHA1>

Somehow, git checkout doesn't seem to pick up the tree of submodule B. I even tried to specify a path with git -C AbsolutePathToSubModule checkout but I get the same error.

So, a few questions:

  • Is there something obvious I'm doing wrong here?
  • Is there a better way of accomplishing this? It seems like I "simply" want to replace a string with another somewhere in the object database, but I can't find a simple way to do that
  • Is there a way to do this on the entire repo like filter-repo does? Or should I run this on every single branch?

Thanks for any help, advice, clue about how to accomplish this!

Edit 1:

After an answer in the comments, I edited my script to this:

git filter-branch --commit-filter '
  if [ "$GIT_COMMIT" = <SpecificCommitID> ];
  then
    git update-index --add --cacheinfo 160000,<SpecificNewSha1>,<SubmodulePath>;
  fi
  git commit-tree "$@";
  ' HEAD

But it has no effect :(

WARNING: Ref 'refs/heads/develop' is unchanged

Edit 2:

Thanks a lot to user @torek! This is a snippet to help anyone get started:

git filter-branch --index-filter '
if [ "$(git rev-parse --quiet --verify :<SUBMODULEPATH>)" = <OLDSHA1> ];
then
  git update-index --cacheinfo 160000,<NEWSHA1>,<SUBMODULEPATH>;
fi' HEAD --all

From there, you have to loop over all OLDSHA1/NEWSHA1 pairs, or use a case statement as a dictionary, as shown in torek's answer below.
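One way to avoid writing the case arms by hand is to generate the filter text from a mapping file. The following is only a sketch under assumptions: make_index_filter is a hypothetical helper name, the mapping file is assumed to hold one "old new" pair per line, and the submodule path is assumed to contain no characters that need shell quoting.

```shell
# make_index_filter MAP_FILE SUBMODULE_PATH   (hypothetical helper)
# Prints an --index-filter script with one case arm per "old new" line.
make_index_filter() {
    map_file=$1
    sm_path=$2
    # Read the gitlink currently recorded in the index (empty if absent).
    printf 'sm_hash=$(git rev-parse --quiet --verify ":%s")\n' "$sm_path"
    printf 'case "$sm_hash" in\n'
    while read -r old new; do
        printf '%s) new=%s ;;\n' "$old" "$new"
    done <"$map_file"
    printf '*) new=$sm_hash ;;\nesac\n'
    # Only touch the index when the gitlink exists and actually changes.
    printf 'if [ -n "$sm_hash" ] && [ "$new" != "$sm_hash" ]; then\n'
    printf '  git update-index --cacheinfo "160000,$new,%s"\n' "$sm_path"
    printf 'fi\n'
}

# Possible usage (shamap and SubmodulePath are placeholders):
#   git filter-branch --index-filter "$(make_index_filter shamap SubmodulePath)" HEAD --all
```

The mapping file is read once, when the filter text is generated, so filter-branch itself only ever sees a plain case statement.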

Thanks again a lot!

Bibzball
  • For filter-branch what you want is to update the index directly, which will be a big pain. I'm not sure if filter-repo has any existing Python function to do this for you but if so, it will be much easier, and if not, it's an obvious feature request... – torek Sep 26 '22 at 13:33
  • If you do want to do this in `git filter-branch`, remember that you don't have the submodule at all, all you have is the gitlink. You must inspect the hash ID of the gitlink (stored in Git's index in the given path name: use `git rev-parse` to retrieve it from the index) and if it's one of the ones to replace, use `git update-index` to shove the corrected gitlink into position. The rest, the `git filter-branch` code will handle on its own. – torek Sep 26 '22 at 13:34
  • Hi @torek and thank you for your comments! I'm pretty new to the whole filter-branch thing and am struggling a bit, could you elaborate on how git update-index works? I found something like `git update-index --cacheinfo 160000,,` but I'm getting `git update-index: --cacheinfo cannot add ; cannot add to the index - missing --add option? ` when I do it in my script. And that still triggers with ---add – Bibzball Sep 26 '22 at 14:06
  • My bad, I typoed -add instead of --add. The final result is still not the desired one but I'm getting closer. Will write back as I fiddle with it. Thanks! – Bibzball Sep 26 '22 at 14:13
  • I updated my script in an edit at the end of my original post, baffled by how few examples of usage I can find online :( If you have any time to help, that would be amazing. Thanks! – Bibzball Sep 26 '22 at 14:42

3 Answers


This:

git filter-branch --commit-filter '
  if [ "$GIT_COMMIT" = <SpecificCommitID> ];
  then
    git update-index --add --cacheinfo 160000,<SpecificNewSha1>,<SubmodulePath>;
  fi
  git commit-tree "$@";
  ' HEAD

is not what you want as it tests the hash ID of the superproject commit. You need to test the hash ID of the submodule commit in the index entry, e.g.:

if [ "$(git rev-parse --quiet --verify :SubmodulePath)" = oldhash ]; then ...; fi

and of course that has to test all the old rewritten submodule hash IDs to run them through the mapping function.

(This will definitely be easier in filter-repo where you can use a dictionary lookup.)


If you use:

sm_hash=$(git rev-parse :submodule-path)

or similar to prefix the test, remember to account for the cases where the submodule path is absent from the index so that :submodule-path does not parse properly. I think --quiet --verify will do the right thing here (quietly produce no output) but it's worth testing first.

Once you have the hash, you can do:

case $sm_hash in
old1) new=new1;;
old2) new=new2;;
...
oldN) new=newN;;
*) new=$sm_hash
esac

as a poor man's dictionary lookup with default, but you will want to skip updating the submodule hash if it's unchanged-or-empty.
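A minimal sketch of that skip logic, assuming the lookup is wrapped in a helper: new_gitlink_hash is a hypothetical name, and old1/new1 and so on stand in for real hash IDs. It prints the replacement hash only when an update is actually needed.

```shell
# new_gitlink_hash SM_HASH   (hypothetical helper)
# Prints the replacement hash, or nothing when the gitlink is absent
# (empty input) or is not one of the rewritten hashes.
new_gitlink_hash() {
    sm_hash=$1
    case "$sm_hash" in
    old1) new=new1 ;;
    old2) new=new2 ;;
    *)    new=$sm_hash ;;
    esac
    if [ -n "$sm_hash" ] && [ "$new" != "$sm_hash" ]; then
        printf '%s\n' "$new"
    fi
}

# Possible use inside an index filter (SubmodulePath is a placeholder):
#   sm_hash=$(git rev-parse --quiet --verify :SubmodulePath)
#   new=$(new_gitlink_hash "$sm_hash")
#   if [ -n "$new" ]; then
#       git update-index --cacheinfo "160000,$new,SubmodulePath"
#   fi
```

The if block in the usage comment is deliberate: filter-branch aborts when a filter exits non-zero, and a trailing `[ -n "$new" ] && ...` would exit non-zero for every commit that needs no change.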

torek

Easiest is going to be, with the old and new ids in a shamap file,

git filter-branch --setup '
        declare -A newsha
        while read old new; do newsha[$old]=$new; done <shamap
'                 --index-filter '
        if oldsha=`git rev-parse :submodulepath 2>&-`
        then git update-index --cacheinfo 160000,${newsha[$oldsha]-$oldsha},submodulepath
        fi
'

and if you're on a Mac you'll need to brew install bash to get past one of the problems in their neglected GNU install.

jthill

The answer using bash syntax, declare -A ..., will not work wherever /bin/sh is not bash. git filter-branch is a #!/bin/sh script (see https://github.com/git/git/blob/a82fb66fed250e16d3010c75404503bea3f0ab61/git-filter-branch.sh#L1), and the Bourne shell does not have associative arrays.
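A portable alternative, sketched here under assumptions rather than taken from any of the answers, is to scan the mapping file directly instead of loading it into an array. lookup_sha is a hypothetical helper name, and the file is assumed to hold one "old new" pair per line, ending with a newline.

```shell
# lookup_sha MAP_FILE OLDSHA   (hypothetical helper, plain POSIX sh)
# Prints the mapped hash, or OLDSHA itself when it is not listed.
lookup_sha() {
    map_file=$1
    wanted=$2
    while read -r old new; do
        if [ "$old" = "$wanted" ]; then
            printf '%s\n' "$new"
            return 0
        fi
    done <"$map_file"
    printf '%s\n' "$wanted"
}

# Possible use: define lookup_sha in --setup, then in --index-filter
# (/abs/path/shamap and submodulepath are placeholders):
#   if oldsha=$(git rev-parse --quiet --verify :submodulepath); then
#       git update-index --cacheinfo "160000,$(lookup_sha /abs/path/shamap "$oldsha"),submodulepath"
#   fi
```

An absolute path to the mapping file is the safer assumption, since filter-branch runs its filters from a temporary directory. The scan is linear per commit, which should be acceptable unless the mapping is very large.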

  • mmm, that's true on distros that hobble their default shell. Red Hat, SUSE, Slackware, Arch, Gentoo, and all distros built on those default to bash. It's pretty much only Debian-based distros (that's a lot of them) that literally went backwards and switched to a less capable default shell. – jthill Aug 30 '23 at 10:54