I've started a new job and found a messy version control problem. I'd appreciate any thoughts you might have on how to bring it under control:

  • Project U is a large nominally upstream project (millions of LOCs, tens of thousands of files).
  • Project S is a long-lived fork of one directory, D, in the upstream project U (no other code is used).
  • Both project U and project S use git for version control.
  • D has around 1,500 files, a fair number of which we have modified (our version runs on a non-traditional OS).
  • Historically, developers here have manually incorporated diffs from U into S with a three-way merge and kept a log of the version they updated to in a spreadsheet.

This seems like a recipe for hours of tedious merging, minor disasters from mixed versions of files, and potentially missed security updates. The spreadsheet is expected to contain some errors, and there is a base-version guessing tool that helps developers figure out the base version of a file when trying to roll it forward.

What I'm wondering is whether there's a path to cleaning this up. Could we retrofit an upstream repository and merge the changes that occur there instead?

I've found that git-filter-repo might be an appropriate tool for creating an intermediate repo scoped to the directory D in U. However, I'm having a hard time seeing how to connect our repo to the upstream and start working off that instead. Is this possible? Perhaps we could first bring every file in our repo onto a common upstream version, so that our repo is one upstream base version plus a diff, and go from there, but I'm struggling to put the pieces together.

Thanks in advance

  • Since Git is based on *commits* (not files), filtering one repository down to just a directory is not going to help any: the new commits made by such a filter are unrelated to the original commits. (Remember that filter-branch or filter-repo, either one, works by taking some set of original commits and using those to build new-and-improved replacement commits.) You would in fact need to find or produce a common project, or use submodules, or something similar here and none of this will be easy. Probably the hardest part will be the corporate culture part. :-) – torek Jun 13 '21 at 10:39
  • Thanks Torek. A common project is probably hard: there's no way upstream would take the changes here: they're not cleanly factored out, they just replace existing code; and I can't see upstream wanting to split up their project. Culture might be okay, the folks who instigated this are long gone. – Peter Orson Jun 13 '21 at 13:05
  • Have you considered tricking git into relating U and S by using `git commit-tree`? If you merge both branches by, say, creating a merge revision in U that brings in the tip of S, from that moment on bringing stuff over from S into U should be rather trivial.... The other way around? Well, you will need to try. – eftshift0 Jun 13 '21 at 15:23
  • Thanks @eftshift0, I'll look into it. – Peter Orson Jun 13 '21 at 18:30
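
For readers who want to try the `git commit-tree` idea from the comments, here is a minimal sketch of one way it might be wired up, using throwaway toy repositories. It is not a tested recipe for the real projects. Everything named here is an assumption for illustration: the toy repos `U` and `S`, the file `file.c`, the `vendor` branch, and above all the premise that S's root directory corresponds to U's `D` and that every file in S has already been brought onto one common upstream version. The snapshot commits it creates are new commits, unrelated to U's own history, as torek points out.

```shell
#!/bin/sh
set -eu

# Toy stand-ins for the real repositories (all names are hypothetical).
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream U: directory D plus unrelated code.
git init -q U
mkdir U/D U/other
printf 'line1\nline2\nline3\nline4\nline5\n' > U/D/file.c
echo unrelated > U/other/x.c
git -C U add .
git -C U commit -q -m "U: base version"
base=$(git -C U rev-parse HEAD)

# Fork S: its root holds a locally modified copy of U's D at $base.
git init -q S
printf 'line1 (ported)\nline2\nline3\nline4\nline5\n' > S/file.c
git -C S add .
git -C S commit -q -m "S: local port"

# Upstream moves on.
printf 'line1\nline2\nline3\nline4\nline5 (security fix)\n' > U/D/file.c
git -C U commit -q -am "U: security fix"
new=$(git -C U rev-parse HEAD)

cd S
git fetch -q ../U HEAD   # copy U's objects into S's object store

# 1. Snapshot commit holding only D's tree at the known base version.
tree1=$(git rev-parse "$base:D")
v1=$(git commit-tree "$tree1" -m "Import D from U at base")
git branch vendor "$v1"

# 2. Record that S descends from that snapshot while keeping S's files as-is.
git merge -q -s ours --allow-unrelated-histories -m "Mark S as based on vendor" vendor

# 3. Later: snapshot the newer D on top of the previous snapshot ...
tree2=$(git rev-parse "$new:D")
v2=$(git commit-tree "$tree2" -p "$v1" -m "Import D from U at new")
git branch -f vendor "$v2"

# ... and let git run the three-way merge with v1 as the merge base.
git merge -q -m "Merge upstream D" vendor
cat file.c
```

In this toy case the final merge should keep both the local change and the upstream fix in `file.c`. The "ours" merge in step 2 is only honest if S really is "base version plus a diff"; for files that are actually on a different upstream version, the recorded base would be wrong and later merges could silently drop or duplicate changes.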
