I've started a new job and inherited a messy version control problem, and I'd appreciate any thoughts you might have on how to bring it under control:
- Project U is a large nominally upstream project (millions of LOCs, tens of thousands of files).
- Project S is a long-lived fork of one directory, D, in the upstream project U (no other code is used).
- Both project U and project S use git for version control.
- D contains around 1,500 files, a fair number of which we've modified (our version runs on a non-traditional OS).
- Historically, developers here have manually incorporated diffs from U into S via three-way merge, logging the upstream version each file was updated to in a spreadsheet.
This seems like a recipe for hours of tedious merging, minor disasters from mixed versions of files, and potentially missed security updates. The spreadsheet is known to contain errors, and there's even a base-version-guessing tool to help developers figure out which upstream version a file was last synced to before rolling it forward.
What I'm wondering is whether there's a path to cleaning this up. Could we retrofit a proper upstream remote and merge in changes from it instead?
I've found that git-filter-repo might be an appropriate tool for creating an intermediate repo scoped to the directory D in U. However, I'm having a hard time seeing how to connect that upstream history to our repo so we can start working off it instead. Is this possible? Perhaps we could get all the files in our repo based off a common version in the upstream, so our state is "upstream base version plus our diff" and we can go from there, but I'm struggling to put the pieces together.
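To make my half-formed idea concrete, here's the kind of thing I'm imagining, as a self-contained sketch. The two tiny repos below are fabricated stand-ins (in reality `u-scoped` would be produced by running `git filter-repo --path D/ --path-rename D/:` on a clone of U); all names, paths, and contents are hypothetical, and I'm not sure this is the right approach:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# --- demo stand-in for the filter-repo output (D's history at the root) ---
git init -q -b main u-scoped
printf 'upstream v1 driver\n' > u-scoped/driver.c
printf 'api v1\n'             > u-scoped/common.h
git -C u-scoped add . && git -C u-scoped commit -qm "upstream: base version"
printf 'api v2\n'             > u-scoped/common.h
git -C u-scoped commit -qam "upstream: newer change"

# --- demo stand-in for our fork S (unrelated history, one local port) ---
git init -q -b main s-repo
printf 'our OS port of the v1 driver\n' > s-repo/driver.c
printf 'api v1\n'                       > s-repo/common.h
git -C s-repo add . && git -C s-repo commit -qm "S: current state"

# --- the actual grafting steps ---
cd s-repo
git remote add upstream ../u-scoped
git fetch -q upstream

# The upstream commit our files were last synced to (from the spreadsheet);
# in this toy history it's simply the root commit.
BASE=$(git rev-list --max-parents=0 upstream/main)

# Record the relationship once: merge the base in, keeping our tree verbatim.
git merge -q --allow-unrelated-histories -s ours \
    -m "graft upstream history at our base version" "$BASE"

# From here on, ordinary merges bring in only post-base upstream changes:
git merge -q -m "sync with upstream" upstream/main
```

In the toy run, the final merge updates `common.h` to the newer upstream version while leaving our ported `driver.c` untouched, which is exactly the behavior I'd want at scale. What I can't judge is whether this `-s ours` graft is the accepted way to establish the merge base, or whether there's a better-trodden path.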
Thanks in advance