I've started a new job and inherited a messy version control problem, and I'd appreciate any thoughts you might have on how to bring it under control:
- Project U is a large nominally upstream project (millions of LOCs, tens of thousands of files).
- Project S is a long-lived fork of one directory, D, in the upstream project U (no other code is used).
- Both project U and project S use git for version control.
- D contains around 1,500 files, a fair number of which we've modified (our version runs on a non-traditional OS).
- Historically, developers here have manually incorporated diffs from U into S via three-way merge, logging the upstream version each file was updated to in a spreadsheet.
This seems like a recipe for hours of tedious merging, minor disasters from mixed versions of files, and potentially missed security updates. The spreadsheet is known to contain errors, and there's even a base-version-guessing tool to help developers figure out which upstream version a file was last synced to before rolling it forward.
What I'm wondering is whether there's a path to cleaning this up. Could we retrofit a proper upstream remote and merge in changes from it instead?
I've found that git-filter-repo might be an appropriate tool for creating an intermediate repo scoped to the directory D in U. However, I'm having a hard time seeing how to connect that upstream history to our repo so we can start working off it instead. Is this possible? Perhaps we could get all the files in our repo based off a common version in the upstream, so our state is "upstream base version plus our diff" and we can go from there, but I'm struggling to put the pieces together.
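To make my half-formed idea concrete, here's the kind of thing I'm imagining, as a self-contained sketch. The two tiny repos below are fabricated stand-ins (in reality `u-scoped` would be produced by running `git filter-repo --path D/ --path-rename D/:` on a clone of U); all names, paths, and contents are hypothetical, and I'm not sure this is the right approach:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# --- demo stand-in for the filter-repo output (D's history at the root) ---
git init -q -b main u-scoped
printf 'upstream v1 driver\n' > u-scoped/driver.c
printf 'api v1\n'             > u-scoped/common.h
git -C u-scoped add . && git -C u-scoped commit -qm "upstream: base version"
printf 'api v2\n'             > u-scoped/common.h
git -C u-scoped commit -qam "upstream: newer change"

# --- demo stand-in for our fork S (unrelated history, one local port) ---
git init -q -b main s-repo
printf 'our OS port of the v1 driver\n' > s-repo/driver.c
printf 'api v1\n'                       > s-repo/common.h
git -C s-repo add . && git -C s-repo commit -qm "S: current state"

# --- the actual grafting steps ---
cd s-repo
git remote add upstream ../u-scoped
git fetch -q upstream

# The upstream commit our files were last synced to (from the spreadsheet);
# in this toy history it's simply the root commit.
BASE=$(git rev-list --max-parents=0 upstream/main)

# Record the relationship once: merge the base in, keeping our tree verbatim.
git merge -q --allow-unrelated-histories -s ours \
    -m "graft upstream history at our base version" "$BASE"

# From here on, ordinary merges bring in only post-base upstream changes:
git merge -q -m "sync with upstream" upstream/main
```

In the toy run, the final merge updates `common.h` to the newer upstream version while leaving our ported `driver.c` untouched, which is exactly the behavior I'd want at scale. What I can't judge is whether this `-s ours` graft is the accepted way to establish the merge base, or whether there's a better-trodden path.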
Thanks in advance