So, I have a git repository that I worked on for years in private. I only used one branch (MASTER), and git was basically a journal for me. Now I want to make this source code public, but I want to remove files that I don't have permission to redistribute. I'd also like to remove some comments that I don't want to make public.
These private comments and undistributable files are still important to me though.
Let's say my most recent commit is called A. I can make a public branch (PUBLIC), delete everything that can't be public, and make a new commit. We'll call that new commit B. Then I can clone that repository, erase the history (there are a number of options for that), and publish the new repository and branch. From then on I can work in this public branch with my future collaborators.
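A minimal sketch of the workflow described above, run in a throwaway repository. The file names (keep.txt, private.txt) and the PUBLISHED branch are hypothetical, and the orphan branch is only one of several possible ways to drop history; it is an assumption here, not necessarily the option meant above.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit A: stands in for the tip of the private history.
printf 'public\n' > keep.txt
printf 'secret\n' > private.txt
git add .
git commit -q -m "A: tip of the private history"
git branch -M MASTER

# Commit B: the same tree with everything non-public removed.
git checkout -q -b PUBLIC
git rm -q private.txt
git commit -q -m "B: remove files that cannot be published"

# Start a branch with no parents; the index still holds B's tree.
git checkout -q --orphan PUBLISHED
git commit -q -m "Initial public commit"

published_commits=$(git rev-list --count PUBLISHED)
published_files=$(git ls-tree -r --name-only PUBLISHED)
echo "commits: $published_commits, files: $published_files"
```

Only the PUBLISHED branch would be pushed to the public remote; MASTER and PUBLIC stay in the private repository.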
This works for the public side of things, but I don't know how to effectively capture what has been deleted. Sure, I have my old branch in my original repository, but it's not really efficient to sort through that to find small comments and changes that I have since forgotten were deleted from the new public branch.
Is there a way to make a third branch (PRIVATE) that has a new commit, C, that represents the difference between A and B?
For example:
Commit A (MASTER)
| | - File 1 (text file)
| | - File 2 (text file)
| | - File 3 (binary file)
| | - File 4 (text file)
| | - File 5 (text file)
| | - File 6 (text file)
| | - File 7 (text file)
| | - File 8 (text file)
| | - File 9 (text file)
| |
| Commit B (PUBLIC)
| - File 1 (text file)
| - File 2 (text file)
| - File 4 (text file)
| - File 7 (same except for line 2 removed)
| - File 8 (same except for line 4 edited)
| - File 9 (same except for a new line inserted between line 8 and line 9)
| - File 10 (new text file)
| - File 11 (new binary file)
|
Commit C (PRIVATE)
- File 3 (binary file)
- File 5 (text file)
- File 6 (text file)
- File 7 (only shows line 2)
- File 8 (only shows the original line 4)
Basically, I am trying to take branch MASTER, fork it into a new branch PUBLIC, and then automatically fork branch MASTER into a new branch PRIVATE whose latest commit contains nothing that is in the latest commit of PUBLIC. In other words, I want to automatically split branch MASTER in two, just by deleting things.
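One way to see which files would fall into each part of the split is to ask git for the per-path status between the two commits. This is a minimal sketch in a throwaway repository; file5.txt, file7.txt, file10.txt and the tags A and B are hypothetical stand-ins for the example above. Reading the diff in the B-to-A direction, A marks paths that exist only in commit A (removed from PUBLIC), M marks paths whose content differs, and D marks paths that exist only in commit B.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit A: the private state.
printf 'one\ntwo\nthree\n' > file7.txt
printf 'private notes\n' > file5.txt
git add .
git commit -q -m "A"
git tag A

# Commit B: one file removed, one line removed, one file added.
git rm -q file5.txt
printf 'one\nthree\n' > file7.txt
printf 'new\n' > file10.txt
git add .
git commit -q -m "B"
git tag B

# --no-renames keeps every path in exactly one of the A/M/D groups.
status=$(git diff --no-renames --name-status B A)
printf '%s\n' "$status"
```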
Update
One thing I have done is, after creating Commit B on the PUBLIC branch, switch to the PRIVATE branch, then run
rm -r *
git diff --binary --no-ext-diff B..A|git apply --reject
This will work for File 3, File 5, and File 6, but for File 7 and File 8 it errors out with No such file or directory.
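The error is consistent with how patches are applied: a hunk that modifies File 7 needs some version of that file to exist in the working tree, and rm -r * has just removed it. A minimal sketch, assuming this step should only restore the files that were removed outright, limits the diff to added paths with --diff-filter=A. The names file5.txt and file7.txt and the tags A and B are hypothetical.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'one\ntwo\nthree\n' > file7.txt
printf 'private notes\n' > file5.txt
git add .
git commit -q -m "A"
git tag A

git rm -q file5.txt
printf 'one\nthree\n' > file7.txt
git commit -q -a -m "B"
git tag B

git checkout -q -b PRIVATE
rm -r ./*

# Only paths that exist in A but not in B are in this patch,
# so nothing in it needs a pre-existing file to patch against.
git diff --diff-filter=A --binary --no-ext-diff B A | git apply

restored=$(ls)
echo "restored: $restored"
```

Modified files are deliberately left out of this step; they would still need separate handling.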
If I instead run
rm -r *
touch "File 7"
touch "File 8"
git diff --unified=0 --binary --no-ext-diff B..A|git apply --reject
This will work for File 3, File 5, File 6, and File 7, but for File 8 it places some stuff (a changed line) in a file named File 8.rej instead. This is getting closer to what I want, but using the touch command here isn't practical since I won't know which files actually changed.
Update 2
I can use the following to automatically touch modified files:
git diff --diff-filter=M -z --name-only B..A| xargs -0 -IREPLACE touch REPLACE
I can also use:
find . -type f -name '*.rej' -print0 | xargs -0 rename -f 's/\.rej$//'
if I want to overwrite the empty (touched) files with the .rej files and get rid of the .rej files. However, I'm not sure whether a patch on a file can be partially applied. If it can, there may be cases where some changes are applied to the touched file (so it does not stay empty) while others go into the .rej file, and then overwriting it would lose content. So, this step may not be totally safe.
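On the partial-application question: the git apply documentation describes --reject as applying the hunks that fit and leaving the rejected hunks in *.rej files, so a file can end up partly patched. A minimal sketch in a throwaway repository illustrates this; notes.txt and the tags OLD and NEW are hypothetical names.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# Two commits that differ on line 2 and line 19 (two separate hunks).
seq 1 20 > notes.txt
git add .
git commit -q -m "old"
git tag OLD
sed -e '2s/.*/two/' -e '19s/.*/nineteen/' notes.txt > tmp && mv tmp notes.txt
git commit -q -a -m "new"
git tag NEW

# Working copy: the old content, except that line 19 was edited locally,
# so the second hunk no longer matches.
git checkout -q OLD -- notes.txt
sed '19s/.*/local edit/' notes.txt > tmp && mv tmp notes.txt

# git apply exits non-zero when any hunk is rejected, hence "|| true".
git diff OLD NEW | git apply --reject || true

sed -n '2p;19p' notes.txt
ls notes.txt.rej
```

Here the first hunk lands in notes.txt while the second ends up in notes.txt.rej, which is the mixed situation that would make a blind rename over the target file lossy.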
What I think I'm opting for at this point is
rm -r *
git diff --diff-filter=M -z --name-only B..A| xargs -0 -IREPLACE touch REPLACE
git diff --binary --no-ext-diff B..A|git apply --reject
Skipping the --unified=0 option puts both line deletions and line changes into .rej files (I get a File 7.rej and a File 8.rej).
In this case, what I get in the PRIVATE branch is this: any files deleted in the PUBLIC branch are restored, and for any files with deletions or changes I basically get an individual diff for each file, in its own .rej file. I'm also opting to just leave the .rej files there since, as mentioned above, I don't know if it's safe to rename them over the touched files.
However, this process does not work right for files that are renamed in the PUBLIC branch. If a file is renamed in the PUBLIC branch, the original file just gets completely restored in the PRIVATE branch (as if it were deleted). If a file is modified and renamed, the entire file also gets completely restored in the PRIVATE branch, so you can't tell if anything has been changed or removed inside that file.
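For the rename case, one possible direction (an assumption, not something verified against a real repository with years of history) is to have git report rename pairs with -M and then diff the two blobs of each pair directly. A minimal sketch in a throwaway repository; original_name.txt, public_name.txt and the tags A and B are hypothetical.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit A: a file whose second line is a private comment.
seq 1 40 > original_name.txt
sed '2s/.*/private comment/' original_name.txt > tmp && mv tmp original_name.txt
git add .
git commit -q -m "A"
git tag A

# Commit B: the file is renamed and the private line is dropped.
git mv original_name.txt public_name.txt
sed '2d' public_name.txt > tmp && mv tmp public_name.txt
git commit -q -a -m "B"
git tag B

# With -M, the pair is reported as R<score> <name in B> <name in A>.
renames=$(git diff -M --name-status B A)
printf '%s\n' "$renames"

# Diff the two blobs directly; lines present only in A show up as "+".
hidden=$(git diff B:public_name.txt A:original_name.txt | grep '^+[^+]')
printf '%s\n' "$hidden"
```

Rename detection is similarity-based, so a file that was renamed and heavily edited may still be reported as one deletion plus one addition.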