I have a git repository containing files in which some sensitive data may be hardcoded, or was formerly hardcoded and still resides at some points in the git history.
In the interest of making the project publicly available, so that programmers with similar interests can benefit from it and contribute changes back, I want to fork it and sanitize the offending files.
The procedure I considered was as follows:
- Shallow/shared clone the repo to a new local location; this folder will become the public variant. Subsequent steps are in the new repo.
- Branch master into a branch `public-master`.
- Remove all other branch refs.
- Sanitize `public-master`.
- Squash `public-master`.
- `git reflog expire --expire-unreachable=now --all && git gc --prune=all --aggressive` to remove everything unreachable, which is now any object not on the public branch.
- `git push` to add the public master back upstream into the private repository.
- Set the origin remote to the public repo URL, branch onto `master`, and push to origin.
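To make the procedure concrete, here is a minimal sketch of how I imagine scripting it against a throwaway repository. Everything in the setup is hypothetical (the temp paths, `config.txt`, the `hunter2` secret, the `private` remote name, and a local bare repo standing in for the public URL). It uses a plain clone rather than a shallow/shared one, and squashes with `git commit-tree`, which is only one of several ways to do that step.

```shell
#!/bin/sh
set -eu

# Throwaway setup: a "private" repo with a secret in its history.
# All names and paths below are hypothetical.
work=$(mktemp -d)
git init -q "$work/private"
cd "$work/private"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.email "demo@example.com"
git config user.name "Demo"
echo 'password = "hunter2"' > config.txt
git add config.txt
git commit -q -m "add config with hardcoded secret"
secret_commit=$(git rev-parse HEAD)

# 1. Clone to a new location (plain clone here, not shallow/shared).
git clone -q "$work/private" "$work/public"
cd "$work/public"
git config user.email "demo@example.com"
git config user.name "Demo"

# 2. Branch master into public-master.
git checkout -q -b public-master

# 3. Remove all other refs: local master and the remote-tracking refs.
git branch -q -D master
git remote remove origin

# 4. Sanitize the offending file and commit the result.
echo 'password = read_from_env()' > config.txt
git commit -q -am "remove hardcoded secret"

# 5. Squash: create one parentless commit with the sanitized tree.
tree=$(git rev-parse 'HEAD^{tree}')
squashed=$(git commit-tree -m "Initial public release" "$tree")
git update-ref refs/heads/public-master "$squashed"

# 6. Expire reflog entries and prune everything now unreachable.
git reflog expire --expire-unreachable=now --all
git gc --quiet --prune=all --aggressive

# 7. Push the sanitized branch back into the private repository.
git remote add private "$work/private"
git push -q private public-master

# 8. Point origin at the public repo (a local bare repo here) and push master.
git init -q --bare "$work/public.git"
git remote add origin "$work/public.git"
git checkout -q -b master
git push -q origin master

# Check whether the original commit survived in the local public clone.
if git cat-file -e "$secret_commit" 2>/dev/null; then
    echo "old commit still present locally"
else
    echo "old commit pruned locally"
fi
```

Note that step 3 also drops the remote-tracking refs (via `git remote remove origin`), since otherwise `refs/remotes/origin/master` would keep the old history reachable and the `git gc` in step 6 would have nothing to prune.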
Is this sufficient to sanitize my repo, or would it still be possible to recover sensitive data afterwards? Is there a more sensible and common way to solve this problem? Are any of the steps extraneous?
For example, can I do all of this in one repository, or does the nature of git packs mean I might still push an object that contains sensitive information?
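To illustrate the single-repository case I am worried about, here is a small sketch of the kind of check I have in mind. It keeps the secret and the sanitized branch in the same repository, packs everything together with `git gc`, pushes only the sanitized branch to a local bare repo, and then asks whether the secret blob arrived on the other side. The paths, file name, and secret are made up for the demo, and a check on a tiny throwaway repo is of course not proof for a real one.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repo holding both the secret and the public branch.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"
echo 'password = "hunter2"' > config.txt
git add config.txt
git commit -q -m "add config with hardcoded secret"
secret_blob=$(git rev-parse HEAD:config.txt)

# Sanitized, single-commit branch created in the same repository.
git checkout -q --orphan public-master
echo 'password = read_from_env()' > config.txt
git add config.txt
git commit -q -m "Initial public release"

# Pack everything, so secret and public objects can share one packfile.
git gc --quiet

# Is the secret blob among the objects reachable from public-master?
if git rev-list --objects public-master | grep -q "$secret_blob"; then
    echo "secret blob is reachable from public-master"
else
    echo "secret blob is not reachable from public-master"
fi

# Push only the sanitized branch to a stand-in for the public repo.
git init -q --bare "$work/public.git"
git push -q "$work/public.git" public-master:master

# Did the secret blob arrive on the receiving side?
if git -C "$work/public.git" cat-file -e "$secret_blob" 2>/dev/null; then
    echo "secret blob was pushed"
else
    echo "secret blob was not pushed"
fi
```

The secret blob is still present in the local object database after this (nothing was pruned), so the interesting part is only what ends up in `public.git`.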