Let's call my-dirty-repository an existing Git repository containing lots of scripts which are not related. It is a catchall repository which needs to be properly cleaned.

As a Minimal, Complete, and Verifiable example, let's say this repository only contains:

script1.sh
script2.sh

These files were updated independently by various commits, spread across several branches.
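For anyone who wants to reproduce the situation, here is a minimal sketch that builds such a repository. The helper name create_example_repo, the commit messages, the branch name feature and the identity values are all made up for illustration; only the two script names come from the example above.

```shell
#!/bin/bash
# Hypothetical helper: builds a throwaway repository resembling the example.
create_example_repo() {
  local repo="$1"
  git init -q "$repo" || return 1
  # Identity set locally so commits work without any global Git configuration.
  git -C "$repo" config user.name "Example User"
  git -C "$repo" config user.email "user@example.com"

  local file
  # Interleaved commits: script1, script2, script1, script2.
  for file in script1.sh script2.sh script1.sh script2.sh; do
    echo "echo update" >> "$repo/$file"
    git -C "$repo" add "$file"
    git -C "$repo" commit -q -m "Update $file"
  done

  # A second branch carrying one more commit on script1.sh.
  git -C "$repo" checkout -q -b feature
  echo "echo feature" >> "$repo/script1.sh"
  git -C "$repo" commit -q -a -m "Update script1.sh on feature"
}

repo="$(mktemp -d)/my-dirty-repository"
create_example_repo "$repo" && git -C "$repo" log --all --oneline --name-only
```

The final log shows commits alternating between the two files, which is exactly what makes a plain split awkward.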

The aim is to create 2 fully independent Git repositories, each containing ONLY the history of the files it keeps (references).

Let's call them my-clean-repository1 and my-clean-repository2, the first one having only history about script1, and the second having only history about script2.

I tried 3 ways to achieve this, without success.

I'm pretty sure there is a way to perform it properly.

  • If these would be separate directories then you can use `git subtree`. However I do not know if you can do so for single files. – Hauleth Nov 20 '18 at 15:52
  • Look into [git filter-branch](https://git-scm.com/docs/git-filter-branch). I know it can do this on subdirectories. There should be a way to do it for individual files, but it might be more complex. GitHub provides instructions for directories [here](https://help.github.com/articles/splitting-a-subfolder-out-into-a-new-repository/). – mkasberg Nov 20 '18 at 15:56
  • @Hauleth it would have been too easy; of course, most of the scripts are in the same root directory ;/ – Bsquare ℬℬ Nov 20 '18 at 16:03
  • @mkasberg Thanks for your answer, I'll try that. But are you sure the history will be updated accordingly? – Bsquare ℬℬ Nov 20 '18 at 16:03
  • Git filter-branch is a tool that programmatically re-writes git history. It will, essentially, recreate the repository from scratch containing only the commits you tell it to create. Unfortunately, it's a complex command that can be hard to get right for more complex use cases. – mkasberg Nov 20 '18 at 16:07
  • If you have a relatively short commit history (say, less than 50 commits), `git rebase -i` might be easier to work with - it is also a tool to re-write history, and it's easy to use it to remove commits. – mkasberg Nov 20 '18 at 16:07
  • @mkasberg Yes, it was one of the ways I thought of, but the history contains several commits on script1, then script2, then script1 ... So a rebase cannot meet the need – Bsquare ℬℬ Nov 20 '18 at 16:50
  • That's the perfect situation for an _interactive_ rebase, which will let you pick which commits to include. It should be very straightforward, particularly if commits tend to touch script1 or script2 but not both in the same commit. – mkasberg Nov 20 '18 at 17:45
  • @mkasberg thank you for your comments, I posted a complete answer with a perfect solution to my needs to share with community. – Bsquare ℬℬ Nov 22 '18 at 14:37
  • Does anyone have an alternative? – Bsquare ℬℬ Nov 21 '19 at 09:37
  • I updated link to my [new repository](https://gitlab.com/bertrand-benoit/cloneToCleanGitRepositories) on GitLab. – Bsquare ℬℬ Mar 01 '20 at 12:37

1 Answer

Edit: I created a dedicated tool, cloneToCleanGitRepositories, to answer this need.

It is a more complete version of the older script shown below.


@mkasberg thank you for your advice about interactive rebase, which is very useful when the history is simple.

I tried it, and it resolved my issue for some of the scripts for which I wanted a clean, dedicated, independent Git repository.

However, it was not enough for most of them, so I tried another solution based on Git's history filtering (git filter-branch).

Finally, I wrote this little script:

#!/bin/bash
##
## Author: Bertrand Benoit <mailto:contact@bertrand-benoit.net>
## Description: Creates a clean Git repository for each file in the root of the specified source Git repository, rewriting history accordingly.
## Version: 1.0

[ $# -lt 2 ] && echo -e "Usage: $0 <source repository> <dest root directory>" >&2 && exit 1

[ ! -d "$1" ] && echo -e "Specified source Git repository '$1' does not exist." >&2 && exit 1
[ ! -d "$2" ] && echo -e "Specified destination root directory '$2' does not exist." >&2 && exit 1

# Resolves absolute paths, because the script changes directory below.
SOURCE_REPO=$( cd "$1" && pwd )
DEST_ROOT_DIR=$( cd "$2" && pwd )

sourceRepoName=$( basename "$SOURCE_REPO" )

# For each file in root of the source Git repository.
# NUL-delimited names are read on file descriptor 3, so file names with spaces are safe
#  and commands inside the loop cannot consume the list.
while IFS= read -r -d '' -u 3 refToManage; do
  echo -ne "Managing $refToManage ... "

  refFileName=$( basename "$refToManage" )
  newDestRepo="$DEST_ROOT_DIR/$refFileName"

  # Creates the repository if not existing.
  logFile="$newDestRepo/logFile.txt"
  echo -ne "creating new repository: $newDestRepo, Log file: $logFile ... "
  if [ ! -d "$newDestRepo" ]; then
    mkdir -p "$newDestRepo"
    cd "$newDestRepo" || exit 2
    ! git clone -q "$SOURCE_REPO" && echo -e "Error while cloning source repository to $newDestRepo." >&2 && exit 2
  fi
  cd "$newDestRepo/$sourceRepoName" || exit 2

  # Removes all other resources.
  # grep -x -F excludes only the exact file name (not names merely containing it).
  # xargs -r (GNU xargs option) skips 'git rm' when there is nothing left to remove.
  FILTER='git ls-tree -r --name-only --full-tree "$GIT_COMMIT" | grep -v -x -F -- "'"$refFileName"'" | tr "\n" "\0" | xargs -0 -r git rm -f --cached -r --ignore-unmatch'
  ! git filter-branch -f --prune-empty --index-filter "$FILTER" -- --all >"$logFile" 2>&1 && echo -e "Error while cleaning new git repository." >&2 && exit 3

  # Cleans remote information to ensure there is no push to the source repository.
  ! git remote remove origin >>"$logFile" 2>&1 && echo -e "Error while removing remote." >&2 && exit 2

  echo "done"
done 3< <( find "$SOURCE_REPO" -maxdepth 1 -type f -print0 )
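
The index filter decides which paths to drop by listing the tree of each commit and excluding the kept file name with grep -v. Here is a minimal sketch of that selection step on its own, assuming exact matching (-x -F) so that names which merely contain the kept name are still removed. The helper name paths_to_remove and the file list are made up for illustration; in the real filter the list comes from git ls-tree.

```shell
#!/bin/bash
# Hypothetical helper: prints the paths that would be removed when keeping "$1".
# Candidate paths are read from standard input, one per line.
paths_to_remove() {
  local kept="$1"
  # -v inverts the match, -x requires a whole-line match, -F disables regex syntax.
  grep -v -x -F -- "$kept"
}

# Made-up file list standing in for the output of 'git ls-tree -r --name-only'.
printf '%s\n' script1.sh script2.sh old-script1.sh.bak | paths_to_remove script1.sh
```

With a plain substring match (grep -v "script1.sh"), old-script1.sh.bak would be excluded from the removal list too and would therefore stay in the new repository.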

Usage:

mkdir /tmp/cleanRepoDest
createCleanGitRepo.sh ~/_gitRepo/Scripts /tmp/cleanRepoDest

In the destination directory, it creates a new clean Git repository for EACH file in the root directory of the specified source Git repository. In each one, the history is clean and relates only to the kept script.

In addition, it removes the remote, so that changes cannot be pushed back to the source repository by mistake.
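
Both properties can be checked afterwards. Here is a minimal sketch, assuming one resulting repository per kept file; the helper names files_in_history and is_clean_repo are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: lists every path touched by a commit on any branch or tag.
# --branches --tags is used rather than --all, because git filter-branch keeps
# backups of the original refs under refs/original/.
files_in_history() {
  git -C "$1" log --branches --tags --format= --name-only | sed '/^$/d' | sort -u
}

# Hypothetical helper: succeeds only if the history mentions just "$2"
# and the repository has no remote left.
is_clean_repo() {
  local repo="$1" kept="$2"
  [ "$(files_in_history "$repo")" = "$kept" ] && [ -z "$(git -C "$repo" remote)" ]
}
```

With the usage shown above, a check could look like: is_clean_repo /tmp/cleanRepoDest/script1.sh/Scripts script1.sh && echo "clean"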

This way, it is easy to 'migrate' from a big, dirty catchall Git repository to several clean ones :-)
