
I've been tasked with migrating our entire PVCS repository to Git, including all of the history. The only approach I've come up with is to run the PVCS VLOG command to dump the revision history (for all files) to a file, and then parse that file (using a C# program) to get the list of revisions for each file.

Then, revision by revision, I GET the given revision of the file from PVCS, ADD the file to Git, and do a COMMIT. So for each of the ~14,000 files I will have one commit per revision of the file (and each file could have anywhere from 1 to 100+ revisions). Am I crazy in thinking this will work? Will there be so many commits that the repo becomes too large and unwieldy?

halfer
Ben_G
  • Regardless of whether it's too many or not, the resulting history will be essentially useless. – Oliver Charlesworth Feb 26 '15 at 23:11
  • Can you do it the other way around: identify the set of files that makes up a given commit in PVCS, then recreate that in Git? That is, "establish commit 1, change files to make it identical to commit 2, actually git commit (with a fake date), and repeat for all files". The various "migrate Subversion to Git" scripts may be helpful here. – Thorbjørn Ravn Andersen Feb 27 '15 at 01:11
  • I agree that "a single file in each commit" will make the repository worthless except for generating a proper history for a single file. – Thorbjørn Ravn Andersen Feb 27 '15 at 01:13
  • `git rev-list --count --all` on my linux repo says 549839, and linux isn't so much large as clearly no longer small as history size goes. – jthill Feb 27 '15 at 14:41
  • The problem is that PVCS doesn't commit file sets; it commits individual files. So at any point in time only one file is checked in. There's no sense of "give me a snapshot of the entire repo at one point in time", because each file can be different at each point in time. Hope that makes sense. – Ben_G Feb 27 '15 at 17:33
  • You need to figure out the change sets yourself. It is probably easiest to group by timestamp (by date or by hour), or by a combination of timestamp and author. – slebetman May 30 '23 at 23:14

1 Answer


Disclaimer: I am not familiar with PVCS in particular.

However, I have dealt with a similar issue when converting CVS to Git. There is a Git command, `git cvsimport`, which groups file commits based on time, committer, and message. If there are tools that can convert PVCS to CVS or SVN (Git has an SVN importer as well, `git svn`), then you can simply convert in two steps.

Otherwise, I would suggest modifying your program as follows:

  • Sort all commits (across files) by date
  • For each commit
    • If the committer, date, or message differs from the previous commit, commit the files gathered so far
    • Get the file content of the current commit
  • After the last commit, commit whatever is still pending

Obviously, the dates do not have to match exactly. Decide how far apart two check-ins may be and still count as the same commit. You may also want to treat similar commit messages as the same commit if, for instance, they carry the same bug-tracking number.
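A minimal sketch of that grouping in Python (the same logic ports directly to the C# parser mentioned in the question). The `Revision` fields, the function name `group_into_changesets`, and the 10-minute window are all assumptions for illustration, since I do not know what the VLOG output actually provides. The sketch also starts a new changeset when the same file shows up twice, which is an extra rule of mine so that two revisions of one file never collapse into a single commit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Revision:
    # Hypothetical record for one parsed file revision; field names are assumed.
    path: str
    revision: str
    author: str
    when: datetime
    message: str


def group_into_changesets(revisions, window=timedelta(minutes=10)):
    """Group per-file revisions into changesets, oldest first.

    A revision joins the current changeset when author and message match
    the previous revision, it was checked in within `window` of it, and
    the changeset does not already contain that file.
    """
    changesets = []
    for rev in sorted(revisions, key=lambda r: r.when):
        current = changesets[-1] if changesets else None
        if (
            current is not None
            and current[-1].author == rev.author
            and current[-1].message == rev.message
            and rev.when - current[-1].when <= window
            and all(r.path != rev.path for r in current)
        ):
            current.append(rev)
        else:
            # Something differs: close the previous changeset, start a new one.
            changesets.append([rev])
    return changesets
```

Each returned list then becomes one Git commit. Matching on a bug-tracking number instead of the full message would only require replacing the message comparison.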

Consider using `git fast-import`, which bypasses the index and is therefore much faster.
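As a rough illustration, the following Python sketch writes one commit in the stream format that `git fast-import` reads on standard input. The helper name `write_commit`, the branch `refs/heads/master`, and the e-mail address are assumptions; PVCS may only give you a user name, so you would have to map it to an address yourself. Paths are assumed to be simple (no quotes, newlines, or leading double quote), and all times are written as UTC.

```python
import io


def write_commit(out, author, email, timestamp, message, files,
                 ref="refs/heads/master"):
    """Write one fast-import 'commit' command to the binary stream `out`.

    `timestamp` is seconds since the Unix epoch; `files` maps a
    repository-relative path to that file's content as bytes.
    """
    msg = message.encode("utf-8")
    out.write(f"commit {ref}\n".encode("utf-8"))
    out.write(f"committer {author} <{email}> {timestamp} +0000\n".encode("utf-8"))
    # 'data <n>' announces exactly n bytes of payload on the following lines.
    out.write(b"data %d\n" % len(msg))
    out.write(msg)
    out.write(b"\n")
    for path, content in files.items():
        # 'M <mode> inline <path>' supplies the file content right in the stream.
        out.write(f"M 100644 inline {path}\n".encode("utf-8"))
        out.write(b"data %d\n" % len(content))
        out.write(content)
        out.write(b"\n")
    out.write(b"\n")


if __name__ == "__main__":
    buf = io.BytesIO()
    write_commit(buf, "ben", "ben@example.invalid", 1424992260,
                 "Fix bug 42", {"src/a.c": b"int x;\n"})
    print(buf.getvalue().decode("utf-8"), end="")
```

Writing all changesets in order to one stream and piping it into `git fast-import` inside a freshly initialised repository should build the history in a single pass; successive commits on the same ref are chained automatically. The working tree is not updated by the import, so check out the branch afterwards.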

Joseph K. Strauss