I am converting a very old and huge CVS repository to Git using cvs2git under Cygwin. The conversion works fine, and I have started testing the new repository. I found no major peculiarities, but I wonder how the timestamp of a commit (change set) is determined.

So far I have determined that, for certain CVS revisions, the timestamps differ by 1 or 2 hours plus x, where x ranges from a few seconds or minutes (most cases) up to 15 minutes. Many timestamps differ only by whole hours (x = 0).

I guess this has something to do with the "timestamp error correction" that I found listed as a cvs2svn feature (http://www.mcs.anl.gov/~jacob/cvs2svn/features.html). Maybe it also has something to do with time zones.

The results of my tests show that all commits with only one file in the change set differ by whole hours. That supports my "time zone hypothesis", but it also leads me to the question of how the timestamp of a change set with multiple files is determined.

I tried to go through the code and found out (with help from Google) that there is a "COMMIT_THRESHOLD" in the config.py of cvs2svn_lib. I guess it is used for fusing the file-based CVS commits together. Although the code looks well written, my lack of technical understanding of how CVS, SVN and Git store revisions makes it hard for me to follow.

Therefore, I would be grateful if someone could answer the following questions:

  • How does cvs2svn/cvs2git determine a commit timestamp of change sets with multiple files?
  • How does the "timestamp error correction" of cvs2svn/cvs2git work? (For me the functional background is more important than the technical.)

Kind regards

Edit:

As someone considered this question "too broad", I am afraid I did not make my point clear enough. So I would like to give a concrete (albeit fictional) example:

cvs2git found 3 file changes for one change set. They were committed on the same day (let's say on 30th February 2016), but their times differ:

  • File 1: 12:34:56
  • File 2: 12:35:38
  • File 3: 12:36:09

If it were only file 1, I would think that cvs2git uses 2016-02-30T12:34:56 as the timestamp for the Git commit. But which timestamp is chosen when the commits for all 3 files belong to one change set?

Related to this, when my repository is converted, the times also seem to be adjusted by exactly 1 or 2 hours. This happens even when there is only one file in the change set. I guess it is some kind of time zone adjustment. So I would like to know why the "timestamp error correction" changed my timestamps, so that I can decide whether to accept these changes. I did some statistics on the converted Git repository and the commit times seem OK to me in principle, but that is not enough for me.

nemo
  • I see someone has voted to close this as "too broad" (which is probably true). I don't know the details of the innards of cvs2(git/svn) and can't really answer, but it's worth mentioning that CVS is a file-oriented centralized-server system while SVN and Git are commit-oriented (and centralized and distributed respectively). The file orientation means that doing a good job of conversion requires correlating individual file revisions ("cvs ci" instances) into one single commit, and that would require some timestamp flexibility. – torek Oct 07 '16 at 07:57
  • Thank you for commenting. I added an example to make clear what I mean. By the way: I know about the main differences between CVS and Git, especially about the per-file and per-commit principles. This is why I would like to switch to Git. However, I do not know how the functions are implemented in CVS, Git or cvs2git. For me this is the difference between functional and technical understanding. Maybe it looked a little bit like I do not understand how CVS and Git work, because English is not my first language. – nemo Oct 07 '16 at 08:44
  • I _think_ CVS stores timestamps in UTC. When you say the git timestamp is 1-2 hour off, is that also in UTC, so the error is absolute, or is git displaying time in a timezone? (Your question seems well worded and researched so I don't want to insult/annoy you here, but I just wanted to check.) As for which of the 3 timestamps was picked when a git commit is fabricated from multiple cvs commits: does it matter? I know, I know, I'm a precise engineer too and I want to know how it works, but if it doesn't _really_ affect anything maybe it's easiest to just know that what it's doing is reasonab – Mort Oct 08 '16 at 02:44
  • Thanks for your comment, Mort: I do not feel insulted in any way. :-) After your comment and mhagger's answer I checked the time zones, and they seem to be the cause. (See my comment on mhagger's answer.) On the small differences: I just wanted to make sure that the conversion worked correctly, especially because I am converting a CVSNT repository. – nemo Oct 19 '16 at 07:15

1 Answer

You ask two questions:

  1. How are timestamps generated for commits touching multiple files?

    For commits that modify files, cvs2svn/cvs2git takes the newest timestamp from among the file-level commits that comprise the commit. However, if that timestamp is earlier than the timestamp of the previous commit or more than one day after the time of conversion, it instead chooses a timestamp one second after that of the previous commit.

    For commits that involve branching or tagging (for which CVS doesn't record timestamps at all), the timestamp is set to be one second after the timestamp of the previous commit.

  2. Why are timestamps sometimes off by an integral number of hours?

    CVS records timestamps in UTC without recording a timezone, and cvs2svn/cvs2git uses those timestamps as-is without trying to guess a timezone. So the timestamps should be correct, but are expressed in UTC.

    git log has a --date option (for example, --date=local) that can be used to ask that dates be displayed in the local timezone.
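To see how a whole-hour difference can arise purely from display, here is a small Python sketch. It assumes a local zone that is 1 hour ahead of UTC in regular time and 2 hours ahead during daylight saving time; the helper name `show_in_offset` and the example date are made up for illustration and are not part of cvs2git or Git.

```python
from datetime import datetime, timedelta, timezone


def show_in_offset(utc_time, hours):
    """Render a UTC timestamp in a fixed-offset zone (hypothetical helper)."""
    local_zone = timezone(timedelta(hours=hours))
    return utc_time.astimezone(local_zone).isoformat()


# A timestamp as CVS would record it: UTC, with no time zone attached.
recorded = datetime(2016, 2, 29, 12, 34, 56, tzinfo=timezone.utc)

regular_time = show_in_offset(recorded, 1)   # assumed UTC+1 (regular time)
summer_time = show_in_offset(recorded, 2)    # assumed UTC+2 (daylight saving)

print(recorded.isoformat())
print(regular_time)
print(summer_time)
```

All three printed values denote the same instant; only the wall-clock rendering differs by one or two whole hours.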

The cvs2svn project file doc/design-notes.txt documents the algorithms used by cvs2svn/cvs2git in quite some detail.
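The rule from item 1 can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the actual cvs2svn code: the function name `choose_commit_timestamp` and its parameters are hypothetical, and the example uses 29 February 2016 because the fictional 30 February from the question cannot be represented as a real date.

```python
from datetime import datetime, timedelta, timezone

ONE_SECOND = timedelta(seconds=1)
ONE_DAY = timedelta(days=1)


def choose_commit_timestamp(file_timestamps, previous_commit, conversion_time):
    """Pick a timestamp for one change set (hypothetical sketch).

    file_timestamps: timestamps of the file-level CVS commits in the
    change set; empty for branch or tag commits, which carry none.
    """
    if not file_timestamps:
        # Branching/tagging: one second after the previous commit.
        return previous_commit + ONE_SECOND
    newest = max(file_timestamps)
    # Fall back if the newest timestamp is earlier than the previous
    # commit or more than one day after the time of conversion.
    if newest < previous_commit or newest > conversion_time + ONE_DAY:
        return previous_commit + ONE_SECOND
    return newest


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


previous = utc(2016, 2, 29, 12, 0, 0)
conversion = utc(2016, 10, 7, 8, 0, 0)
files = [utc(2016, 2, 29, 12, 34, 56),
         utc(2016, 2, 29, 12, 35, 38),
         utc(2016, 2, 29, 12, 36, 9)]

chosen = choose_commit_timestamp(files, previous, conversion)
print(chosen.isoformat())
```

With the three file times from the question, the sketch picks the newest one, 12:36:09. If the previous commit had been stamped later than that, the result would instead be one second after the previous commit.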

mhagger
  • Thank you for your answer, and +1 for the pointer to the documentation, which I had not found. For later generations: look for the chapter "TopologicalSortPass". – nemo Oct 19 '16 at 07:05
  • I just found out that I cannot upvote right now; sorry. By the way: the whole-hour time differences are caused by the time zones: +1 hour for regular time and +2 hours for daylight saving time. When I use `git log`, the correct (UTC) timestamp is shown. – nemo Oct 19 '16 at 07:10