
The situation is this: I have a bunch of files from ages ago (back when I wasn't using source control of any kind) which I'd love to put into modern Git repositories.

I found one tool for this - file-fast-export. It takes existing files and spits out a file that can be fed to git fast-import. Fine so far.

Only problem is that fast-import is a bit picky about input format and file-fast-export isn't a particularly polished program - it assumes files are being imported into a new repository.

So my use case is this: I have a git repository. I discovered a previously missing subdirectory. File modification times are there. I'd need to commit each of those files on top of the current repository history so that the commit timestamps correspond to the modification times. As I understood it, Git doesn't force commits to have chronologically consecutive timestamps, only that commits follow one another. I'm happy if I can just do git log xxxxx.txt and see file history.

I'm sure there are tools that can do this. How can I do this?

wwwwolf
  • You can specify a commit date manually when you create a new commit; see, for instance, http://stackoverflow.com/q/28090026/2541573 – jub0bs Feb 22 '15 at 15:35
  • There are probably not two files which have exactly the same timestamp. Does this mean, you want to have a history with exactly one file per commit? what is the advantage over having one commit containing all files? – michas Feb 22 '15 at 15:54
  • Yes, I'm aware that the date can be specified manually in `git commit`, but I'd need to commit each of the files manually. I just wonder if there's already a well-established tool to do this in one fell swoop for a whole subdirectory. The advantage of doing this instead of a single commit is that this way, each file has its modification date recorded in a standard way. I know one way would be to just make a single commit with `ls -lR` attached to the commit log, but that's not exactly elegant... – wwwwolf Feb 22 '15 at 16:00
  • @wwwwolf You could write a shell script for this. However, I don't understand why you would want to create one commit for each file. That goes against the semantics of Git commits, which are meant to represent units of work, whether it be on one or several files. – jub0bs Feb 22 '15 at 16:04
  • But in the case of the files in question, "units of work" can be interpreted as "groups of files that were modified roughly the same time", so I don't think it goes against Git semantics. You're right, though, scripting is always a solution - I just wonder if there are scripts that already do all this. – wwwwolf Feb 22 '15 at 16:17
  • @wwwwolf But, surely, those *"groups of files that were modified roughly the same time"* are interconnected moving parts of your program, and were not written in isolation from one another. In that case, it doesn't make sense to commit them one by one; commits are meant to be *logical* units of work. (SO tip: If you're addressing your comment to someone in particular, use the `@`, as I did here. Otherwise, the other person doesn't get notified of your comment.) – jub0bs Feb 22 '15 at 16:31
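The scripting approach suggested in the comments can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not an established tool: the function names (`files_by_mtime`, `git_date`, `commit_each_file`) are made up for illustration, timestamps are recorded in UTC, nothing else is staged in the repository, and none of the files are ignored by Git. It sets `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` so that both dates of each commit match the file's modification time.

```python
import os
import subprocess
from pathlib import Path


def files_by_mtime(subdir):
    """Return (path, mtime_seconds) pairs for regular files under subdir, oldest first."""
    entries = []
    for path in Path(subdir).rglob("*"):
        if path.is_file():
            entries.append((path, int(path.stat().st_mtime)))
    # Sort by mtime; break ties by path so the order is deterministic.
    return sorted(entries, key=lambda e: (e[1], str(e[0])))


def git_date(mtime):
    """Format a Unix timestamp in Git's internal date format (assumes UTC)."""
    return f"{mtime} +0000"


def commit_each_file(repo, subdir):
    """Create one commit per file under repo/subdir, dated by the file's mtime.

    Hypothetical helper: it runs `git add` and `git commit` once per file.
    """
    repo = Path(repo)
    for path, mtime in files_by_mtime(repo / subdir):
        rel = path.relative_to(repo).as_posix()
        env = dict(
            os.environ,
            GIT_AUTHOR_DATE=git_date(mtime),
            GIT_COMMITTER_DATE=git_date(mtime),
        )
        subprocess.run(["git", "add", "--", rel], cwd=repo, check=True)
        subprocess.run(
            ["git", "commit", "-m", f"Import {rel}"],
            cwd=repo, env=env, check=True,
        )
```

Calling something like `commit_each_file("/path/to/repo", "recovered-subdir")` (both paths are placeholders) would then add the commits on top of the current branch, after which `git log` on any of the files shows the backdated commit. Grouping files modified at roughly the same time into one commit would need an extra step that this sketch leaves out.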

1 Answer


Only problem is that fast-import is a bit picky about input format

Actually, Git 2.28 (Q3 2020) relaxes this in one respect. Some repositories in the wild (e.g. rails.git) have commits that record a nonsense committer timezone, and "git fast-import" learned an option to pass such timestamps through intact, so that existing repositories can be recreated as-is.

That could be helpful in your case.

See commit d42a2fb (30 May 2020) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 5404183, 02 Jun 2020)

fast-import: add new --date-format=raw-permissive format

Signed-off-by: Elijah Newren

There are multiple repositories in the wild with random, invalid timezones.

Most notably is a commit from rails.git with a timezone of "+051800".

A few searches will find other repos with that same invalid timezone as well.

Further, Peff reports that GitHub relaxed their fsck checks in August 2011 to accept any timezone value, and there have been multiple reports to filter-repo about fast-import crashing while trying to import their existing repositories since they had timezone values such as "-[7349423]" and "-[43455309]".

The existing check on timezone values inside fast-import may prove useful for people who are crafting fast-import input by hand or with a new script.

For them, the check may help them avoid accidentally recording invalid dates.

(Note that this check is rather simplistic and there are still several forms of invalid dates that fast-import does not check for: dates in the future, timezone values with minutes that are not divisible by 15, and timezone values with minutes that are 60 or greater.)

While this simple check may have some value for those users, other users or tools will want to import existing repositories as-is.

Provide a --date-format=raw-permissive format that will not error out on these otherwise invalid timezones so that such existing repositories can be imported.
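For the use case in the question (adding files on top of an existing history), a hand-written fast-import stream is another option. The following Python sketch is an illustration only and is not part of the commit quoted above: `data_block` and `build_stream` are hypothetical names, the branch name and committer identity are placeholders, every file is recorded with mode 100644, and paths are assumed not to start with a double quote or contain newlines (those would need quoting). The first commit uses `from <branch>^0`, which the fast-import documentation describes for restarting an incremental import from the current branch value.

```python
from pathlib import Path


def data_block(payload: bytes) -> bytes:
    """Encode payload as a fast-import 'data' command with an exact byte count."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def build_stream(repo, subdir, branch="refs/heads/master",
                 ident="Importer <importer@example.invalid>"):
    """Build a fast-import stream with one commit per file under repo/subdir.

    Commits are ordered by file mtime and dated in raw format (UTC assumed).
    """
    repo = Path(repo)
    files = sorted(
        (p for p in (repo / subdir).rglob("*") if p.is_file()),
        key=lambda p: (p.stat().st_mtime, p.as_posix()),
    )
    out = []
    for i, path in enumerate(files):
        rel = path.relative_to(repo).as_posix()
        when = int(path.stat().st_mtime)
        out.append(f"commit {branch}\n".encode())
        out.append(f"committer {ident} {when} +0000\n".encode())
        out.append(data_block(f"Import {rel}".encode()))
        if i == 0:
            # Continue from the branch's current tip instead of starting
            # a new root commit; later commits chain automatically.
            out.append(f"from {branch}^0\n".encode())
        out.append(f"M 100644 inline {rel}\n".encode())
        out.append(data_block(path.read_bytes()))
    return b"".join(out)
```

Writing the result to `sys.stdout.buffer` and piping it into `git fast-import` run inside the repository should append the commits to the branch. Because this sketch always emits a valid `+0000` offset, the default `raw` date format is enough; `--date-format=raw-permissive` only matters when the stream has to carry invalid timezones through. Note that fast-import does not update the index or working tree, so something like `git reset` is probably needed afterwards to bring the index in line with the new branch tip.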

VonC