
I am trying to migrate a Mercurial repository to Git on Windows 11, using Git Bash as follows:

MINGW64$ ls
hg-repo/ git-repo/
MINGW64$ cd git-repo
MINGW64$ git init
MINGW64$ ~/fast-export/hg-fast-export.sh -r ../hg-repo/ --force -A ../hg-repo/authors.txt -M main

The migration succeeds. Afterwards I check out the branch with

MINGW64$ git checkout main

which should leave the working tree with no changes. Instead, I get something like the following:

MINGW64$ git status
On branch main
Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
    deleted:    Folder1/grünes-Ding.png
Untracked files:
(use "git add <file>..." to include in what will be committed)
    Änderungen/
    Folder1/grünes-Ding.png

So it looks as if "Folder1/grünes-Ding.png" was deleted and then added again. If I try to restore the file, I get the following:

MINGW64$ git restore Folder1/grünes-Ding.png
error: pathspec 'Folder1/grünes-Ding.png' did not match any file(s) known to git

I think Git does not recognize "Folder1/grünes-Ding.png" because the ü is encoded differently inside Git than what I see in Git Bash. "Änderungen/" should also already be in the repository: if I delete it from the working directory, all of its files show up as "deleted" changes. If I then try to restore these files, I get the same kind of error. The files in this folder do not contain umlauts in their names.

My question is: How can I tell Git to handle folders and files with umlauts in their names?

The only things I have found so far about umlauts concern displaying them correctly in logs or commit messages, which is not the problem here.

My Git configuration and locale look like this:

MINGW64$ git config -l
diff.astextplain.textconv=astextplain
http.sslbackend=openssl
http.sslcainfo=C:/Program Files/Git/mingw64/ssl/certs/ca-bundle.crt
core.autocrlf=input
core.fscache=true
core.symlinks=false
pull.rebase=false
init.defaultbranch=main
difftool.sourcetree.cmd=''
mergetool.sourcetree.cmd=''
mergetool.sourcetree.trustexitcode=true
core.repositoryformatversion=0
core.filemode=false
core.bare=false
core.logallrefupdates=true
core.symlinks=false
core.ignorecase=true
core.quotepath=false
core.fsmonitor=true
i18n.logoutputencoding=UTF-8
MINGW64$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=
marli
  • I doubt the umlaut is the fault. Git operates on _files_, not _directories_. Trying to restore a directory name fails with "did not match any file(s) known to git". Can you restore a single file within "Änderungen"? – Friedrich Jan 13 '23 at 15:22
  • There is no file with umlauts in this folder. If I delete the folder, all files from this folder are shown as deleted. If I then try to restore one of these "deleted" files, I get this error again. – marli Jan 13 '23 at 15:34
  • I corrected the question because I saw that the folder was only shown as untracked but files appear as untracked and changed. – marli Jan 13 '23 at 15:47
  • Have you tried the other tools? Namely _Git CMD_ or regular _cmd_ with git added to `PATH`? You might even clone to a linux machine and try there. – Friedrich Jan 13 '23 at 20:19
  • Yes, in CMD it is the same problem. I have not checked it on Linux or WSL yet. I can give it a try on Monday. Thanks for the remark. – marli Jan 14 '23 at 23:47
  • For what it's worth: on my linux machine I can add/rm/etc. files in "Änderungen" as you'd expect. Only `git status` shows it as "\303\204nderungen" where "\303" seems to be the umlaut modifier. – Friedrich Jan 15 '23 at 13:23
  • The thing is that "Ä" can be encoded in different ways. So the question is how I can find out how Git represents "Ä" internally, and how I can tell Git that I want this file in this representation. Unfortunately, I need to do this on Windows. The \303 can be removed with the core.quotepath=false configuration, but that is not the issue here. – marli Jan 15 '23 at 20:04
  • 1. You now have correct settings in Git, namely `core.quotepath=false`. 2. Try your test case with pure Git repo data for reproduction. 3. If a pure Git repo doesn't have this issue (because it **must not**), don't use ugly fast-export. 4. `core.quotepath` **IS** an issue *here* – Lazy Badger Jan 16 '23 at 10:59
  • What do you suggest to use instead of fast-export? – marli Jan 16 '23 at 11:03

1 Answer


I experimented with the options of hg-fast-export and eventually found a solution.

hg-fast-export has two options that handle encoding: -e and --fe. -e specifies the encoding of commit messages, author names, etc. in Mercurial so that they can be converted to UTF-8, and --fe specifies the encoding of the filenames.
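To illustrate why the filename encoding matters, here is a minimal sketch that compares the bytes of "ü" in UTF-8 and in Latin-1. The helper name `hex_bytes` is made up for this example. That the Mercurial repository stored its filenames in Latin-1 is an assumption, though it is consistent with what worked for me.

```shell
# hex_bytes is a hypothetical helper: print the bytes of a string as hex.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# The same character "ü" written as explicit bytes (octal escapes),
# so the result does not depend on the encoding of this script.
utf8_u=$(printf '\303\274')   # UTF-8: two bytes
latin1_u=$(printf '\374')     # Latin-1: one byte

hex_bytes "$utf8_u"; echo     # prints c3bc
hex_bytes "$latin1_u"; echo   # prints fc

# To see which bytes Git actually stored for a path, list the tracked files
# with quoting enabled; non-ASCII bytes then appear as octal escapes:
#   git -c core.quotepath=true ls-files
```

If the bytes stored for a path differ from the bytes of the name on disk, Git treats them as two different files, which matches the "deleted" plus "untracked" pair seen in `git status`.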

I tried different encodings for the filenames and found that latin1 worked for me. At first, however, I made the mistake of writing -fe instead of --fe. -fe is parsed as -f plus -e, not as --fe, so be aware of this! If you use -e, --fe is automatically set to the value of -e as well; so -fe latin1 does change the filename encoding, but it also changes the encoding assumed for commit messages, which then come out wrong.
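The -fe pitfall is ordinary short-option bundling. As a sketch, the shell's own `getopts` shows the same behaviour. This is only an analogy: `parse_opts` is a made-up function, and I am assuming hg-fast-export's parser follows the same convention, which is what I observed.

```shell
# parse_opts is a hypothetical demo: -f is a flag, -e takes a value.
parse_opts() {
    OPTIND=1   # reset so the function can be called more than once
    while getopts "fe:" opt "$@"; do
        case "$opt" in
            f) echo "flag -f set" ;;
            e) echo "-e = $OPTARG" ;;
            *) echo "unknown option" ;;
        esac
    done
}

# "-fe latin1" is read as "-f" followed by "-e latin1":
parse_opts -fe latin1
# prints:
#   flag -f set
#   -e = latin1
```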

In the end, the migration works like this:

MINGW64$ ls
hg-repo/ git-repo/
MINGW64$ cd git-repo
MINGW64$ git init
MINGW64$ ~/fast-export/hg-fast-export.sh -r ../hg-repo/ --force -A ../hg-repo/authors.txt -M main --fe latin1
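To confirm that the import went through, the working tree should be clean after checking out main. A small sketch of such a check follows; `check_clean` is a made-up helper name, not part of Git or hg-fast-export.

```shell
# check_clean is a hypothetical helper: report whether the working tree
# of the current repository has any changed or untracked paths.
check_clean() {
    # Bail out if we are not inside a Git repository.
    git rev-parse --git-dir >/dev/null 2>&1 || { echo "not a repository"; return 1; }
    # `git status --porcelain` prints one line per changed or untracked path.
    if [ -z "$(git status --porcelain)" ]; then
        echo "clean"
    else
        echo "dirty"
    fi
}

# Usage, inside git-repo after the import:
#   git checkout main
#   check_clean
```

With the wrong filename encoding, this reports "dirty" because of the deleted and untracked entries shown in the question.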
marli