
Suppose I have two files in my Git repository with different encodings: one in UTF-8 and one in CP866. My console and other tools are configured for UTF-8.

I want the output of Git commands such as git diff or git show to display both files correctly, rather than showing something like this:

diff --git a/myfile.tex b/myfile.tex
index 01ad4f3..b1fd24c 100644

--- a/myfile.tex
+++ b/myfile.tex
@@ -220,9 +220,9 @@ centertags]%
-<A3><A4><A5> $f_i \in k[x_1, \ldots , x_n]$, <A8><AC><A5><A5><E2> <E0><A5>襭<A8><A5> $(a_1, \dots, a_n)$. <92><AE><A3><A4><A0> <AF><AE><AB><A8><AD><AE><AC><A8><A0><AB>쭠<EF> <E1><A8><E1>⥬<A0> $\{ R(f_1,f_i) = 0 \}$ <A4><AB><EF> $i = 2, \dots, n$, <A3><A4><A5> $f_i$ <E0><A0><E1>ᬠ<E2>ਢ<A0><A5><E2><E1><EF> <AA><A0><AA> <AF><AE><AB><A8><AD><AE><AC> <AE><E2> $x_n$ <AD><A0><A4> <AA><AE><AB><EC>殬 $k[x_1, \ldots , x_{n-1}]$, 
<E1><AE><E1>⮨<E2> <A8><A7> $n-1$ <E3>ࠢ<AD><A5><AD><A8><A9> <AE><E2> <AF><A5>६<A5><AD><AD><EB><E5> $x_1, \dots x_{n-1}$, <A8> <A8><AC><A5><A5><E2> <E0><A5>襭<A8><A5> $(a_1, \dots, a_{n-1})$.
+<A

There is an option to set encoding conversion for all files:

git config --local core.pager "iconv -f cp866 -t utf-8 | less"
git config --local i18n.commitEncoding utf8
git config --local i18n.logoutputencoding cp866

But my goal is to set up encoding conversion per file, so that both the UTF-8 file and the CP866 file are displayed properly.

Is there a solution?

petRUShka
  • `git diff` can be customized to use a custom diff tool by file extension, using the `.gitattributes` file and the `diff` config option. (See, e.g., https://stackoverflow.com/questions/22190208/version-control-word-docx-files-with-docx2txt-with-git-on-mac-os-x) Maybe you could solve half your problem by writing a custom diff script that calls `iconv` before doing the diff, assuming you can tell the encoding by the extension? I don't know how you would do the same for `git show`, though. – joanis Aug 07 '19 at 22:45
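
A minimal sketch of the approach this comment describes, assuming the CP866 files can be picked out by a path pattern (here `*.tex`) and that `iconv` is installed. It uses a `textconv` filter instead of a full external diff tool, so Git converts both sides to UTF-8 before diffing; Git applies textconv to the patches printed by `git show` and `git log -p` as well. The driver name `cp866`, the scratch directory, and the file content are illustrative only.

```shell
# Scratch repository with one CP866-encoded file (hypothetical content).
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name demo && git config user.email demo@example.com
printf '\243\244\245\n' >myfile.tex        # the word "где" as CP866 bytes
git add myfile.tex && git commit -qm 'Add CP866 file'

# Define a diff driver (the name "cp866" is arbitrary) that converts to UTF-8.
git config diff.cp866.textconv 'iconv -f CP866 -t UTF-8'
# Route matching paths through that driver.
echo '*.tex diff=cp866' >.gitattributes

printf '\243\244\245 \250\n' >myfile.tex   # edit the file: "где и" in CP866
git --no-pager diff -- myfile.tex          # both sides are shown as UTF-8
```

Because the conversion only affects how diffs are displayed, the bytes stored in the repository and in the working tree stay in CP866.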

1 Answer


If you have a reasonably recent Git (2.18 or later, which is where the working-tree-encoding attribute was introduced), you can have Git store all files in the repository as UTF-8 and simply check some of them out in a different encoding. Git will then show the diff as expected, while your working tree still contains the file in its original encoding.

You can do this by creating a .gitattributes file in the root of your repository that looks like this:

myfile.tex working-tree-encoding=CP866

(You may prefer to use IBM866, since that's the standard name and it may be more widely supported.) If you want the file to be in CP866 on your system only and let others have the UTF-8 version, then you can put this entry in .git/info/attributes instead of checking it into the repo.

You can also specify (almost) any pattern in a gitattributes file that you can specify in a gitignore file, so you can use wildcards, for example.
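
As an illustration of pattern matching, the sketch below assumes a hypothetical layout in which every .tex file under legacy/ is CP866. `git check-attr` reports the value Git resolves for a given path, which is a quick way to confirm that a pattern matches what you intended.

```shell
# Scratch repository; the legacy/ directory and file names are hypothetical.
repo=$(mktemp -d) && cd "$repo" && git init -q

# One wildcard rule covers every .tex file directly under legacy/.
echo 'legacy/*.tex working-tree-encoding=CP866' >.gitattributes

# Ask Git which value each path resolves to (the files need not exist).
git check-attr working-tree-encoding -- legacy/old.tex notes.tex
```

Paths that no rule matches are reported as unspecified, so they are left in whatever encoding they already have.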

Once you've added the .gitattributes file, you should run git add --renormalize . to ensure that all the files are using the proper encoding, and then commit all of the changes.

A set of example steps for a new repository:

git init
printf 'a\xffb\n' >myfile.tex
git add myfile.tex
git commit -m 'Add CP866 file'
# You are here.
echo 'myfile.tex working-tree-encoding=CP866' >.gitattributes
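# --renormalize only touches tracked files, so add the new attributes file explicitly.
git add .gitattributes
git add --renormalize .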
git commit -m 'Store files as UTF-8'
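
To check that the conversion took effect, one option is to compare the bytes Git stores with the bytes in the working tree. The sketch below repeats the example steps in a scratch directory so that it can run on its own; it assumes a Git build whose iconv knows CP866, and the demo identity is a placeholder. In CP866 the byte 0xFF is a no-break space, which UTF-8 encodes as C2 A0.

```shell
# Recreate the example in a throwaway repository.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name demo && git config user.email demo@example.com
printf 'a\377b\n' >myfile.tex              # octal 377 is the byte 0xFF
git add myfile.tex && git commit -qm 'Add CP866 file'
echo 'myfile.tex working-tree-encoding=CP866' >.gitattributes
git add .gitattributes && git add --renormalize .
git commit -qm 'Store files as UTF-8'

# Bytes stored in the repository, now UTF-8: 61 c2 a0 62 0a
git cat-file blob HEAD:myfile.tex | od -An -tx1
# Bytes in the working tree, still CP866: 61 ff 62 0a
od -An -tx1 myfile.tex
```

If both commands print the same bytes, the attribute is not being applied to that path, which usually points to a pattern that does not match or a Git version that predates the attribute.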
bk2204
  • I've placed `.gitattributes` containing `myfile.tex working-tree-encoding=CP866` in the root of the repository, next to `.gitignore`, but nothing has changed :( Should I do something else: commit, add a new file, etc.? – petRUShka Aug 08 '19 at 06:18
  • Yes, you should run `git add --renormalize .` and commit all the changes, including the `.gitattributes` file. I've updated the answer to reflect this. – bk2204 Aug 08 '19 at 11:13
  • Still doesn't work :( Maybe some more steps are needed? `git add --renormalize . --verbose`: add '.gitattributes' add '.gitignore' add '.gitmodules' add 'myfile.tex' – petRUShka Aug 08 '19 at 11:45
  • I've updated with example steps for a new repository. – bk2204 Aug 08 '19 at 23:27