
I have a project whose files use an 8-bit encoding (Win-1251). Is there a way to use git grep to find a phrase made of characters from the upper half of the code page (i.e. with byte values from 0x80 to 0xFF)?

I work on Windows and use the console to run git. It seems that the text I pass to git grep (for example, git grep "привет") is interpreted as a sequence of UTF-8 characters, i.e. git grep actually looks for the byte sequence "\xD0\xBF\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82".

I also tried git grep "\xEF\xF0\xE8\xE2\xE5\xF2" (the bytes in quotes are the Win-1251 codes of the word "привет"), but it turned out that grep does not accept escape sequences.

Dmitro25

2 Answers


Try the -P flag: Perl-compatible regexes should understand escape sequences such as \xEF.
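As a rough illustration of the -P suggestion, here is a minimal sketch that searches a scratch directory for the Win-1251 bytes of "привет". The directory, the sample.txt file, and the variable names are hypothetical. --no-index is only there so the demo runs outside a repository, and LC_ALL=C is an assumption meant to keep the matcher working on raw bytes rather than UTF-8 characters; neither comes from the answer itself.

```shell
# Hypothetical demo: a scratch directory containing one Win-1251 encoded file.
demo_dir=$(mktemp -d)

# "привет" in Windows-1251 is the byte sequence EF F0 E8 E2 E5 F2
# (written here as octal escapes, which every POSIX printf accepts).
printf '\357\360\350\342\345\362\n' > "$demo_dir/sample.txt"

# -P selects Perl-compatible regexes, where \xHH stands for one byte value.
# Single quotes stop the shell from touching the backslashes.
# -l prints only the names of matching files.
# Assumption: LC_ALL=C keeps the match byte-oriented; --no-index lets the
# demo run without a git repository.
matches=$(cd "$demo_dir" && LC_ALL=C git grep --no-index -l -P '\xEF\xF0\xE8\xE2\xE5\xF2')
echo "$matches"
```

Inside a real repository you would drop --no-index and, if needed, add path arguments after the pattern. This requires a git build with PCRE support.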


You could also write your search patterns to a file and tell git grep to read them from it: git grep -f patterns.txt ...

The bonus of a file is that you can more easily control the encoding of its content.
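A minimal sketch of the pattern-file approach, assuming a UTF-8 terminal and an iconv that knows the WINDOWS-1251 name. The scratch directory, sample.txt, and the variable names are hypothetical; --no-index and LC_ALL=C are assumptions that let the demo run outside a repository and keep the match byte-oriented.

```shell
# Hypothetical scratch setup: one file holding "привет" as Win-1251 bytes
# (EF F0 E8 E2 E5 F2, written as octal escapes).
demo_dir=$(mktemp -d)
printf '\357\360\350\342\345\362\n' > "$demo_dir/sample.txt"

# Keep the pattern in its own file, converted from UTF-8 to Windows-1251.
pattern_file=$(mktemp)
printf '%s\n' 'привет' | iconv -f UTF-8 -t WINDOWS-1251 > "$pattern_file"

# Inspect the bytes that will actually be searched for (hex, no separators).
pattern_bytes=$(od -An -tx1 < "$pattern_file" | tr -d ' \n')
echo "$pattern_bytes"

# -f reads one pattern per line from the file; -l lists matching file names.
# Assumption: LC_ALL=C and --no-index are demo conveniences only.
found=$(cd "$demo_dir" && LC_ALL=C git grep --no-index -l -f "$pattern_file")
echo "$found"
```

Keeping the pattern file outside the searched directory avoids it matching itself.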

You can also use this feature to build a script that takes a UTF-8 string and encodes it as Win-1251 before feeding it to git grep:

pattern=$1
shift
# Quote the pattern so the shell neither splits nor globs it, then convert UTF-8 to Win-1251
printf '%s\n' "$pattern" | iconv -f UTF-8 -t WINDOWS-1251 > /tmp/rusgrep-pattern
git grep -f /tmp/rusgrep-pattern "$@"
LeGEC
  • Thank you! This option works, and the solution is already usable. But is it possible to specify the search string as characters somehow, so that I do not have to translate it into a sequence of bytes each time? – Dmitro25 Jul 26 '20 at 13:42
  • Thank you again. The solution with a separate "patterns.txt" file is quite suitable. I don't have to do this kind of search very often (of course, more often I search for words made of characters from the first half of the ASCII table), so creating a file is not too difficult (whereas translating a word or several words into an escape sequence of hex codes would be quite difficult). – Dmitro25 Jul 26 '20 at 14:52

Inspired by this gist and @LeGEC's answer, you can do something like this:

git grep -P "$(iconv -f utf-8 <(echo -n 'привет') -t 'Windows-1251' | od -tx1 | sed -e 's/^[0-9]* //' -e '$d' -e 's/^/ /' -e 's/ /\\x/g')"

You can put this in a bash function:

function gitbingrep {
    git grep -P "$(iconv -f utf-8 <(echo -n "$1") -t 'Windows-1251' | od -tx1 | sed -e 's/^[0-9]* //' -e '$d' -e 's/^/ /' -e 's/ /\\x/g')"
}

And now you can simply run gitbingrep привет.

Omer Tuchfeld
  • Your expression didn't work in my case. Maybe it's not for the Windows console? – Dmitro25 Jul 26 '20 at 14:58
  • Yeah I doubt it'll work on a Windows console... Try Git Bash maybe? It works on Linux in any case. You can use WSL if you really want it to work on a Windows machine – Omer Tuchfeld Jul 26 '20 at 15:02
  • I tried it in "git bash" console too, but it still doesn't work. – Dmitro25 Jul 26 '20 at 15:16
  • So use WSL, or try to adjust it to work in the Git Bash console, shouldn't be too bad. Don't have a Windows machine to test on so can't help – Omer Tuchfeld Jul 26 '20 at 15:23