4

I have a bunch of text files I want to use with grep. They are all from an external source and are UTF-16 encoded and begin with a byte order mark.

Unix tools like grep don't work on them for me. What work-around is there for this?

Steve McLeod
  • Out of curiosity: does it work if you set the `LANG` environment variable to something like `en_GB.UTF-16` (or whatever your locale is)? –  Jan 29 '11 at 10:07
  • @Bavarious, I tried your suggestion but it didn't work – Steve McLeod Jan 29 '11 at 11:29

2 Answers

8

Use iconv(1) to convert them to UTF-8, then run grep on the converted text.
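As a minimal sketch of that approach, the snippet below builds a small UTF-16 test file and then converts it on the fly so grep only ever sees UTF-8. The file name `sample.txt` and the pattern `foo` are made up for illustration; it assumes an iconv that accepts the `UTF-16` and `UTF-8` encoding names, which the common implementations do.

```shell
# Create a small UTF-16 test file (hypothetical name: sample.txt).
# Writing with -t UTF-16 normally puts a byte order mark at the start.
printf 'hello world\nfoo bar\n' | iconv -f UTF-8 -t UTF-16 > sample.txt

# Convert to UTF-8 on the fly and search the result.
# With -f UTF-16, iconv uses the byte order mark to pick the byte order.
iconv -f UTF-16 -t UTF-8 sample.txt | grep 'foo'
```

Piping avoids leaving converted copies on disk. If you search the same files often, redirecting the iconv output to a new file once is probably the simpler option.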

DigitalRoss
0

Mac OS X ships with an old version of BSD grep, which is limited and very slow. In any case, neither BSD grep nor GNU grep handles UTF-16 files. Other grep tools such as ag, rg, and ugrep are designed to support Unicode and UTF-encoded files. Of these three, ugrep is the closest to GNU grep, so it can serve as a compatible replacement with essentially no learning curve:

ugrep "PATTERN" FILE ...

If your files contain UTF byte order marks then there is no need to convert them to search with ugrep, ag, or rg.

Searching files without byte order marks requires a flag, e.g. --encoding with ugrep:

ugrep --encoding=UTF-16 "PATTERN" FILE ...
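
If you are unsure whether a file starts with a byte order mark, dumping its first two bytes is one way to check. This is only a sketch: the file name `bom-sample.txt` is hypothetical, and the sample is generated with iconv so the snippet is self-contained.

```shell
# Create a UTF-16 test file (hypothetical name: bom-sample.txt).
printf 'hello\n' | iconv -f UTF-8 -t UTF-16 > bom-sample.txt

# Print the first two bytes in hex.
# "ff fe" is a little-endian UTF-16 BOM, "fe ff" a big-endian one;
# anything else means the file has no UTF-16 BOM.
od -An -tx1 -N2 bom-sample.txt
```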
Dr. Alex RE