
OS: macOS Ventura 13.4.1
Terminal: zsh

I am trying to list all files and folders in a directory whose names contain the German letters ä, ü, or ö. To test my regular expressions, I created a test folder containing the following files:
./ü
./aüa

I tried a very simple find command:
find -E . -regex '.*ü'
Output: ./ü
Which is fine and correct.

Unfortunately, if I try to find the ./aüa file using this regex:
find -E . -regex '.*aüa'
The output is empty, even though a file with the name ./aüa exists.

I tried replacing the "ü" in the regex with its Unicode escape \u00FC, like this:
find -E . -regex '.*a\u00FCa'
find -E . -regex '.*a\u{00FC}a'
find -E . -regex '.*a\x{00FC}a'
But nothing seems to work.

I did the same exercise with grep:
find . | grep -E ".*aüa"
Same result: the output is empty.

Additional info: the German letter ß does not seem to have this issue:
find -E . -regex '.*aßa'
If a file called ./aßa is present, the above command prints it correctly to the terminal.

Does anyone have a hint on how to solve this issue? It is driving me nuts.

Patrick Dorn
  • Could the regex character be a different Unicode for ü? What happens if you copy-paste from the file name to the regex? – Bohemian Aug 27 '23 at 00:46
  • I'm not sure what `-E` does for the find command other than give me an error, but after removing that it seems to work as expected. https://gist.github.com/CAustin582/98b4fc9ab0314d2442388407b4c3e667 – CAustin Aug 27 '23 at 01:20
  • What kind of volume are the files stored on? On the old Mac OS Extended (aka HFS+) volume format, accented characters would be split into a plain character and a combining accent (see my answer [here](https://apple.stackexchange.com/questions/10476)), which causes confusion with tools that don't do Unicode character equivalence classes right. Try `find -E . -regex $'.*au\u0308a'` (the `$'...'` tells zsh to convert escape sequences, and `u\u0308` is a plain "u" and a combining diaeresis). – Gordon Davisson Aug 27 '23 at 02:53
  • @CAustin `-E` is an option that the BSD version of `find` supports, which switches to POSIX "extended" regular expression syntax. You may be testing with the GNU version, where the rough equivalent would be to add `-regextype posix-extended` as part of the search expression. Neither option is standard. – Gordon Davisson Aug 27 '23 at 02:57
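The decomposition idea from the comments can be checked with a few lines of portable shell. This is only a sketch, and it assumes the file name really is stored as a plain "u" followed by the combining diaeresis U+0308, which the question does not confirm. The variable names (`u_nfc`, `u_nfd`, `demo_dir`, `matches`) and the scratch directory are made up for illustration. It builds both spellings of "ü" from their UTF-8 bytes, shows that they differ, and then matches a file name against either spelling with an alternation.

```shell
# Precomposed form: "ü" as the single code point U+00FC (UTF-8 bytes c3 bc).
u_nfc=$(printf '\303\274')
# Decomposed form: plain "u" plus combining diaeresis U+0308 (UTF-8 bytes cc 88).
u_nfd=$(printf 'u\314\210')

# Both render the same on screen, but the byte sequences differ.
printf '%s' "$u_nfc" | od -An -tx1
printf '%s' "$u_nfd" | od -An -tx1

nfc_len=$(printf '%s' "$u_nfc" | wc -c | tr -d ' ')
nfd_len=$(printf '%s' "$u_nfd" | wc -c | tr -d ' ')
echo "precomposed: $nfc_len bytes, decomposed: $nfd_len bytes"

# Hypothetical scratch directory with a file named in the decomposed form,
# standing in for the ./aüa file from the question.
demo_dir=$(mktemp -d)
: > "$demo_dir/a${u_nfd}a"

# Accept either spelling of "ü" between the two "a" characters.
matches=$(find "$demo_dir" -type f | grep -E "a(${u_nfc}|${u_nfd})a\$")
echo "$matches"

rm -rf "$demo_dir"
```

If the `od` output for the real file name (for example from `ls | od -An -tx1`) shows `75 cc 88` rather than `c3 bc`, the name is decomposed, and a regex typed with a precomposed "ü" would not match it byte for byte. That would also be consistent with ß working, since ß has no decomposed form.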
