
I want to write a Haskell script that reads the files in my /home folder. However, many of the files have Chinese names, and Haskell and GHCi don't seem to handle them. It looks as if Haskell and GHCi aren't good at displaying UTF-8 characters.

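Here is what I encountered (the string means "make Haskell or GHCi display Chinese characters correctly and read files named with Chinese characters"):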

Prelude> "让Haskell或者Ghci能正确显示汉字并且读取汉字命名的文档"

"\35753Haskell\25110\32773Ghci\33021\27491\30830\26174\31034\27721\23383\24182\19988\35835\21462\27721\23383\21629\21517\30340\25991\26723"
Asked by TorosFanny, edited by dda

  • This doesn't really answer your question, but I'll comment that you won't have this problem with a program printing strings with `putStrLn` and friends. Also, [this SO question](http://stackoverflow.com/questions/5535512/how-to-hack-ghci-or-hugs-so-that-it-prints-unicode-chars-unescaped) might be of help to you. – gspr Dec 26 '12 at 11:16

2 Answers

Prelude> putStrLn "\35753Haskell\25110\32773Ghci\33021\27491\30830\26174\31034\27721\23383\24182\19988\35835\21462\27721\23383\21629\21517\30340\25991\26723"
让Haskell或者Ghci能正确显示汉字并且读取汉字命名的文档

GHC handles Unicode just fine. These are the things you should know about it:

It uses your system encoding to convert bytes to characters and back when reading from or writing to the console. Since it converted the bytes to characters correctly in your example, I'd say your system encoding is set properly.
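If the console ever shows garbage instead of Chinese characters, the encoding GHC picked from the locale may not be UTF-8. A minimal sketch, assuming the terminal itself expects UTF-8: it prints the encoding stdout currently uses and then forces UTF-8 with `hSetEncoding` from System.IO.

```haskell
import System.IO (hGetEncoding, hSetEncoding, stdout, utf8)

main :: IO ()
main = do
  -- The encoding GHC chose for stdout from the system locale
  -- (Nothing would mean the handle is in binary mode).
  enc <- hGetEncoding stdout
  putStrLn ("stdout encoding: " ++ show enc)
  -- Assumption: the terminal expects UTF-8, so override the locale's choice.
  hSetEncoding stdout utf8
  putStrLn "让Haskell或者Ghci能正确显示汉字"
```

In the situation described above the override should be unnecessary, since the locale encoding already works; it is only a fallback for a misconfigured locale.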

The show function on String has a limited output character set: it escapes every non-ASCII character as a decimal code point, such as \35753 for 让. GHCi uses show to print the result of evaluating an expression, and the print function uses it to convert the value passed in to a String representation.

The putStr and putStrLn functions are for actually writing a String to the console exactly as it was provided to them.
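The difference is easy to see side by side. A small sketch (the string `sample` is made up for illustration): print goes through show and escapes each Chinese character, while putStrLn writes the characters themselves.

```haskell
-- Hypothetical sample string: two Chinese characters followed by ASCII.
sample :: String
sample = "汉字 Haskell"

main :: IO ()
main = do
  -- print x = putStrLn (show x); show escapes non-ASCII characters,
  -- so this prints "\27721\23383 Haskell" including the quotes.
  print sample
  -- putStrLn writes the characters through stdout's encoding,
  -- so a UTF-8 terminal shows 汉字 Haskell.
  putStrLn sample
```

One detail of show: when an escaped character is followed by a digit, it inserts \& so the escape stays unambiguous, e.g. `show "字1"` yields `"\23383\&1"`.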

Answered by Carl

  • I tried `hGetLine h >>= hPutStr g`, and the file behind handle g really does get the correct contents. But GHCi cannot display Chinese characters normally. – TorosFanny Dec 26 '12 at 12:05
  • @user1926094: it's not so much "cannot" as "does not". It *chooses* to escape them, because the escaped version can't be screwed up by your terminal, or your font, or whatever else. – Ben Millwood Dec 26 '12 at 13:00
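For the original goal of working with the files in /home, a minimal sketch, assuming the directory package (which ships with GHC) and a UTF-8 terminal. It lists the entries of `targetDir`, a name chosen here for illustration, and prints each one with putStrLn so the Chinese names are not escaped.

```haskell
import System.Directory (listDirectory)
import System.IO (hSetEncoding, stdout, utf8)

-- Hypothetical directory to list; point it at any folder.
targetDir :: FilePath
targetDir = "/home"

main :: IO ()
main = do
  -- Assumption: the terminal expects UTF-8 output.
  hSetEncoding stdout utf8
  -- By default GHC decodes file names with the file system encoding taken from the locale.
  names <- listDirectory targetDir
  -- print names would show escaped code points; putStrLn shows each name as-is.
  mapM_ putStrLn names
```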

Thanks to Carl, I used putStrLn as a wrapper around my function:

ghci> let removeNonUppercase st = [c | c <- st, c `elem` ['А'..'Я']]
ghci> putStrLn (removeNonUppercase "Ха-ха-ха! А-ха-ха!")
ХА

Everything works fine!

Answered by Yury Kochubeev