4

The following code prints something like °Ð½Ð´Ð¸Ñ-ÐÑпаниÑ

getDirectoryContents "path/to/directory/that/contains/files/with/nonASCII/names"
  >>= mapM_ putStrLn

Looks like it is a ghc bug and it is fixed already in repository. But what to do until everybody upgrade ghc?

The last time I encountered such the problem (it was few years ago, btw), I used utf8-string package to convert strings, but I don't remember how I did it, and ghc unicode support was changed visibly last years.

So, what is the best (or at least working) way to get directory contents with full unicode support?

ghc version 7.0.4 locale en_US.UTF-8

Yuras
  • 13,856
  • 1
  • 45
  • 58

2 Answers2

5

Here's a simple workaround using decodeString and encodeString from utf8-string.

import System.Directory
import qualified Codec.Binary.UTF8.String as UTF8

main = do
   getDirectoryContents "." >>= mapM_ (putStrLn . UTF8.decodeString)
   putStrLn "------------"
   readFile (UTF8.encodeString "brøken-file-nåme.txt") >>= putStrLn

Output:

.
..
brøken-file-nåme.txt
Broken.hs
------------
hello
hammar
  • 138,522
  • 17
  • 304
  • 385
  • @Yuras: As I understood it, `base` will to the UTF8 conversion itself, so you might want to use conditional compilation to remove the conversion if using the appropriate version of `base`. – hammar Jul 24 '11 at 13:20
3

I would recommend looking at system-filepath, which provides an abstract datatype for representing filepaths. I've used it extensively for some internal code and it works wonderfully.

Michael Snoyman
  • 31,100
  • 3
  • 48
  • 77