
I have tried writeBS and writeText from Snap and renderTemplate from Heist, but none of them seems to support Unicode.

site :: Snap ()
site = do
    ifTop (writeBS "你好世界") <|>
    route [("test", testSnap)]

testSnap :: Snap ()
testSnap = do
     fromJust $ C.renderTemplate hs "test"

-- test.tpl

你好世界

I expected the / and /test routes to output "你好世界", but the output is just garbled characters.

Lynton
    What do you mean by "does not support unicode"? Can you show a small example of what you've tried and tell us what you expect to happen and what actually happens? – bennofs Sep 21 '13 at 11:00
  • Thank you, I've added some code; I hope it makes the problem clearer. – Lynton Sep 21 '13 at 12:09

1 Answer


The problem here is not with writeBS or writeText; it's with the conversion used by the OverloadedStrings extension. It is also important to understand the distinction between ByteString and Text. ByteString is for raw bytes: it has no concept of characters or of an encoding. That is where Text comes in. The Data.Text.Encoding module provides functions for converting between Text and ByteString using different encodings. For me, both of the following generate the same output:

writeBS $ encodeUtf8 "你好世界"
writeText "你好世界"

Your code didn't work because OverloadedStrings converts your string literal to a ByteString, and ByteString's IsString instance keeps only the low 8 bits of each Char, so every character above U+00FF comes out mangled. The solution is to give the literal its proper type: Text.
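To see what actually reaches the browser, here is a minimal sketch that needs only the bytestring and text packages (no Snap). It compares the literal typed as a ByteString with the same literal typed as Text and then encoded with encodeUtf8. With the bytestring versions I know of, the ByteString instance silently keeps only the low byte of each Char, which would explain the garbled output in the question. The names truncatedBytes, greeting and utf8Bytes are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import Data.Text (Text)
import Data.Text.Encoding (decodeUtf8, encodeUtf8)

-- The literal typed as ByteString: OverloadedStrings calls ByteString's
-- fromString, which keeps only the low 8 bits of each Char.
truncatedBytes :: B.ByteString
truncatedBytes = "你好世界"

-- The same literal typed as Text (full Unicode code points).
greeting :: Text
greeting = "你好世界"

-- Explicitly encoded as UTF-8, as in `writeBS $ encodeUtf8 ...`.
utf8Bytes :: B.ByteString
utf8Bytes = encodeUtf8 greeting

main :: IO ()
main = do
  -- 4 bytes: 0x60 0x7D 0x16 0x4C, i.e. "`}", a control character, "L"
  print (B.unpack truncatedBytes)
  -- 12 bytes: each of these four characters takes 3 bytes in UTF-8
  print (B.length utf8Bytes)
  -- Decoding the UTF-8 bytes gives back the original Text
  print (decodeUtf8 utf8Bytes == greeting)
```

Running it should print `[96,125,22,76]`, `12` and `True`: four bytes that no longer mean anything, versus twelve bytes that decode back to the original text.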

On the Heist side of things, the following works fine for me:

route [("test", cRender "test")]

In fact, this one renders correctly in my browser, while the two writeBS/writeText versions above don't. The difference is that cRender sets an appropriate Content-Type header. I found it enlightening to compare the variants using the following snippet:

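{-# LANGUAGE OverloadedStrings #-}
import Data.Text.Encoding (encodeUtf8)
import Snap.Core
import Snap.Snaplet.Heist (cRender)

-- cRender comes from the Heist snaplet, so site runs in a Handler, not plain Snap.
site = route [ ("/test1", writeBS "你好世界")
             , ("/test2", writeBS $ encodeUtf8 "你好世界")
             , ("/test3", writeText "你好世界")
             , ("/test4", modifyResponse (setContentType "text/html;charset=utf-8") >> writeText "你好世界")
             , ("/testHeist", cRender "test")
             ]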

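In my browser, test4 and testHeist render correctly. Tests 2 and 3 send the correct UTF-8 bytes, but browsers might not display them properly because of the missing content-type (and therefore missing charset). Test 1 sends truncated bytes, so it is wrong no matter how the browser interprets it.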

mightybyte
  • Thank you for your detailed answer. It seems Text is the right type for Unicode, but I'm confused when I use a splice. I imported Data.Text as T and wrote `C.yieldRuntimeText $ return $ T.pack "你好世界"`. I thought I would get "你好世界" from the splice, but the browser actually shows garbled characters. – Lynton Sep 24 '13 at 15:16
  • Don't use T.pack. It accepts an argument of type String, which means that OverloadedStrings converts your literal to String first, then pack converts it to Text, which will screw things up. Just do `C.yieldRuntimeText $ return "你好世界"`. – mightybyte Sep 25 '13 at 05:47
  • But I'm using a splice to generate a FilePath, and FilePath is just String, e.g. `s = "你好世界" :: FilePath`. How should I display it with a splice? – Lynton Sep 25 '13 at 06:42
  • Hmmm, this could be a problem. String may screw things up for you because of its handling of encoding. I would say your best bet would be to construct everything as Text first. Then, if you need a FilePath, convert to it using unpack. If that doesn't work, then you might need to talk to the GHC people about how unicode is handled in String. – mightybyte Sep 25 '13 at 13:07
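Both concerns in this comment thread can be checked without Heist. A minimal sketch using only the text and bytestring packages suggests that neither T.pack nor String loses the characters: T.pack on a String literal yields the same Text as a Text literal (Text's IsString instance is defined with pack), and a FilePath round-trips through Text unchanged. So something like `C.yieldRuntimeText $ return (T.pack path)` should hand Heist the right Text, where `path` is a hypothetical FilePath. If the browser still shows garbled output, a missing charset in the Content-Type header, as discussed in the answer, seems the more likely cause; that diagnosis is an assumption, not something verified in this thread.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- The literal typed directly as Text via OverloadedStrings.
viaLiteral :: T.Text
viaLiteral = "你好世界"

-- The literal typed as String (T.pack's argument type), then packed.
viaPack :: T.Text
viaPack = T.pack "你好世界"

-- A FilePath is just a String: a list of full Unicode Chars.
-- `path` is a hypothetical value standing in for the splice's input.
path :: FilePath
path = "你好世界"

main :: IO ()
main = do
  print (viaLiteral == viaPack)                   -- same Text either way
  print (length path)                             -- 4 Chars, nothing truncated
  print (T.unpack (T.pack path) == path)          -- lossless round trip
  print (B.length (TE.encodeUtf8 (T.pack path)))  -- 12 bytes once UTF-8 encoded
```

Where String encoding can genuinely matter is at I/O boundaries such as file names on disk, which GHC converts using its file system encoding (on POSIX systems, typically derived from the locale).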