
When I run this code, I get a decode error from Data.Text. What am I doing wrong?

import Data.Text                    (Text, pack, unpack)
import Data.Text.Encoding           (decodeUtf8)
import Data.ByteString              (ByteString)
import System.Entropy

randBS :: IO ByteString 
randBS = do
    randBytes <- getEntropy 2048  
    return randBytes

main :: IO ()
main = do
    r <- randBS
    putStrLn $ unpack $ decodeUtf8 r 

Runtime Error:

Cannot decode byte '\xc4': Data.Text.Internal.Encoding.Fusion.streamUtf8:
Invalid UTF-8 stream

I would like to generate some random bytes that will be used as an auth token.

I am on Mac OS X (Yosemite) and GHC Version 7.10.1

Ecognium

1 Answer


randBS returns random bytes, not UTF-8 encoded data! What you have is not a representation of Text, so no matter which decoding function you use, you will hit a decoding error. Instead, use something like decodeUtf8With with an error handler that replaces each invalid byte with its literal counterpart.

Something like:

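import Data.Text                    (unpack)
import Data.Text.Encoding           (decodeUtf8With)
import Data.Text.Encoding.Error     (OnDecodeError)
import Data.ByteString              (ByteString)
import Data.Char                    (chr)
import System.Entropy               (getEntropy)

-- Called for every byte that is not valid UTF-8: keep the byte by
-- mapping it to the Char with the same code point (0-255, i.e. Latin-1).
handler :: OnDecodeError
handler _ mbyte = chr . fromIntegral <$> mbyte

-- 2048 bytes from the operating system's entropy source.
randBS :: IO ByteString
randBS = getEntropy 2048

main :: IO ()
main = do
    r <- randBS
    putStrLn $ unpack $ decodeUtf8With handler r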

Not tested; I don't have GHC installed at the moment. :s
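The claim that arbitrary bytes are not decodable as UTF-8 can be checked directly. The sketch below is not part of the original answer: it uses decodeUtf8' from Data.Text.Encoding, which returns an Either instead of throwing, and the helper name decodesAsUtf8 is hypothetical. The byte 0xC4 from the question's error message is a UTF-8 lead byte that requires a continuation byte, so on its own it is rejected, while 0xC4 0x80 is a valid encoding of U+0100.

```haskell
import qualified Data.ByteString as BS
import Data.Text.Encoding (decodeUtf8')

-- Hypothetical helper: True when the bytes are valid UTF-8.
-- decodeUtf8' reports invalid input as a Left value instead of throwing.
decodesAsUtf8 :: BS.ByteString -> Bool
decodesAsUtf8 = either (const False) (const True) . decodeUtf8'

main :: IO ()
main = do
    -- 0xC4 starts a two-byte sequence but has no continuation byte.
    print (decodesAsUtf8 (BS.pack [0xC4]))        -- False
    -- 0xC4 0x80 is the two-byte UTF-8 encoding of U+0100.
    print (decodesAsUtf8 (BS.pack [0xC4, 0x80]))  -- True
    -- ASCII bytes are always valid UTF-8.
    print (decodesAsUtf8 (BS.pack [0x68, 0x69]))  -- True
```

Applied to the output of getEntropy 2048, a False result is practically certain, which is why decodeUtf8 in the question throws.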


It is probably even better to use hexadecimal encoding instead of UTF-8 decoding with an error handler. You can do that with the base16-bytestring library: first apply encode :: ByteString -> ByteString to obtain a representation containing only ASCII characters (two hex digits per byte), which decodeUtf8 always accepts:

import Data.Text                    (unpack)
import Data.Text.Encoding           (decodeUtf8)
import Data.ByteString              (ByteString)
import Data.ByteString.Base16       (encode)
import System.Entropy               (getEntropy)

randBS :: IO ByteString
randBS = getEntropy 2048

main :: IO ()
main = do
    r <- randBS
    -- encode turns every byte into two ASCII hex digits,
    -- so decodeUtf8 can no longer fail.
    putStrLn $ unpack $ decodeUtf8 $ encode r

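To see what the hex representation looks like without adding base16-bytestring as a dependency, here is a sketch using showHex from base. It illustrates the same idea (two hex digits per byte, so the output is twice as long as the input) and is not the library's implementation; toHex is a hypothetical name.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)
import Numeric (showHex)

-- Hypothetical helper: encode each byte as exactly two lowercase hex digits.
toHex :: BS.ByteString -> String
toHex = concatMap byteToHex . BS.unpack
  where
    byteToHex :: Word8 -> String
    byteToHex b = case showHex b "" of
        [d] -> ['0', d]   -- pad single digits (0x0 .. 0xf) with a leading zero
        ds  -> ds

main :: IO ()
main = do
    let bytes = BS.pack [0x00, 0xC4, 0xFF]
    putStrLn (toHex bytes)           -- 00c4ff
    print (length (toHex bytes))     -- 6: two characters per input byte
```

Every output character is in 0-9 or a-f, so the result is plain ASCII and safe to store in a database or send to a client.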
Bakuriu
  • I'd suggest using something like [base16-bytestring](https://hackage.haskell.org/package/base16-bytestring). The output will be longer but always the same length (twice as long as the input ByteString) and the output will be hex instead of gibberish (which is more suitable for an auth token). – cchalmers Jun 28 '15 at 11:05
  • @cchalmers You are right. It really depends on the output the OP wants, which he didn't really explain. – Bakuriu Jun 28 '15 at 12:32
  • @Bakuriu thanks! I would like to generate a secure session token. This token will be stored in the DB and shared with the client. Encoding will probably give me what I want. – Ecognium Jun 28 '15 at 19:29
  • `\n -> T.take n . T.decodeUtf8 . B.encode <$> (getEntropy . uncurry (+) $ divMod n 2)` uses the hex technique above but still provides length `n` Text. – Wilfred Jan 14 '19 at 18:05
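Wilfred's one-liner can be unpacked into named pieces. The sketch below is an illustration, not his code: bytesNeeded and hexToken are hypothetical names, the hex encoding uses showHex from base in place of base16-bytestring, and a fixed ByteString stands in for getEntropy so the output is deterministic. The key arithmetic is that n hex characters need ceiling(n / 2) bytes, which is what uncurry (+) (divMod n 2) computes.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import Numeric (showHex)

-- Bytes needed for n hex characters: ceiling (n / 2), computed as the
-- quotient plus the remainder of n `divMod` 2.
bytesNeeded :: Int -> Int
bytesNeeded n = uncurry (+) (n `divMod` 2)

-- Hex-encode the bytes (two digits each), then trim to exactly n characters.
hexToken :: Int -> BS.ByteString -> T.Text
hexToken n = T.take n . T.pack . concatMap hex2 . BS.unpack
  where
    hex2 b = case showHex b "" of
        [d] -> ['0', d]   -- pad single digits with a leading zero
        ds  -> ds

main :: IO ()
main = do
    -- Real code would obtain the bytes with getEntropy (bytesNeeded n);
    -- a fixed ByteString keeps this example deterministic.
    let n     = 5
        bytes = BS.pack [0xAB, 0x01, 0xFF]
    print (bytesNeeded n)      -- 3
    print (hexToken n bytes)   -- "ab01f"
```

When n is odd, the last hex digit of the final byte is simply dropped by T.take.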