Best way to convert between [Char] and [Word8]?

Question

I'm new to Haskell and I'm trying to use a pure SHA1 implementation in my app (Data.Digest.Pure.SHA) with a JSON library (AttoJSON).

AttoJSON uses Data.ByteString.Char8 bytestrings, SHA uses Data.ByteString.Lazy bytestrings, and some of my string literals in my app are [Char].

Haskell Prime's wiki page on Char types seems to indicate this is something still being worked out in the Haskell language/Prelude.

And this blog post on Unicode support lists a few libraries, but it's a couple of years old.

What is the current best way to convert between these types, and what are some of the tradeoffs?

Thanks!

http://hackage.haskell.org/packages/archive/utf8-string/0.3.7/doc/html/Data-ByteString-Lazy-UTF8.html — singpolyma, Mar 12 '13 at 13:19
Note that a `Char` *cannot* safely be converted to `Word8` because `Char` can store many more values than `Word8`. — singpolyma, Mar 12 '13 at 13:20

score 6 · Answer 1 · answered Feb 09 '14 at 10:27

Here's what I have, without using ByteString's internal functions.

import Data.ByteString as S (ByteString, unpack)
import Data.ByteString.Char8 as C8 (pack)
import Data.Char (chr)

strToBS :: String -> S.ByteString
strToBS = C8.pack

bsToStr :: S.ByteString -> String
bsToStr = map (chr . fromEnum) . S.unpack

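S.unpack on a ByteString gives us [Word8]; we then apply (chr . fromEnum), which converts a value of any Enum type to a character. By composing all of them together we get the function we want! Note that C8.pack keeps only the low 8 bits of each Char, so the round trip is faithful only for code points up to 255.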
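
To make the limits of this approach concrete, here is a minimal sketch that reuses the two functions above and adds two helper predicates. The names `roundTrips` and `isSingleByte` are hypothetical, introduced only for illustration.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as C8
import Data.Char (chr, ord)

-- Same conversions as in the answer above.
strToBS :: String -> S.ByteString
strToBS = C8.pack

bsToStr :: S.ByteString -> String
bsToStr = map (chr . fromEnum) . S.unpack

-- Hypothetical helper: does the string survive String -> ByteString -> String?
roundTrips :: String -> Bool
roundTrips s = bsToStr (strToBS s) == s

-- Hypothetical helper: True when every Char fits in one byte (code point <= 255).
isSingleByte :: String -> Bool
isSingleByte = all ((<= 255) . ord)
```

In GHCi, `roundTrips "hello"` is True, while `roundTrips "\955"` (a Greek lambda, U+03BB) is False, because only the low byte 0xBB is stored. Checking `isSingleByte` before packing is one way to detect the lossy case.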

score 4 · Answer 2 · answered Jan 15 '11 at 23:09

For conversion between Char8 and Word8 you should be able to use toEnum/fromEnum conversions, as they represent the same data.

For Char and Strings you might be able to get away with Data.ByteString.Char8.pack/unpack or some sort of combination of map, toEnum and fromEnum, but that throws out data if you're using anything other than ASCII.

For strings which could contain more than just ASCII a popular choice is UTF8 encoding. I like the utf8-string package for this:

http://hackage.haskell.org/packages/archive/utf8-string/0.3.6/doc/html/Codec-Binary-UTF8-String.html
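
As a rough illustration of the UTF-8 route, here is a sketch that goes through the `text` package (Data.Text.Encoding) rather than utf8-string, simply because `text` and `bytestring` ship with GHC. That substitution is my assumption, not what the answer uses, and the function names `stringToUtf8` and `utf8ToString` are hypothetical.

```haskell
import qualified Data.ByteString as S
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', encodeUtf8)
import Data.Word (Word8)

-- Encode a String as UTF-8 bytes; non-ASCII characters become several bytes.
stringToUtf8 :: String -> [Word8]
stringToUtf8 = S.unpack . encodeUtf8 . T.pack

-- Decode UTF-8 bytes; returns Nothing when the input is not valid UTF-8.
utf8ToString :: [Word8] -> Maybe String
utf8ToString ws =
  case decodeUtf8' (S.pack ws) of
    Left _  -> Nothing
    Right t -> Just (T.unpack t)
```

For example, `stringToUtf8 "\955"` yields the two bytes [206,187], and `utf8ToString [206,187]` gives back the lambda. Decoding returns a Maybe here because arbitrary bytes are not always valid UTF-8, which is one of the tradeoffs compared with the one-byte-per-Char conversions.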

score 2 · Answer 3 · answered Jan 15 '11 at 22:26

Char8 and normal bytestrings are the same thing, just with different interfaces depending on which module you import. Mainly you want to convert between strict and lazy bytestrings, for which you use toChunks and fromChunks.

To put chars into bytestrings, use pack.

Also note that if your chars include code points which have multibyte representations in UTF-8, then there will be problems, because pack stores only one byte per Char.
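
The strict/lazy conversion mentioned above can be sketched as follows. The names `toLazyBS` and `toStrictBS` are hypothetical; recent versions of bytestring also export `fromStrict` and `toStrict` from Data.ByteString.Lazy, which do the same job.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L

-- A lazy ByteString is a list of strict chunks, so one strict
-- ByteString becomes a lazy one with a single chunk.
toLazyBS :: S.ByteString -> L.ByteString
toLazyBS s = L.fromChunks [s]

-- Going back concatenates all chunks, forcing the whole lazy
-- ByteString into memory.
toStrictBS :: L.ByteString -> S.ByteString
toStrictBS = S.concat . L.toChunks
```

In the setting of the question, something like `toLazyBS` would sit between the strict ByteString produced by the JSON library and the lazy ByteString expected by the SHA functions.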

score 1 · Answer 4 · answered Feb 10 '14 at 11:13

Note: This answers the question in a very specific case (calling functions on hard-coded strings).

This may seem a minor problem because conversion functions exist, as detailed in previous answers. But I wanted a method to reduce administrative code, i.e. the code that you have to write just to get functions working together.

The solution to reducing type-handling code for strings is to use the OverloadedStrings pragma and import the relevant module(s).

{-# LANGUAGE OverloadedStrings #-}
module Dummy where
import  Data.ByteString.Lazy.Char8 (ByteString, append)

bslHandling :: ByteString -> ByteString
bslHandling = (append myWord8List)

myWord8List = "I look like a String, but I'm actually a ByteString"

Note: the type of myWord8List is inferred by the compiler.

If you do not use it in bslHandling, then the above declaration will yield a classical [Char] type.
It does not solve the problem of passing from one specific type to another.

Hope it helps
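
A small sketch of the same idea with explicit signatures, so that nothing depends on inference: one literal is written three times and typed as a strict ByteString, a lazy ByteString, and a plain String. The binding names are hypothetical. As with pack, a ByteString literal keeps only the low 8 bits of each character, so this is best kept to ASCII literals.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as C8
import qualified Data.ByteString.Lazy.Char8 as LC8

-- The same literal, given three different types by its signature.
strictGreeting :: C8.ByteString
strictGreeting = "hello"

lazyGreeting :: LC8.ByteString
lazyGreeting = "hello"

plainGreeting :: String
plainGreeting = "hello"
```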

score 0 · Answer 5 · answered Mar 12 '13 at 12:04

Maybe you want to do this:

import Data.ByteString.Internal (unpackBytes)
import Data.ByteString.Char8 (pack)
import GHC.Word (Word8)

strToWord8s :: String -> [Word8]
strToWord8s = unpackBytes . pack

answered by Znatz

score -1 · Answer 6 · answered Aug 28 '17 at 22:32

Assuming that Char and Word8 are the same,

import Data.Word ( Word8 ) 
import Unsafe.Coerce ( unsafeCoerce ) 

toWord8 :: Char -> Word8
toWord8 = unsafeCoerce

strToWord8 :: String -> [Word8]
strToWord8 = map toWord8

answered by penkovsky

That is a very bad assumption, given Haskell’s support for Unicode. unsafeCoerce is called unsafe exactly because of things like this. – Evi1M4chine Sep 02 '17 at 22:51
Indeed, Jacob Wang's answer is much better. – penkovsky Sep 03 '17 at 12:30
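
For comparison, a checked conversion that avoids unsafeCoerce could look like the sketch below. It is not taken from any of the answers; the names `charToWord8` and `strToWord8s` are hypothetical, and the choice to reject out-of-range characters (rather than truncate or encode them) is an assumption.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Convert a Char only when its code point fits in a byte.
charToWord8 :: Char -> Maybe Word8
charToWord8 c
  | n <= 255  = Just (fromIntegral n)
  | otherwise = Nothing
  where
    n = ord c

-- Convert a whole String; a single out-of-range Char makes the result Nothing.
strToWord8s :: String -> Maybe [Word8]
strToWord8s = traverse charToWord8
```

Here `strToWord8s "AB"` is `Just [65,66]`, while a string containing a character above U+00FF gives `Nothing` instead of silently wrong bytes.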
