12

I'm new to Haskell and I'm trying to use a pure SHA1 implementation in my app (Data.Digest.Pure.SHA) with a JSON library (AttoJSON).

AttoJSON uses Data.ByteString.Char8 bytestrings, SHA uses Data.ByteString.Lazy bytestrings, and some of my string literals in my app are [Char].

Haskell Prime's wiki page on Char types seems to indicate this is something still being worked out in the Haskell language/Prelude.

And this blog post on Unicode support lists a few libraries, but it's a couple of years old.

What is the current best way to convert between these types, and what are some of the tradeoffs?

Thanks!

stites
cmars232
  • http://hackage.haskell.org/packages/archive/utf8-string/0.3.7/doc/html/Data-ByteString-Lazy-UTF8.html – singpolyma Mar 12 '13 at 13:19
  • Note that a `Char` *cannot* safely be converted to `Word8` because `Char` can store many more values than `Word8`. – singpolyma Mar 12 '13 at 13:20

6 Answers

6

Here's what I have, without using ByteString's internal functions.

import Data.ByteString as S (ByteString, unpack)
import Data.ByteString.Char8 as C8 (pack)
import Data.Char (chr)

strToBS :: String -> S.ByteString
strToBS = C8.pack

bsToStr :: S.ByteString -> String
bsToStr = map (chr . fromEnum) . S.unpack

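S.unpack on a ByteString gives us [Word8]; we then map (chr . fromEnum) over it, which converts each Word8 to a Char. Composing these gives us the function we want. Note that C8.pack keeps only the low 8 bits of each Char, so the round trip preserves only characters up to U+00FF.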

Jacob Wang
4

For conversion between Char8 and Word8 you should be able to use toEnum/fromEnum conversions, as they represent the same data.

For Char and Strings you might be able to get away with Data.ByteString.Char8.pack/unpack or some sort of combination of map, toEnum and fromEnum, but that throws out data if you're using anything other than ASCII.

For strings which could contain more than just ASCII a popular choice is UTF8 encoding. I like the utf8-string package for this:

http://hackage.haskell.org/packages/archive/utf8-string/0.3.6/doc/html/Codec-Binary-UTF8-String.html
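A minimal sketch of the difference between the two approaches, assuming the text package is available. It uses Data.Text.Encoding rather than utf8-string, which is a substitution on my part, and the helper names stringToUtf8 and utf8ToString are made up for illustration:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Encode a String as UTF-8 bytes (hypothetical helper name).
stringToUtf8 :: String -> B.ByteString
stringToUtf8 = TE.encodeUtf8 . T.pack

-- Decode UTF-8 bytes back to a String (hypothetical helper name).
-- decodeUtf8 throws an exception on invalid UTF-8 input.
utf8ToString :: B.ByteString -> String
utf8ToString = T.unpack . TE.decodeUtf8

-- Char8 round trip: each Char is truncated to its low 8 bits,
-- so U+03BB (lambda) comes back as U+00BB.
lossy :: String
lossy = C8.unpack (C8.pack "\955")
```

The UTF-8 round trip returns the original string, while the Char8 round trip silently changes any character above U+00FF.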

Antoine Latter
2

Char8 and normal bytestrings are the same thing, just with different interfaces depending on which module you import. Mainly you want to convert between strict and lazy bytestrings, for which you use toChunks and fromChunks.

To put chars into bytestrings, use pack.

Also note that if your chars include code points that have multibyte representations in UTF-8, there will be problems: Data.ByteString.Char8.pack stores only the low 8 bits of each Char, so it does not produce UTF-8.
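The strict/lazy conversion described above can be sketched as follows. The names toLazyBS and toStrictBS are made up for illustration; more recent versions of bytestring also export Data.ByteString.Lazy.fromStrict and toStrict, which do the same job:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8
import qualified Data.ByteString.Lazy as BL

-- Strict -> lazy: wrap the strict ByteString as a single chunk.
toLazyBS :: B.ByteString -> BL.ByteString
toLazyBS s = BL.fromChunks [s]

-- Lazy -> strict: concatenate all chunks (forces the whole lazy string).
toStrictBS :: BL.ByteString -> B.ByteString
toStrictBS = B.concat . BL.toChunks

-- A Char8-packed value is an ordinary strict ByteString,
-- so it goes through the same conversion.
example :: BL.ByteString
example = toLazyBS (C8.pack "hello")
```

Going from strict to lazy is cheap, while going from lazy to strict copies the data into one contiguous buffer.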

sclv
1

Note: This answers the question only in a very specific case (calling functions on hard-coded strings).

This may seem a minor problem because conversion functions exist as detailed in previous answers. But I wanted a method to reduce administrative code, i.e. the code that you have to write just to get functions working together.

The solution to reducing type-handling code for string literals is to use the OverloadedStrings pragma and import the relevant module(s):

{-# LANGUAGE OverloadedStrings #-}
module Dummy where
import  Data.ByteString.Lazy.Char8 (ByteString, append)

bslHandling :: ByteString -> ByteString
bslHandling = (append myWord8List)

myWord8List = "I look like a String, but I'm actually a ByteString" 

Note: the type of myWord8List is inferred by the compiler.

  • If you do not use it in bslHandling, then the above declaration will yield a classical [Char] type.

  • It does not solve the problem of passing from one specific type to another
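As a small sketch of the same idea with explicit type signatures (the greeting names are made up for illustration), one literal can be given three different string types, and an explicit conversion is still needed to compare them:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL

-- The same literal, resolved to a different type by each signature.
strictGreeting :: B.ByteString
strictGreeting = "hello"

lazyGreeting :: BL.ByteString
lazyGreeting = "hello"

plainGreeting :: String
plainGreeting = "hello"

-- Comparing across types still requires a real conversion.
sameBytes :: Bool
sameBytes = B.concat (BL.toChunks lazyGreeting) == strictGreeting
```

Giving each literal a signature removes any reliance on inference. Keep in mind that the ByteString instances behave like Char8.pack, so non-Latin-1 characters in a literal are truncated.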

Hope it helps

Titou
0

Maybe you want to do this:

import Data.ByteString.Internal (unpackBytes)
import Data.ByteString.Char8 (pack)
import GHC.Word (Word8)

strToWord8s :: String -> [Word8]
strToWord8s = unpackBytes . pack 
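
If depending on Data.ByteString.Internal is a concern, the same function can probably be written with the public API only. This is a sketch, not a claim about which version performs better:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8
import Data.Word (Word8)

-- Pack via the Char8 interface, then unpack via the Word8 interface.
-- Characters above U+00FF are truncated to their low 8 bits.
strToWord8s :: String -> [Word8]
strToWord8s = B.unpack . C8.pack
```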
Znatz
-1

Assuming that Char and Word8 are the same,

import Data.Word ( Word8 ) 
import Unsafe.Coerce ( unsafeCoerce ) 

toWord8 :: Char -> Word8
toWord8 = unsafeCoerce

strToWord8 :: String -> [Word8]
strToWord8 = map toWord8
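
As the comment under the question points out, Char can hold many more values than Word8, so the assumption above does not hold in general. A checked alternative without unsafeCoerce might look like the sketch below. The names charToWord8 and strToWord8sChecked are made up for illustration:

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Convert a Char only if its code point fits in 8 bits.
charToWord8 :: Char -> Maybe Word8
charToWord8 c
  | n <= 255  = Just (fromIntegral n)
  | otherwise = Nothing
  where
    n = ord c

-- Convert a whole String; Nothing if any character is out of range.
strToWord8sChecked :: String -> Maybe [Word8]
strToWord8sChecked = traverse charToWord8
```

This makes out-of-range characters visible to the caller instead of silently corrupting them.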
penkovsky