Using base64-bytestring with lazy ByteStrings

Question

Here's what I'm trying to do in Haskell:

1. take a message in ByteString format (doesn't really matter if lazy or strict)
2. encrypt the message with an RSA public key
3. base64 encode the encrypted message

The RSA library that I'm using handles lazy ByteStrings internally. The Base64 library, however, uses strict ByteStrings only. My application uses lazy ByteStrings to send the messages to network sockets.

So, it looks like I have to convert between lazy and strict ByteStrings. Here's what I do:

encrypt :: CryptoRandomGen t => t -> RSA.PublicKey -> L.ByteString -> L.ByteString
encrypt gen pubkey msg = do
  let (ciphertext,_) = RSA.encrypt gen pubkey msg
  (L.fromChunks . map encode . L.toChunks) $ ciphertext

decrypt :: RSA.PrivateKey -> L.ByteString -> Either String L.ByteString
decrypt privkey ciphertext = do
  dec <- decode $ S.concat $ L.toChunks ciphertext
  return $ RSA.decrypt privkey $ L.fromChunks [dec]

Unfortunately, sometimes this fails. When I decrypt a message encrypted in this way it sometimes results in garbage followed by the actual message. I'm not sure exactly where the problem is: is it the conversion from lazy to strict ByteStrings or is it the base64 encoding step? Or is it both?

Lazy ByteStrings are just lists of strict ByteString chunks. Do I implicitly modify the length of the message by converting it?

Please enlighten me.

A lazy bytestring is not a monadic value, so how come you're using do notation? — dave4420, Apr 14 '12 at 16:12

hammar · Accepted Answer · 2012-04-14T22:06:06.963

The problem is that base64 encoding maps every three bytes (3 × 8 bits) of input to four characters of output, each carrying 6 bits (4 × 6 bits), so when the size of the input is not a multiple of three, the encoder has to add padding. This means that concatenating the results of encoding each chunk separately may not give the same result as encoding the entire input at once.

> encode "Haskell"
"SGFza2VsbA=="
> encode "Hask" `append` encode "ell"
"SGFzaw==ZWxs"

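Note that these are different even if you remove the = characters used to pad the output. "Hask" is four bytes long, so its last byte is padded with zero bits to fill out a 6-bit group before it is encoded, and the bytes of the next chunk then land in different 6-bit groups than they would if the whole message were encoded at once.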
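To check whether a particular chunking is affected, you can compare the chunk-by-chunk encoding against the encoding of the concatenated input. This is only a minimal sketch: `chunkwiseMatchesWhole`, `badSplit` and `goodSplit` are hypothetical names for illustration, not part of base64-bytestring.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.Base64 as B64

-- Hypothetical helper: True when encoding each chunk separately and
-- concatenating the results gives the same bytes as encoding the
-- concatenated input in one go.
chunkwiseMatchesWhole :: [S.ByteString] -> Bool
chunkwiseMatchesWhole chunks =
  S.concat (map B64.encode chunks) == B64.encode (S.concat chunks)

-- The first chunk is 4 bytes long, so it gets padded mid-stream.
badSplit :: Bool
badSplit = chunkwiseMatchesWhole [C.pack "Hask", C.pack "ell"]

-- The first chunk is 3 bytes long, so no padding is needed before the end.
goodSplit :: Bool
goodSplit = chunkwiseMatchesWhole [C.pack "Has", C.pack "kell"]
```

With the split from the example above, `badSplit` should be `False`, while splitting after three bytes makes `goodSplit` come out `True`.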

Your best bet is probably to find a library that supports lazy bytestrings, but ensuring that the sizes of all chunks (except the last) are multiples of three can work as a workaround.
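The workaround could look roughly like the sketch below, which re-splits the lazy bytestring into strict chunks of a fixed size before encoding each one. The names `rechunk` and `encodeChunked` and the chunk size of 3 × 1024 bytes are my own assumptions; any positive multiple of three should do. It also assumes a bytestring version that provides `L.toStrict`; on older versions `S.concat . L.toChunks` does the same job.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Base64 as B64
import Data.Int (Int64)

-- Hypothetical helper: cut a lazy ByteString into strict chunks of exactly
-- n bytes each; only the last chunk may be shorter.
rechunk :: Int64 -> L.ByteString -> [S.ByteString]
rechunk n bs
  | L.null bs = []
  | otherwise =
      let (front, rest) = L.splitAt n bs
      in L.toStrict front : rechunk n rest

-- Encode chunk by chunk. Because every chunk except the last has a length
-- that is a multiple of three, padding can only appear at the very end.
encodeChunked :: L.ByteString -> L.ByteString
encodeChunked = L.fromChunks . map B64.encode . rechunk chunkSize
  where
    chunkSize = 3 * 1024 -- assumed size; must be a positive multiple of 3
```

This keeps at most one chunk in strict form at a time. Newer releases of base64-bytestring appear to ship a `Data.ByteString.Base64.Lazy` module, which would make a hand-rolled version like this unnecessary, so it is worth checking the version you have first.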

Alternatively, if you don't mind keeping the whole thing in memory, convert the lazy bytestring to a strict one, encode the whole thing in one step, and convert back (if necessary).
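A minimal sketch of that approach, assuming a bytestring version that has `L.toStrict` and `L.fromStrict` (otherwise use `S.concat . L.toChunks` and `L.fromChunks . (:[])` as in the question). The names `encodeLazy` and `decodeLazy` are hypothetical.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Base64 as B64

-- Collapse the lazy ByteString into one strict ByteString, encode it in a
-- single step, and wrap the result as a lazy ByteString again.
encodeLazy :: L.ByteString -> L.ByteString
encodeLazy = L.fromStrict . B64.encode . L.toStrict

-- The reverse direction; decoding can fail, hence the Either.
decodeLazy :: L.ByteString -> Either String L.ByteString
decodeLazy = fmap L.fromStrict . B64.decode . L.toStrict
```

In the question's `encrypt`, `encodeLazy` would take the place of `L.fromChunks . map encode . L.toChunks`. The cost is that the whole ciphertext sits in memory at once, which should be harmless for short messages.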

edited Apr 14 '12 at 22:06

answered Apr 14 '12 at 15:29

The lack of lazy bytestring support does seem like a somewhat glaring omission from that library, though. If someone would like to hack on it, I'm sure Bryan would be happy to accept a patch. – hammar Apr 14 '12 at 16:14
There's an implementation for lazy bytestrings in OpenSSL.EVP.Base64. I compared the output of `encode` in base64-bytestring on a converted bytestring with that of `encodeBase64LBS` in HsOpenSSL on the same message but as a lazy bytestring and I could not see any difference, though. – Apr 15 '12 at 02:42
@rekado: Erm, yes? Not sure I follow. I would expect both of those to work just fine. It's `L.fromChunks . map encode . L.toChunks` that's broken. – hammar Apr 15 '12 at 02:52
yes, I see that. Thanks for pointing it out. In my particular case the message size is always 32 bytes, so there is no difference between `L.fromChunks . map encode . L.toChunks` and the more verbose `L.fromChunks [(encode . S.concat . L.toChunks) $ ciphertext]`. I should change it, though, in case the message length changes in the future. – Apr 15 '12 at 06:28
It seems that my problem is rather to be found in my use of the encryption function. Only for certain random seeds is garbage produced when the message size and contents are fixed. Your answer is correct, though, given the way I asked my question. – Apr 15 '12 at 06:30
