
This is one of those questions that seems easy at first, but I've been researching it for a while now and can't find the answer.

I need to convert a list of bytes (i.e. `Word8`s) to a number of arbitrary length (i.e. an `Integer`). For example:

    intPack :: [Word8] -> Integer
    intPack [1] = 1
    intPack [1, 0] = 256
    showHex (intPack [1,2,3,4,5,6,7,8,9,10,11]) "" = "102030405060708090a0b"

A slow solution is easy to write (see the answers to "How to convert a ByteString to an Int and dealing with endianness?"):

    intPack = foldl (\v b -> v * 256 + fromIntegral b) 0

But I cringe at this: all the extra multiplications and additions, plus a string of useless intermediate `Integer`s, just to (probably) end up with the same bytes I started with, packed into the internal structure of the `Integer` type.

Of course, I don't know the details of how `Integer` stores its data. Perhaps it does something more complicated than holding the bytes in a variable-length array, such as using flags to denote the length of the number, the way UTF-8 does when encoding characters. At the very least it would be good to know that the `intPack` above is as good as it gets. Then I could stop researching, bite (or rather byte :) ) the bullet, and move on.
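For reference, here is a minimal sketch of the same fold written strictly over a `ByteString`, using shifts and bitwise OR from `Data.Bits` in place of multiplication and addition. The name `intPackBS` is made up for this illustration. The sketch assumes big-endian, unsigned input, and it still builds one intermediate `Integer` per byte, so it does not settle whether a direct copy into the `Integer` representation is possible.

```haskell
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Hypothetical helper: big-endian, unsigned pack of a strict ByteString.
-- Each step shifts the accumulator left by one byte and ORs in the next byte.
-- B.foldl' forces the accumulator at every step, so no chain of thunks builds up.
intPackBS :: B.ByteString -> Integer
intPackBS = B.foldl' step 0
  where
    step :: Integer -> Word8 -> Integer
    step acc b = (acc `shiftL` 8) .|. fromIntegral b

-- List version with the signature used in the question.
intPack :: [Word8] -> Integer
intPack = intPackBS . B.pack
```

With these definitions, `intPack [1, 0]` evaluates to `256`, matching the examples given earlier.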

jamshidh
  • I'm interested in seeing a good answer for this problem, but shouldn't you be using `ByteString`s instead of lists if you're worried that much about performance? Also, Haskell's `Integer` type is implemented by the `GMP` library. – bheklilr Sep 15 '14 at 18:27
  • I suppose I should look at how `GMP` does it. I actually am using `ByteString` in the real problem, but simplified it for the question here. Any solution can easily convert using `B.pack` and `B.unpack`, so I don't think it matters, but if it does, then let it be known that all those `[Word8]` above really are `ByteString`s. – jamshidh Sep 15 '14 at 18:33
  • http://stackoverflow.com/questions/3242256/how-does-gmp-stores-its-integers-on-an-arbitrary-number-of-bytes gives some info about how the `GMP` lib does it, although it still isn't clear to me whether I can use the info. At least it seems to store the value as bytes, so it still seems to be possible to do this efficiently. – jamshidh Sep 15 '14 at 18:44

1 Answer


I would look at the binary package for efficiently packing and unpacking binary data structures:

https://hackage.haskell.org/package/binary-0.7.2.1/docs/Data-Binary-Get.html

Some ideas:

  1. See if the Binary Integer instance can work for you:

    import Data.Binary
    import qualified Data.ByteString.Lazy.Char8 as LBS
    main = do
      let i = 0x0102030405060708090a0b0c0d0e0f :: Integer
          bs = encode i
      print ("before", i)
      LBS.writeFile "output" bs
      j <- fmap decode $ LBS.readFile "output" :: IO Integer
      print ("after", j)
    
  2. Have a look at the definitions of functions like `getWord64be` to see if they give you any ideas:

http://hackage.haskell.org/package/binary-0.7.2.1/docs/src/Data-Binary-Get.html#getWord64be
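As a rough illustration of the second idea, the fixed-width readers can be imitated for an arbitrary width by reading `n` bytes with `getByteString` and folding them into an `Integer`. This is only a sketch under assumptions: `getIntegerBE` and `unpackIntegerBE` are hypothetical names, not functions exported by `binary`, and the input is taken to be big-endian and unsigned.

```haskell
import Data.Binary.Get (Get, getByteString, runGet)
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as LBS

-- Hypothetical helper (not part of the binary package):
-- consume exactly n bytes and read them as one unsigned big-endian Integer.
getIntegerBE :: Int -> Get Integer
getIntegerBE n = B.foldl' step 0 <$> getByteString n
  where
    -- Shift the accumulator up by one byte, then OR in the next byte.
    step acc b = (acc `shiftL` 8) .|. fromIntegral b

-- Hypothetical wrapper: treat a whole lazy ByteString as a single number.
unpackIntegerBE :: LBS.ByteString -> Integer
unpackIntegerBE bs = runGet (getIntegerBE (fromIntegral (LBS.length bs))) bs
```

Because `getIntegerBE` is an ordinary `Get` action, it can be sequenced with `getWord16be` and the other readers when the length of the number comes from an earlier field in the message.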

ErikR
  • I've been using it elsewhere, but it only seems to work for values of fixed size (8 bit, 16 bit, 32 bit, 64 bit). I need arbitrary length. – jamshidh Sep 15 '14 at 18:42
  • @jamshidh There is certainly an `Integer` instance of `Binary`, but looking at [the code](https://hackage.haskell.org/package/binary-0.7.2.1/docs/src/Data-Binary-Class.html#Binary) it seems to be based on similar folding to what you have already thought of, although using the more efficient `Data.Bits` functions rather than addition and multiplication. – Ørjan Johansen Sep 15 '14 at 21:01
  • I stand corrected about the fixed size: decode/encode do convert `Integer` to/from `[Word8]`. I got excited about this until I realized it isn't doing a simple pack, though. If you look at the `[Word8]` coming out, sometimes it works (e.g. 0x0102 yields [0,0,0,1,2], if you ignore the padding), sometimes it totally fails (0x0102030405 yields [1,1,0,0,0,0,0,0,0,5,5,4,3,2,1]! Ugh). Since I am reading off the wire and can't set the encoding, I am out of luck. – jamshidh Sep 15 '14 at 21:04
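The byte layout described in the last comment can be inspected directly. A minimal sketch, where `encodedBytes` is a hypothetical helper name; the exact output depends on the `binary` version installed, so the byte lists quoted in the comment should be taken as what was observed with the version linked in the answer.

```haskell
import Data.Binary (encode)
import qualified Data.ByteString.Lazy as LBS
import Data.Word (Word8)

-- Hypothetical helper: the raw bytes that the Binary Integer instance writes.
-- This is binary's own serialization format, not a plain big-endian pack,
-- which is why it cannot be used to parse bytes produced by another protocol.
encodedBytes :: Integer -> [Word8]
encodedBytes = LBS.unpack . encode
```

Evaluating `encodedBytes 0x0102` and `encodedBytes 0x0102030405` in GHCi shows the two different layouts side by side.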