Haskell Bytestring pack/unpack

Question

I still don't get how bytestrings work

import qualified Data.ByteString.Lazy as BS
x <- BS.readFile "somefile.txt" --some large file
let z = ((reverse (BS.unpack x)) !! 2) --do stuff here

I know bytestrings can be used to read large amounts of data very quickly and efficiently, but packing and then unpacking doesn't make sense.

let z = readArray x 1 --can you read the bytestring like it's an array? (something like this)

Can't you just read the data in bytestring form without unpacking? Or just unpack a segment of the data?

Could you explain how it all works? (Code examples, please.)

Do you have a more specific example in mind of the operation you'd like to perform? — acfoltzer, Aug 17 '11 at 01:58

score 8 · Accepted Answer · answered Aug 17 '11 at 02:51

But packing and unpacking doesn't [make] sense.

Well, it's certainly wasteful.

Can't you just read the data in bytestring form without unpacking?

Do you mean operate on the data without converting it to another form? Sure you can. Exactly how depends on what you want to do. I've used the FFI (and later, Data.Vector.Storable) to access a ByteString as a sequence of Word32s. You can pull out any individual Word8 directly. I'm sure you've seen ByteString's Haddock documentation, but know that other packages consume bytestrings directly (e.g. for passing an image buffer to C code that is called via the FFI).
Do you mean "operate on the data without using [Word8] or [Char]"? The binary and cereal packages, among others, can be used to parse bytestrings into arbitrary types.
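To illustrate pulling out an individual Word8 without unpacking, here is a minimal sketch for strict bytestrings. The helper names `byteAt` and `byteFromEnd` are made up for this example; only `B.index` and `B.length` come from the bytestring package.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- | The byte at position i, or Nothing when i is out of range.
-- B.index itself throws on a bad index, so the bounds are checked first.
byteAt :: Int -> B.ByteString -> Maybe Word8
byteAt i bs
  | i >= 0 && i < B.length bs = Just (B.index bs i)
  | otherwise                 = Nothing

-- | The byte i positions from the end (0 is the last byte). This picks the
-- same element as `reverse (B.unpack bs) !! i` without building a list.
byteFromEnd :: Int -> B.ByteString -> Maybe Word8
byteFromEnd i bs = byteAt (B.length bs - 1 - i) bs
```

With these, the question's `reverse (BS.unpack x) !! 2` would become `byteFromEnd 2 x` on a strict bytestring. The lazy module has `index` and `length` as well, but they work with `Int64`, and asking for the length of a lazy bytestring forces every chunk.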

or just unpack a segment of the data?

Sure:

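import qualified Data.ByteString as B
import Data.Word (Word8)

-- Drop the first m bytes, keep the next n, and unpack only that slice.
getPortion :: Int -> Int -> B.ByteString -> [Word8]
getPortion n m = B.unpack . B.take n . B.drop m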

answered Aug 17 '11 at 02:51

Thomas M. DuBuisson


Would `getPortion n m = B.unpack . B.take n . B.drop m` load everything into memory? – ArchHaskeller Aug 17 '11 at 03:07
@Haskellers For strict bytestrings? Yes, this would load the entire file. If you imported `Data.ByteString.Lazy` instead, it would load only the first `n+m` bytes (rounded up to the nearest 32K boundary). – Thomas M. DuBuisson Aug 17 '11 at 16:11
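A lazy version of the same slice might look like the sketch below. `getPortionLazy` and `readPortion` are hypothetical names, and the file path is whatever the caller passes in. One difference from the strict code is that the lazy `take` and `drop` expect `Int64` rather than `Int`.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import Data.Word (Word8)

-- | Lazy counterpart of getPortion: skip m bytes, then unpack the next n.
getPortionLazy :: Int64 -> Int64 -> BL.ByteString -> [Word8]
getPortionLazy n m = BL.unpack . BL.take n . BL.drop m

-- | Read a slice of a file through lazy I/O. Chunks beyond the slice are
-- never demanded by this function, so they should not be read.
readPortion :: FilePath -> Int64 -> Int64 -> IO [Word8]
readPortion path n m = getPortionLazy n m <$> BL.readFile path
```

Keep in mind that with lazy I/O the file handle stays open until the contents are fully consumed or the value is garbage collected, so this suits short-lived programs better than long-running ones.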
@Haskeller To convert to a vector the steps are `ByteString -> ForeignPtr` and `ForeignPtr -> Vector`. That is: `convertBStoVector bs = let (ptr,off,len) = B.toForeignPtr bs in V.unsafeFromForeignPtr ptr off len` where you've done `import qualified Data.ByteString.Internal as B` and `import qualified Data.Vector.Storable as V` (see the bytestring and vector packages). – Thomas M. DuBuisson Aug 17 '11 at 16:15
It seems like this conversion should be libraryized, no? Although I suppose it'd be wasteful to add dependencies to either `bytestring` or `vector`. – acfoltzer Aug 17 '11 at 20:00
@acfoltzer The other option is to add a package. The reason I won't is that people who would need the package won't know about it or find it, while people who would learn about it (via p.h.o, h.r.c, cafe, etc.) would probably just write the code themselves instead of adding a dependency on a one- or two-function library. – Thomas M. DuBuisson Aug 17 '11 at 20:07
@Thomas agreed; it's too small of a snippet to justify packaging. – acfoltzer Aug 17 '11 at 20:17
I'm here four years later, and given that it's still difficult to efficiently get `Word32`s out of a `ByteString`, I wish this were part of the `bytestring` library. – Andrew Thaddeus Martin May 09 '16 at 13:26
@AndrewThaddeusMartin Then consider this an opportunity to make a bytestring-extract package. I think a package that will, with some checks for safety, expose `convert :: (..., Integral i) => ByteString -> Vector.Storable.Vector i` would be a great benefit to the community. I say this with the belief that we aren't about to kill off bytestring and all start using `Vector Word8`, as I feel we should. – Thomas M. DuBuisson May 09 '16 at 18:27
@ThomasM.DuBuisson Interestingly, it appears that someone has already written this as the [spool](https://hackage.haskell.org/package/spool-0.1/docs/Data-Vector-Storable-ByteString.html) library. The alignment issue mentioned in the haddocks is a little bit of a shame though, but I guess it's unavoidable. – Andrew Thaddeus Martin May 09 '16 at 20:00
@AndrewThaddeusMartin I'd argue that alignment and endian issues are both avoidable. 1. Check the bytestring length. 2. `ByteString.take` the longest prefix whose length is a multiple of `p`. 3. Apply an endian adjustment (`id` or `map swap`, depending on an architecture test), and also allow the user to specify whether the original bytestring holds big- or little-endian data. 4. Provide a helper for the recursive case that accepts a bytestring of `length < p`, to allow for combining overflow with a following chunk. 5. Make a `Data.ByteString.Lazy` version that produces a list of storable vectors. – Thomas M. DuBuisson May 09 '16 at 20:37
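Steps 1 to 4 of that outline can be sketched with nothing beyond base and bytestring, assuming `p` is 4 (the size of a `Word32`). This is not the zero-copy `Vector` conversion discussed above: it decodes into an ordinary list, and the names `Endian` and `toWord32s` are invented for the illustration.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)

-- | Byte order of the data stored in the bytestring.
data Endian = LittleEndian | BigEndian

-- | Decode as many whole 32-bit words as fit. The leftover bytes (fewer
-- than 4) are returned so a caller can prepend them to the next chunk.
toWord32s :: Endian -> B.ByteString -> ([Word32], B.ByteString)
toWord32s endian bs = (go whole, rest)
  where
    -- Longest prefix length that is a multiple of 4.
    usable        = B.length bs - B.length bs `mod` 4
    (whole, rest) = B.splitAt usable bs
    go b
      | B.null b  = []
      | otherwise = word (B.take 4 b) : go (B.drop 4 b)
    -- Big endian: first byte is most significant; little endian: last is.
    word four = case endian of
      BigEndian    -> B.foldl' step 0 four
      LittleEndian -> B.foldr (flip step) 0 four
    step acc byte = (acc `shiftL` 8) .|. fromIntegral byte
```

Because the byte order is given explicitly and each word is assembled with shifts, the result does not depend on the host architecture or on how the buffer is aligned, at the cost of touching every byte.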
@ThomasM.DuBuisson I like your thoughts about endianness, chunking, and lazy bytestrings, and I agree that those would be helpful features. Enough that I may actually write something to do this. The alignment issue I'm talking about isn't just a problem with a bytestring being the wrong length. It's an issue with the bytestring starting in the wrong place in memory, which [can cause slowdown when reading from the array](http://www.alexonlinux.com/aligned-vs-unaligned-memory-access). The only way to fix that would be to copy the whole buffer. But maybe they're usually 8-byte aligned anyway. – Andrew Thaddeus Martin May 10 '16 at 12:27
@AndrewThaddeusMartin Right, I was thinking "endian" when you said "alignment". Alignment is a frustrating issue and you could do a check using FFI then a copy. It doesn't much matter how bytestrings are aligned by default since `ByteString.drop 1` will increment the offset and keep the same pointer, leaving you with unaligned memory again. – Thomas M. DuBuisson May 10 '16 at 13:36
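The "check, then copy" idea from the last comment could be sketched roughly as follows. `isAligned` and `realign` are hypothetical helpers. The sketch also assumes that the fresh buffer returned by `B.copy` starts on a suitably aligned address, which depends on the allocator and is not something the bytestring API promises.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafeUseAsCString)
import Foreign.Ptr (alignPtr)

-- | True when the first byte of the buffer sits on an n-byte boundary.
-- The pointer is only inspected here, never written to or retained.
isAligned :: Int -> B.ByteString -> IO Bool
isAligned n bs = unsafeUseAsCString bs $ \p -> pure (alignPtr p n == p)

-- | Return the bytestring unchanged when it is aligned, otherwise a copy.
-- Assumption: the copy's buffer is aligned; re-check it if that matters.
realign :: Int -> B.ByteString -> IO B.ByteString
realign n bs = do
  ok <- isAligned n bs
  pure (if ok then bs else B.copy bs)
```

Since `B.drop 1` keeps the same buffer and only moves the start forward by one byte, a bytestring and its `B.drop 1` can never both pass `isAligned 4`, which is the situation the comment describes.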
