I am trying to write a Storable vector instance for CString (null-terminated C chars in my case). The instance will store the CString pointers themselves (Ptr CChar), so the length of the vector is the number of CString pointers. I am writing this instance so that I can do zero-copy from FFI CStrings and then, after some transformation, build a ByteString quickly with unsafeCreate (fast vectors are used for the intermediate operations). A fast ByteString build needs three things from the Storable instance:

  • Total length in bytes - the Storable instance needs a bookkeeping allocation that stores the length of each CString as it is added to the vector, plus the total length of all CStrings stored so far. Let us say the total length of the CStrings can't exceed 2^31, so Int32/Word32 is enough to store both the length of each CString and the total length.
  • Function to store a CString and its length - O(n) time. This function walks the CString, stores its length, and increments the total length by that amount.
  • Function to return the total length in bytes - O(1) time. This function just reads the field that stores the total length.

While I know how to write a custom Storable instance, I don't know how to handle this kind of case. Simple code (a toy example is fine) that shows how to do custom bookkeeping and how to write functions that store and retrieve the bookkeeping results would be very much appreciated.
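As a rough illustration of the fixed-size part of this, here is a minimal sketch of a Storable instance for one (length, pointer) entry. The type `StrEntry` and its field names are hypothetical, and the layout (pointer first, then the 32-bit length, padded to two pointer widths) is just one assumed choice. It only covers the per-string bookkeeping; the running total would have to live outside the element type.

```haskell
import Data.Word (Word32)
import Foreign.C.Types (CChar)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable (..))

-- Hypothetical element type: one stored CString plus its cached length.
data StrEntry = StrEntry
  { entryLen :: !Word32
  , entryPtr :: !(Ptr CChar)
  } deriving (Eq, Show)

instance Storable StrEntry where
  -- Fixed size regardless of the string's contents: two pointer widths,
  -- which leaves room for the pointer and the 32-bit length.
  sizeOf _    = 2 * sizeOf (undefined :: Ptr CChar)
  alignment _ = alignment (undefined :: Ptr CChar)
  peek p = do
    q <- peekByteOff p 0
    n <- peekByteOff p (sizeOf (undefined :: Ptr CChar))
    return (StrEntry n q)
  poke p (StrEntry n q) = do
    pokeByteOff p 0 q
    pokeByteOff p (sizeOf (undefined :: Ptr CChar)) n
```

Because `sizeOf` does not inspect its argument, a type like this could in principle be used as the element of a `Data.Vector.Storable` vector.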

Update 1 (Clarification)

The reason for using a Storable vector instance in my case is two-fold: fast computation/transformation using unboxed types (on real-time data received through the C FFI), and fast conversion to ByteString (to send the data out in real time over IPC to another program). For fast ByteString conversion, unsafeCreate is excellent, but we must tell it how many bytes to allocate and pass it a function that fills the buffer. Given a Storable vector instance (with mixed types - I simplified my question above to just the CString type), it is easy for me to build a fast transform function that walks each element of the vector and writes it into the ByteString buffer, and then pass that function to unsafeCreate. The number of bytes to allocate is the hard part: an O(n) recursive byte-length calculation is too slow and can double the overhead of building the ByteString.
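To make the unsafeCreate step concrete, here is a minimal sketch that assumes the per-string lengths and their total are already known. The helper name `packAll` and the use of a plain list instead of a vector are assumptions made for brevity; `unsafeCreate` comes from `Data.ByteString.Internal` and `copyBytes` from `Foreign.Marshal.Utils`.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Internal (unsafeCreate)
import Data.Word (Word8)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (Ptr, castPtr, plusPtr)

-- Hypothetical helper: concatenate C strings whose lengths are known.
-- 'total' must equal the sum of the lengths in 'entries'; nothing here
-- checks that, so a wrong total corrupts memory or leaves garbage bytes.
packAll :: Int -> [(Int, Ptr CChar)] -> B.ByteString
packAll total entries = unsafeCreate total (go entries)
  where
    go :: [(Int, Ptr CChar)] -> Ptr Word8 -> IO ()
    go [] _ = return ()
    go ((n, src) : rest) dst = do
      copyBytes dst (castPtr src) n   -- copy n bytes, no terminator
      go rest (dst `plusPtr` n)
```

Note that `unsafeCreate` runs the fill action lazily, so the source pointers have to remain valid until the resulting ByteString has been forced.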

Sal
  • Is using `unsafePackCString` or something similar from `Data.ByteString.Unsafe` not an option? – Daniel Fischer Dec 11 '11 at 15:43
  • Do you want to make variable-length strings instances of `Storable`? That isn't possible - `sizeOf` isn't allowed to depend on its parameter. – Peter Wortmann Dec 11 '11 at 16:22
  • @Peter, no, I plan to store pointers to string which have constant storage, and the total length of strings, which is also constant. – Sal Dec 11 '11 at 21:54
  • @Daniel, no, unsafePackCString is not an option because I am going to transform from a vector of mixed types (through a custom data type which is a union of different data types) to ByteString. – Sal Dec 11 '11 at 21:57
  • So you essentially want a vector of `(Ptr CChar, Int)`? – Peter Wortmann Dec 12 '11 at 17:28
  • I am a bit confused by your usage of "increment". Are you talking about mutable vectors, by any chance? – Peter Wortmann Dec 12 '11 at 17:49
  • @Peter, yes, a Vector of (Int, Ptr CChar) is correct for each CString being stored. But also, I want another Int which stores total length (i.e., sum over all Int in the tuples - that is what I meant by increment). With that Int, I can determine total storage required to copy all CString in the vector to ByteString. – Sal Dec 12 '11 at 19:52
  • Bear in mind that the work to count the string lengths and add them up has to be done at some point; when is a matter of optimisation. If you do it at store time then the cost is invoked per store, and you pay extra if you store the same value multiple times. OTOH if you do it at vector build time (as above) then you pay per operation, and hence pay extra if you keep consing and unconsing. – Paul Johnson Dec 13 '11 at 19:29
  • Well, you can always determine the total size by summing up the lengths afterwards - which should be quite fast, if using unboxed vectors. If you want, you can memoize the result, but your application sounds like you need the total size only once anyway? – Peter Wortmann Dec 14 '11 at 09:22
  • @Peter, yes, summing up later is another approach. Since I will be using it in a couple of places, memoization is what I need. I prefer to memoize it right away at build time, instead of traversing the vector again after the build (and since it will be of mixed type, each value will have to be inspected again for its type, which can slow down the traversal because of branching). – Sal Dec 14 '11 at 16:39
  • @sal: Well, you can "build" and "sum" at the same time with, say, `scanl`. Given that you have an input vector, obviously. – Peter Wortmann Dec 14 '11 at 20:16
  • @Peter, yes, very good point. Now, it has got me thinking... – Sal Dec 16 '11 at 01:14

1 Answer

It sounds like you want to write something like this. Note that this code is not tested.

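{-# LANGUAGE ForeignFunctionInterface #-}
import Data.Word (Word32)
import Foreign.C.Types (CChar, CSize (..))
import Foreign.Ptr (Ptr)

-- Binding to C's strlen.  Importing it as a pure function assumes the
-- strings are never modified or freed while they are in the vector.
foreign import ccall unsafe "string.h strlen"
   c_strlen :: Ptr CChar -> CSize

-- The basic type.  Export the type but not the constructors or
-- accessors from the module.
data StringVector = StringVector {
   strVecLength   :: Word32,                 -- Total length
   strVecContents :: [(Word32, Ptr CChar)]   -- (Length, value) pairs
}

-- Invariants: forall (StringVector len contents),
--    len == sum (map fst contents)
--    all (\p -> fst p == fromIntegral (c_strlen (snd p))) contents


-- The null case.
emptyStrVec :: StringVector
emptyStrVec = StringVector 0 []


-- Put a new CString at the head of the vector.  Analogous to ":".
stringVectorCons :: Ptr CChar -> StringVector -> StringVector
stringVectorCons ptr (StringVector len pairs) =
   StringVector (len + n) $ (n, ptr) : pairs
   where
      n = fromIntegral (c_strlen ptr)   -- CSize -> Word32


-- Extract the head of the vector and the remaining vector, or Nothing
-- if the vector is empty.
stringVectorUncons :: StringVector -> Maybe ((Word32, Ptr CChar), StringVector)
stringVectorUncons (StringVector _   [])    = Nothing
stringVectorUncons (StringVector len (h:t)) =
   Just (h, StringVector (len - fst h) t)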

After that you can add whatever other functions you might want, depending on the application. Just make sure that each function preserves the invariants.

Paul Johnson
  • Thanks for the example. It doesn't look like this one can be made an instance of Data.Vector.Storable (after replacing the list type in strVecContents with a Vector - a list is too slow for my case) because it is not fixed length. I am looking to define the type as a storable vector. This is because I plan to define a storable vector instance for union types, one of which is CString. – Sal Dec 11 '11 at 22:11
  • Marking this one as the answer as it is closest to the solution. Also, credit to @Peter Wortmann for suggesting "build" and "sum" at the same time. I think it will be tricky but should be doable when copying data from C FFI into Vector since copying is IO action, and so, should permit "sum" operation as well within IO. If casting, that too is IO, and length information will be available at cast time which can be used to calculate total bytes. – Sal Dec 16 '11 at 03:28
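
The "build and sum at the same time, inside IO" idea from the comments could look roughly like the sketch below. It is not the `scanl` formulation Peter Wortmann mentioned; it uses `foldM` over a plain list for brevity, and the name `measureAll` is hypothetical. Each string is walked once with `lengthArray0` from `Foreign.Marshal.Array`, and the running total is accumulated in the same pass.

```haskell
import Control.Monad (foldM)
import Data.Word (Word32)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (lengthArray0)
import Foreign.Ptr (Ptr)

-- Hypothetical helper: measure each C string once, returning the total
-- byte count together with the (length, pointer) pairs in input order.
-- Assumes every length and the total fit in a Word32.
measureAll :: [Ptr CChar] -> IO (Word32, [(Word32, Ptr CChar)])
measureAll ptrs = do
  (total, acc) <- foldM step (0, []) ptrs
  return (total, reverse acc)
  where
    step (total, acc) p = do
      -- lengthArray0 counts elements before the terminating 0.
      n <- fromIntegral <$> lengthArray0 0 p
      let total' = total + n
      total' `seq` return (total', (n, p) : acc)
```

The total returned here is what would be passed to unsafeCreate as the allocation size, so no second traversal is needed.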