
For efficiency in my FFI code, I want to work directly with the buffers that hold strings produced by C functions, without the copying and conversion to String done by peekCString or peekCStringLen, as seen for example in fdRead (see the code below).

Are there already implemented types analogous to String that work this way? Ideally they would offer the standard string functions and perhaps the ability to pattern match against string literals.

Perhaps something among the Memory efficient strings in Haskell?
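To illustrate the "pattern match against string literals" part: a minimal sketch, assuming ByteString (from the bytestring package) is the type chosen and the OverloadedStrings extension is enabled. The function name `classify` and the literals it matches are hypothetical and serve only as an illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.ByteString       (ByteString)
import qualified Data.ByteString.Char8 as BC

-- With OverloadedStrings, a string literal used as a pattern is converted
-- with fromString and compared with (==), so it can match a ByteString.
-- 'classify' is a hypothetical example function.
classify :: ByteString -> String
classify "GET" = "read request"
classify "PUT" = "write request"
classify other
  | "X-" `BC.isPrefixOf` other = "extension"
  | otherwise                  = "unknown (" ++ show (BC.length other) ++ " bytes)"
```

Literal patterns like these are equality tests on the whole value; matching on a prefix still needs a function such as `BC.isPrefixOf`, as in the guard above.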

I've already found out that w.r.t. memory management, I'd need a ForeignPtr (to let the garbage collector free unused buffers that have been allocated to hold the data created by foreign functions).
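A minimal sketch of that ForeignPtr idea, assuming ByteString is an acceptable buffer type: allocate a garbage-collected buffer, let a C function fill it, and wrap the same memory as a ByteString without copying. The C function here is `memset`, used only as a stand-in for whatever foreign function produces the data, and `filledBuffer` is a hypothetical name.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import qualified Data.ByteString          as B
import qualified Data.ByteString.Internal as BI
import           Data.Word                (Word8)
import           Foreign.C.Types          (CInt (..), CSize (..))
import           Foreign.ForeignPtr       (mallocForeignPtrBytes, withForeignPtr)
import           Foreign.Ptr              (Ptr)

-- Stand-in for "a C function that writes into a caller-supplied buffer".
foreign import ccall unsafe "string.h memset"
  c_memset :: Ptr Word8 -> CInt -> CSize -> IO (Ptr Word8)

-- | Allocate n bytes managed by the garbage collector, let C fill them,
-- and expose the very same memory as a ByteString (no copy is made).
-- 'filledBuffer' is a hypothetical helper for illustration.
filledBuffer :: Int -> Word8 -> IO B.ByteString
filledBuffer n byte
  | n <= 0    = return B.empty
  | otherwise = do
      fp <- mallocForeignPtrBytes n
      _  <- withForeignPtr fp $ \p ->
              c_memset p (fromIntegral byte) (fromIntegral n)
      -- Offset 0, length n. The buffer must not be written to after this
      -- point, because pure code may now observe its contents.
      return (BI.fromForeignPtr fp 0 n)
```

The buffer is freed by the garbage collector once the ByteString (and any slices sharing it) become unreachable. `Data.ByteString.Internal` is a low-level module, so its interface may differ between bytestring versions.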

Referenced code examples

fdRead using peekCStringLen:

-- -----------------------------------------------------------------------------
-- fd{Read,Write}

-- | Read data from an 'Fd' and convert it to a 'String' using the locale encoding.
-- Throws an exception if this is an invalid descriptor, or EOF has been
-- reached.
fdRead :: Fd
       -> ByteCount -- ^How many bytes to read
       -> IO (String, ByteCount) -- ^The bytes read, how many bytes were read.
fdRead _fd 0 = return ("", 0)
fdRead fd nbytes = do
    allocaBytes (fromIntegral nbytes) $ \ buf -> do
    rc <- fdReadBuf fd buf nbytes
    case rc of
      0 -> ioError (ioeSetErrorString (mkIOError EOF "fdRead" Nothing Nothing) "EOF")
      n -> do
       s <- peekCStringLen (castPtr buf, fromIntegral n)
       return (s, n)

-- | Read data from an 'Fd' into memory.  This is exactly equivalent
-- to the POSIX @read@ function.
fdReadBuf :: Fd
          -> Ptr Word8 -- ^ Memory in which to put the data
          -> ByteCount -- ^ Maximum number of bytes to read
          -> IO ByteCount -- ^ Number of bytes read (zero for EOF)
fdReadBuf _fd _buf 0 = return 0
fdReadBuf fd buf nbytes =
  fmap fromIntegral $
    throwErrnoIfMinus1Retry "fdReadBuf" $
      c_safe_read (fromIntegral fd) (castPtr buf) nbytes

foreign import ccall safe "read"
   c_safe_read :: CInt -> Ptr CChar -> CSize -> IO CSsize
imz -- Ivan Zakharyaschev
  • If you're considering `CString`s, the closest higher level data type is probably bytestring, in particular the use of the [Data.ByteString.Unsafe](http://hackage.haskell.org/package/bytestring-0.10.4.1/docs/Data-ByteString-Unsafe.html) module. – bheklilr Feb 17 '15 at 15:04
  • @bheklilr Thanks! [Low level interaction with CStrings](http://hackage.haskell.org/package/bytestring-0.10.4.1/docs/Data-ByteString-Unsafe.html#g:2) looks similar to what I need. I will explore more whether there are "safe" analogues to `unsafePackMallocCString` (because if something is called "unsafe", I assume there is a safer, but less efficient variant), and whether things similar to "Low level interaction with CStrings" can be done with `Text` (if I want to correctly work with Unicode)... – imz -- Ivan Zakharyaschev Feb 17 '15 at 15:16
  • Since you're doing low level programming this is one of the times when unsafe functions are ok to use, but your overlying API should make their use safe. For example, you could use `unsafePackMallocCString` to create the `ByteString`, but never allow the pointer to escape the scope of whatever function uses `unsafePackMallocCString`. The worry is that the underlying memory can be modified after passing the ByteString to a pure function, thus breaking purity, but if you never let the pointer get somewhere that could poke to it then you'd be safer. – bheklilr Feb 17 '15 at 15:20
  • Alternatively, you could write a single module in your project for a `ReadOnlyCString` or something similar that uses a newtype wrapper around the Foreign type you're using, but only exports functions for reading the values at that memory location. It'd be tricky, but it's possible. – bheklilr Feb 17 '15 at 15:22
  • @bheklilr And what if I wrap `unsafePackMallocCString cstr` in `unsafePerformIO`? As I understand it, given a constant `cstr`, the result will always be the same (from the point of view of `String`-programming), no matter how many times I call it. May I hope that the resulting `ByteString` will be correctly garbage-collected and freed whenever there are no more potential uses of it in the computation?.. – imz -- Ivan Zakharyaschev Feb 17 '15 at 15:31
  • I can't comment on the GC, but I'll say that the use of `unsafePerformIO` can be appropriate when properly managed. I'm not an expert on this, so I'd recommend finding instances of its use in mature libraries (bytestring might be one of those, I believe the ST type is another), but I'm pretty sure that you need a pragma or two like `NOINLINE` to keep GHC from mangling your code and causing unexpected behavior. – bheklilr Feb 17 '15 at 15:35
  • @bheklilr Continuing to think over hiding `unsafePackMallocCString`'s IO-ness (with `unsafePerformIO`): well, `unsafePackMallocCString` is presented as an IO, but I could look for examples where the IO is hidden in the implementation of `ByteString` itself (not initialized from a foreign CString), or of `Array` -- they must do a similar thing: allocate a buffer, and present it as Haskell data, and the initialization is not an IO. I'll look into their source code to have some examples... – imz -- Ivan Zakharyaschev Feb 18 '15 at 00:27
  • @bheklilr your comments here seemed on the money. it would be worth adding them as an answer... – sclv Feb 22 '16 at 18:59
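
The suggestion in the comments above (use `unsafePackMallocCString`, but never let the raw pointer escape) might look roughly like the following. This is only a sketch: `strdup` stands in for a C function that returns a malloc'd, NUL-terminated string, and `dupAsByteString` is a hypothetical wrapper name.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import           Data.ByteString        (ByteString)
import           Data.ByteString.Unsafe (unsafePackMallocCString)
import           Foreign.C.String       (CString, withCString)
import           Foreign.Ptr            (nullPtr)

-- Stand-in for a C function returning a malloc'd, NUL-terminated string.
foreign import ccall unsafe "string.h strdup"
  c_strdup :: CString -> IO CString

-- | Hypothetical wrapper: the malloc'd pointer is created and consumed
-- inside this function, so no caller can write to the memory afterwards.
dupAsByteString :: String -> IO ByteString
dupAsByteString s =
  withCString s $ \tmp -> do
    owned <- c_strdup tmp
    if owned == nullPtr
      then ioError (userError "strdup failed")
      -- Wraps the buffer without copying its contents and attaches a
      -- finalizer that calls free() once the ByteString is unreachable.
      else unsafePackMallocCString owned
```

Because only the resulting `ByteString` leaves the function, callers get an immutable value and the C buffer is released by its finalizer. Whether it is safe to go further and hide the `IO` with `unsafePerformIO` depends on the foreign function really being pure, which this sketch does not try to establish.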
