
I want to implement a function in C++ via Haskell FFI, which should have the (final) type of String -> String. Say, is it possible to re-implement the following function in C++ with the exact same signature?

import Data.Char
toUppers :: String -> String
toUppers s = map toUpper s

In particular, I want to avoid having IO in the return type, because introducing the impurity (by that I mean the IO monad) for this simple task is logically unnecessary. All the examples involving a C string that I have seen so far return an IO something or a Ptr, neither of which can be converted back to a pure String.

The reason I want to do this is that I have the impression that marshaling is messy with FFI. Maybe if I can fix the simplest case above (other than primitive types such as int), then I can do whatever data parsing I want on the C++ side, which should be easy.

The cost of parsing is negligible compared to the computation that I want to do between the marshalling to/from strings.

Thanks in advance.

thor
  • Can you provide some more details of what you want to accomplish? From The RWH http://book.realworldhaskell.org/read/interfacing-with-c-the-ffi.html "However, if we know the C code is pure, why don't we just declare it as such, by giving it a pure type in the import declaration? For the reason that we have to allocate local memory for the C function to work with, which must be done in the IO monad, as it is a local side effect. Those effects won't escape the code surrounding the foreign call, though, so when wrapped, we use unsafePerformIO to reintroduce purity." – Jonke Jun 03 '13 at 06:53
  • @Jonke: To be more specific, I want to do simple computations in C++, such as solving sets of linear equations. The solving needs to be done in C++. That's why I want to use String or an equivalent (ultimately) to transfer data across the FFI. So I am looking for an example of marshalling String or equivalents to the C++ world without introducing an IO. BTW, hmatrix didn't work for me as I use windows/mingw. So I figured the most reliable solution is to work out a working FFI source as I described above. – thor Jun 03 '13 at 21:25
  • Well, I think that you're stuck in C land. If you want to transfer an array (vector) of integers or doubles from Haskell to C/C++ and back, that will have a different signature than transferring C chars. And a string in Haskell is quite different from a C char[]. – Jonke Jun 04 '13 at 07:45

1 Answer


You need to involve IO at least at some point, to allocate buffers for the C-strings. The straightforward solution here would probably be:

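import Foreign
import Foreign.C
import System.IO.Unsafe as Unsafe

-- The C++ function upper-cases a null-terminated string in place.
foreign import ccall "touppers" c_touppers :: CString -> IO ()

toUppers :: String -> String
toUppers s =
  Unsafe.unsafePerformIO $
    -- withCString hands the C side a temporary copy of the string
    withCString s $ \cs ->
      c_touppers cs >> peekCString cs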

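Here withCString marshals the Haskell string into a temporary null-terminated buffer, the C++ function converts that buffer to upper case in place, and peekCString un-marshals the (changed!) buffer contents into a new Haskell string. Since the buffer is a private copy, mutating it does not affect the original string, which is what makes wrapping the whole thing in unsafePerformIO reasonable.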

Another solution could be to delegate messing with IO to the bytestring library. That could be a good idea anyway if you are interested in performance. The solution would look roughly as follows:

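import Data.ByteString.Internal (ByteString, toForeignPtr, unsafeCreate)
import Foreign
import Foreign.C.Types (CInt)

-- The C++ side declares the length as int, so import it as CInt, not Int.
foreign import ccall "touppers2"
  c_touppers2 :: CInt -> Ptr Word8 -> Ptr Word8 -> IO ()

toUppers2 :: ByteString -> ByteString
toUppers2 s =
  -- unsafeCreate allocates the l-byte result buffer (p2) for us
  unsafeCreate l $ \p2 ->
    withForeignPtr fp $ \p1 ->
      c_touppers2 (fromIntegral l) (p1 `plusPtr` o) p2
 where (fp, o, l) = toForeignPtr s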

This is a bit more elegant, as we now don't actually have to do any marshalling, just convert pointers. On the other hand, the C++ side changes in two respects: we have to handle strings that may not be null-terminated (so the length must be passed explicitly), and we have to write to a separate output buffer, as the input is no longer a copy.


For reference, here are two quick-and-dirty C++ functions that fit the above imports:

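#include <ctype.h>

// In-place variant: expects a null-terminated string.
extern "C" void touppers(char *s) {
    // Cast to unsigned char: toupper is undefined for negative char values.
    for (; *s; s++) *s = toupper((unsigned char)*s);
}

// Length-based variant: reads l bytes from s and writes l bytes to t.
extern "C" void touppers2(int l, const char *s, char *t) {
    for (int i = 0; i < l; i++) t[i] = toupper((unsigned char)s[i]);
}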
Peter Wortmann
  • Thanks for the answer. Could you show how to do the conversion to upper case on the C/C++ side? I was looking to "re-implement" the toUppers function in C++ (as in the question) because what I am really interested in is to have C++ do the work in parsing a string from Haskell, do some computation (toUppers in this case as a trivial example), and pass the result back in a string to Haskell. Thanks a lot. – thor Jun 04 '13 at 05:59
  • Sure, here you go - nothing too spectacular or polished, but it should get you going. And depending on what you mean with "parsing", you should give Haskell parsing libraries like `parsec` or `attoparsec` a try. Haskell can be very fast at this sort of thing as long as you use `ByteString` or `Text` for your strings. – Peter Wortmann Jun 04 '13 at 10:26
  • Thanks. BTW, are the above two functions lazy or is there any difference between them and (map toUpper), other than efficiency? – thor Jun 04 '13 at 16:19
  • i.e., Is it safe to replace (map toUpper) with the FFI implementation to get the same behavior? Also, I tried to load this on windows/mingw. "ghc --make strFFI.hs touppers.cpp" worked, but "ghci strFFI.hs touppers.o" or "ghci strFFI.hs -ltouppers" does not work. ghci says: final link ... ghc.exe: touppers.o: unknown symbol `__imp_toupper'. Any pointers? – thor Jun 04 '13 at 16:27
  • No, they aren't lazy at all - using the new `toUpper` on an infinite list will not terminate. The error sounds like you might be missing a library, maybe adding "-lstdc++" helps? – Peter Wortmann Jun 04 '13 at 18:01
  • It seems to be related to the MinGW64 platform I use. http://hackage.haskell.org/trac/ghc/ticket/7097 Quoting that: " .. If that prototype had __declspec(dllimport) attribute attached, a C compiler generated a call against __imp_foo, otherwise - a call against bare foo. GHC FFI of modern era (with no C backend) always generates the second variant AFAIK. " – thor Jun 05 '13 at 11:07
  • (continued) "This also explains the difference between 32 and 64 GHCs for Windows. 64-bit GHC is distributed with mingw-w64 headers which use __declspec(dllimport) extensively, and many packages fail to compile. 32-bit GHC is distributed with mingw headers which has almost (or entirely?) no __declspec(dllimport) declared APIs. To make 64-bit GHC usable on Windows I manually removed all __declspec(dllimport) from mingw-w64 headers and now it all works smoothly." – thor Jun 05 '13 at 11:07
  • I can confirm now, that the link issue is with ghci in 64-bit Windows GHC (7.6.3). When I switched to Haskell Platform 2013.2, the problems went away. – thor Jun 06 '13 at 00:21
  • One additional question, could you give an example where the length of the returned string is not known in advance? Thanks a lot. – thor Jun 06 '13 at 00:32
  • Not easily - buffer management becomes a *lot* trickier. If you have at least a bound, you can use `unsafePerformIO . createAndTrim` instead of `unsafeCreate`. Otherwise you would have to either make two calls to C (one for the length, one for copying data to the buffer) or allocate the buffer on the C side and then set up a `ForeignPtr` finalizer to free it once the string gets garbage-collected. If you need more information, it would probably be best to make another question out of it. – Peter Wortmann Jun 06 '13 at 15:03
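
To make the first two options from the last comment a bit more concrete, here is a minimal C++ sketch of what the C++ side might look like when the result length is not known up front. The transformation (keep only letters, upper-cased) and the names `letters_upper_length` and `letters_upper_fill` are made up for illustration and are not part of the answer above. The output can never be longer than the input, so the input length serves as the bound.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical example: keep only the letters of the input, upper-cased.
// The result is at most `len` bytes long, but its exact length depends
// on the data.

// First call of the two-call approach: report how many bytes the result
// needs, without writing anything.
extern "C" int letters_upper_length(int len, const char *src) {
    assert(len >= 0 && (src != nullptr || len == 0));
    int n = 0;
    for (int i = 0; i < len; i++) {
        if (std::isalpha(static_cast<unsigned char>(src[i]))) n++;
    }
    return n;
}

// Second call (or the only call in the bounded approach): write the result
// into a caller-allocated buffer and return the number of bytes written.
// `dst` must have room for letters_upper_length(len, src) bytes; a buffer
// of `len` bytes is always large enough.
extern "C" int letters_upper_fill(int len, const char *src, char *dst) {
    assert(len >= 0 && (src != nullptr || len == 0));
    int n = 0;
    for (int i = 0; i < len; i++) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isalpha(c)) dst[n++] = static_cast<char>(std::toupper(c));
    }
    return n;
}
```

On the Haskell side, the bounded approach would presumably allocate `len` bytes with `createAndTrim` and return the result of `letters_upper_fill` from the callback so that the buffer gets trimmed, while the two-call approach would call `letters_upper_length` first and then pass exactly that size to `unsafeCreate`. The third option (allocating on the C++ side and freeing via a finalizer) is not covered by this sketch.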