How much does it cost for Haskell FFI to go into C and back?

Question

If I want to call more than one C function, each one depending on the result of the previous one, is it better to create a wrapper C function that handles the three calls? Will it cost the same as using Haskell FFI without converting types?

Suppose I have the following Haskell code:

foo :: CInt -> IO CInt
foo x = do
  a <- cfA x
  b <- cfB a
  c <- cfC b
  return c

Each function cf* is a C call.

Is it better, in terms of performance, to create a single C function like cfABC and make only one foreign call in Haskell?

int cfABC(int x) {
   int a, b, c;
   a = cfA(x);
   b = cfB(a);
   c = cfC(b);
   return c;
}

Haskell code:

foo :: CInt -> IO CInt
foo x = do
  c <- cfABC x
  return c

How can I measure the performance cost of a C call from Haskell? Not the cost of the C function itself, but the cost of the "context switch" from Haskell to C and back.

I'm not at all sure, but I found [this blog post](http://blog.ezyang.com/2010/07/safety-first-ffi-and-threading/) enlightening. If I interpret it correctly, `foreign import ccall unsafe` (with `unsafe` being the key) is essentially as cheap as an inline C function call. *However*, great care has to be taken when using `unsafe`, and the safe variant (`foreign import ccall`) costs more and involves taking locks. — gspr, Jan 25 '13 at 10:43
@ThiagoNegri: I did some crude (non-Criterion) benchmarks to compare `foreign import ccall` and `foreign import ccall unsafe`. I have a C function that, given `double x`, returns `sin(x)*sin(x)*cos(x)/2.0`. I compiled it with GCC 4.7.2 and -O2. The benchmark calls it with 100000000 different arguments from 0 to pi/2 and sums the results. With `foreign import ccall` it ran in about 9.6 seconds, compared to 4.6 seconds for `foreign import ccall unsafe`. Calling it from an actual C program gave a running time of 4.4-4.5 seconds. This gives you some idea, at least. The Haskell code was compiled with GHC 7.4.2. — gspr, Jan 25 '13 at 17:48
@gspr: Forget I said anything. My knowledge of the FFI is insufficient. — Colin Woodbury, Jan 25 '13 at 22:47
Why don't you benchmark it to find out? That's the best way to talk about Haskell performance IME — , Apr 13 '18 at 23:22

score 20 · Accepted Answer · answered Jan 25 '13 at 18:32

The answer depends mostly on whether the foreign call is a safe or an unsafe call.

An unsafe C call is basically just a function call. If there's no (nontrivial) type conversion, three foreign calls mean three function calls. A wrapper written in C means between one and four, depending on how many of the component functions can be inlined when compiling the C, since GHC cannot inline a foreign call into C. Such a function call is generally very cheap (it's just a copy of the arguments and a jump to the code), so the difference is small either way: the wrapper should be slightly slower when no C function can be inlined into it, and slightly faster when all can be. That was indeed the case in my benchmarking: +1.5ns and -3.5ns respectively, where the three foreign calls took about 12.7ns with every function just returning its argument. If the functions do something nontrivial, the difference is negligible (and if they don't, you'd probably do better to write them in Haskell and let GHC inline the code).
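
To make the inlining point concrete, here is a minimal C sketch. The names `step_a`, `step_b`, `step_c` and `pipeline` are hypothetical and not taken from the question. It assumes the component functions live in the same translation unit as the wrapper and are declared `static inline`, which permits (but does not force) the C compiler to fold all three into the wrapper, so that one foreign call from Haskell costs roughly one function call.

```c
/* Hypothetical component functions. They are visible to the compiler in
   the same translation unit as the wrapper, so they are candidates for
   inlining. */
static inline int step_a(int x) { return x + 1; }
static inline int step_b(int x) { return x * 2; }
static inline int step_c(int x) { return x - 3; }

/* Wrapper with external linkage: the only symbol Haskell would import.
   With optimisation enabled a compiler may inline all three steps here;
   that is an optimisation, not a guarantee. */
int pipeline(int x) {
    int a = step_a(x);
    int b = step_b(a);
    int c = step_c(b);
    return c;
}
```

If the components instead sit in a separate object file or shared library and no link-time optimisation is used, the wrapper generally has to make three real calls, which corresponds to the "slightly slower" case.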

A safe C call involves saving a nontrivial amount of state, locking, and possibly spawning a new OS thread, so it takes much longer. The small overhead of perhaps one more function call in C is then negligible compared to the cost of the foreign calls [unless passing the arguments requires an unusual amount of copying, e.g. many huge structs]. In my do-nothing benchmark

{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Criterion.Main
import Foreign.C.Types
import Control.Monad

foreign import ccall safe "funcs.h cfA" c_cfA :: CInt -> IO CInt
foreign import ccall safe "funcs.h cfB" c_cfB :: CInt -> IO CInt
foreign import ccall safe "funcs.h cfC" c_cfC :: CInt -> IO CInt
foreign import ccall safe "funcs.h cfABC" c_cfABC :: CInt -> IO CInt

wrap :: (CInt -> IO CInt) -> Int -> IO Int
wrap foo arg = fmap fromIntegral $ foo (fromIntegral arg)

cfabc :: Int -> IO Int
cfabc = wrap c_cfABC

foo :: Int -> IO Int
foo = wrap (c_cfA >=> c_cfB >=> c_cfC)

main :: IO ()
main = defaultMain
            -- whnfIO runs the IO action and forces its result to weak head normal form
            [ bench "three calls" $ whnfIO (foo 16)
            , bench "single call" $ whnfIO (cfabc 16)
            ]

where all the C functions just return the argument, the mean for the single wrapped call is a bit above 100ns [105-112], and for the three separate calls around 300ns [290-315].

So a safe C call takes roughly 100ns, and it is then usually faster to wrap them up into a single call. Still, if the called functions do something sufficiently nontrivial, the difference won't matter.
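
The C side of the benchmark is not shown. A minimal sketch of what it could look like, assuming (as described) that every function just returns its argument. Everything is placed in one file here so that it compiles on its own; in the real setup the prototypes would live in `funcs.h`, whose actual contents are not given, so this layout is an assumption.

```c
/* Sketch of the do-nothing C functions used by the benchmark. The
   original funcs.h / funcs.c are not shown in the answer, so this is
   an assumed reconstruction. */
int cfA(int x) { return x; }
int cfB(int x) { return x; }
int cfC(int x) { return x; }

/* Wrapper from the question: chains the three calls on the C side, so
   Haskell only has to make a single foreign call. */
int cfABC(int x) {
    int a = cfA(x);
    int b = cfB(a);
    int c = cfC(b);
    return c;
}
```

Because the bodies are trivial, the timings measured against functions like these reflect almost nothing but the call overhead itself.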

The doc says that an `unsafe` call stops all other Haskell threads. Do you know why? — Thiago Negri, Jan 25 '13 at 18:54
Where does it say that? In the [users guide](http://www.haskell.org/ghc/docs/7.6.1/html/users_guide/ffi-ghc.html#id683017) I read that a `safe` foreign call will stop all other Haskell threads when the programme was linked without `-threaded`. I don't see why an `unsafe` foreign call [which confusingly means that calling it is safe in the sense that it needs no precautions] should do that. — Daniel Fischer, Jan 25 '13 at 19:13
The post from Edward Z. Yang ["Safety first: FFI and threading"](http://blog.ezyang.com/2010/07/safety-first-ffi-and-threading/) says that an `unsafe` foreign call cannot be preempted by the Haskell RTS. The user guide says: "if you need to make a foreign call to a function that takes a long time or blocks indefinitely, then you should mark it `safe` and use `-threaded`." I guess the author thought that "if you use `unsafe`, it will block even if you use `-threaded`" was implicit. — Thiago Negri, Jan 25 '13 at 19:28
I think an `unsafe` call can't be preempted because it is inlined in the RTS. So the RTS cannot go forward and give control to another Haskell thread until this call returns. And the `safe` call can probably open a new OS thread when no thread is available for making the foreign call, and so keep the Haskell RTS scheduler going. — Thiago Negri, Jan 25 '13 at 19:37
Ah, in the docs for `forkOS`: "To allow foreign calls to be made without blocking all the Haskell threads (with GHC), it is only necessary to use the -threaded option when linking your program, _and to make sure the foreign import is not marked unsafe_." (emphasis mine). I still don't know why the RTS shouldn't be able to run other threads in parallel to one with an `unsafe` call in principle, but GHC's implementation at least can't. An `unsafe` foreign call can't be pre-empted because GHC's scheduler only steps in when a thread allocates, and `unsafe` calls don't alloc (as far as GHC knows). — Daniel Fischer, Jan 25 '13 at 20:03
If I recall correctly, what the unsafe call blocks is not the _entire_ rts, but rather the IO manager component. So back when HDBC-odbc used unsafe calls, a long-running database query would halt the ability of my app to accept new incoming connections over the network, but not (if my memory holds) the ability to compute, or to write to stdout or stderr. — sclv, Jan 31 '13 at 03:20

score -4 · Answer 2 · answered Jan 25 '13 at 14:06

That probably depends very much on your exact Haskell compiler, the C compiler, and the glue binding them together. The only way to find out for sure is to measure it.

On a more philosophical note, each time you mix languages you create a barrier for newcomers: in this case it isn't enough to be fluent in Haskell and C (that already gives a narrow set), you also have to know the calling conventions and whatnot well enough to work with them. And many times there are subtle issues to handle (even calling C from C++, which are very similar languages, isn't at all trivial). Unless there are very compelling reasons, I'd stick with a single language. The only exception I can think of offhand is creating e.g. Haskell bindings to a preexisting complex library, something like NumPy for Python.

vonbrand

I feel this misses the question asker's point. He's asking "what's the performance costs involved in doing x?", and your answer seems to boil down to "hard to say, but don't do x, because it'll be harder for people to understand your code". – gspr Jan 25 '13 at 14:31
@gspr: No. I said the only way to make sure is to measure _for his exact setup_, there are too many variables. But he asks about performance, and (after reading Bentley's "Writing efficient programs" and "Programming pearls", and also Kernighan and Pike's "The practice of programming", and other books by the Unix crowd) I'm firmly convinced that _people time_ is much more expensive than _computer time_, except for very narrow situations. So Knuth's dictum that "Premature optimization is the root of all evil" rings true. – vonbrand Jan 25 '13 at 14:54
Sure, but I don't think it's unreasonable to assume (for the sake of the question) that the author has concluded that he *does* need to call into C from Haskell, and so he came to ask about what costs are involved and how he should structure his calls. That being said, I guess my downvote was a bit too much. Sadly, it's been locked in now. – gspr Jan 25 '13 at 15:02
@gspr, in my experience (copiously corroborated by the literature I cite) people are _notoriously_ bad at guessing where the true costs lie, and moreover many (almost all?) are prone to rush into some microoptimization quest at the first hint of performance problems. It may well be that OP _does_ understand all this (in which case I'd love an explanation/defense with details), but there are many other not so enlightened readers here. I consider my audience to go much farther than just the few people asking, commenting, and answering. – vonbrand Jan 25 '13 at 16:14
As others said, your answer has nothing to do with my question. The question is: "What is the cost? How to measure it?". You didn't say what is the cost and, if it depends on the environment, neither how to measure it. – Thiago Negri Jan 25 '13 at 17:02
@ThiagoNegri, write a short program that runs a loop doing nothing (as a baseline), and then the same loop doing whatever operation interests you. Subtract the time for the empty loop from that of the "filled" loop, and divide by the number of iterations. Run it a few times to make sure the results are consistent. For extensive benchmarking of C operators look at http://www.cs.bell-labs.com/cm/cs/pearls/appmodels.html – vonbrand Jan 25 '13 at 17:09
@vonbrand: In Haskell, one tends to benchmark using [criterion](http://hackage.haskell.org/package/criterion-0.6.2.0). And the link you gave doesn't seem to help with regards to *the cost of calling C functions from Haskell*. – gspr Jan 25 '13 at 17:13
@gspr, I know this is no Haskell benchmark, but it might give an idea of the C side for comparison. – vonbrand Jan 25 '13 at 17:40
