How come `readIORef` is a blocking operation

Question

This was a complete surprise for me. Can someone explain what is the reason behind readIORef blocking, when there is an atomicModifyIORef in flight? I understand that the assumption is that the modifying function supplied to the latter function is suppose to be very quick, but that is besides the point.

Here is a sample piece of code that reproduces what I am talking about:

{-# LANGUAGE NumericUnderscores #-}
module Main where

import Control.Concurrent
import Control.Concurrent.Async
import Control.Monad
import Data.IORef
import Say (sayString)
import Data.Time.Clock
import System.IO.Unsafe

main :: IO ()
main = do
  ref <- newIORef (10 :: Int)
  before <- getCurrentTime
  race_ (threadBusy ref 10_000_000) (threadBlock ref)
  after <- getCurrentTime
  sayString $ "Elapsed: " ++ show (diffUTCTime after before)


threadBlock :: IORef Int -> IO ()
threadBlock ref = do
  sayString "Below threads are totally blocked on a busy IORef"
  race_ (forever $ sayString "readIORef: Wating ..." >> threadDelay 500_000) $ do
    -- need to give a bit of time to ensure ref is set to busy by another thread
    threadDelay 100_000
    x <- readIORef ref
    sayString $ "Unblocked with value: " ++ show x


threadBusy :: IORef Int -> Int -> IO ()
threadBusy ref n = do
  sayString $ "Setting IORef to busy for " ++ show n ++ " μs"
  y <- atomicModifyIORef' ref (\x -> unsafePerformIO (threadDelay n) `seq` (x * 10000, x))
  -- threadDelay is not required above, a simple busy loop that takes a while works just as well
  sayString $ "Finished blocking the IORef, returned with value: " ++ show y

Running this piece of code produces:

$ stack exec --package time --package async --package say --force-dirty --resolver nightly -- ghc -O2 -threaded atomic-ref.hs && ./atomic-ref
Setting IORef to busy for 10000000 μs
Below threads are totally blocked on a busy IORef
readIORef: Wating ...
Unblocked with value: 100000
readIORef: Wating ...
Finished blocking the IORef, returned with value: 10
Elapsed: 10.003357215s

Note that readIORef: Wating ... is printed only twice, once before blocking and one more time afterwards. This is very unexpected, since it is an action that runs in a totally separate thread. This means that blocking on IORef affects other threads than the one that invoked readIORef, which is even more surprising.

Are those semantics expected, or is it a bug? I fit is not a bug, why is this expected? I'll open a ghc bug later, unless someone has an explanation for this behavior that I can't think of. I won't be surprised that this is some limitation of ghc runtime, in which case I will provide an answer here later. Regardless of the outcome it is very useful to know about this behavior.

Edit 1

The busy loop I tried that does not require unsafePerformIO was requested in comments, so here it is

threadBusy :: IORef Int -> Int -> IO ()
threadBusy ref n = do
  sayString $ "Setting IORef to busy for " ++ show n ++ " μs"
  y <- atomicModifyIORef ref (\x -> busyLoop 10000000000 `seq` (x * 10000, x))
  sayString $ "Finished blocking the IORef, returned with value: " ++ show y

busyLoop :: Int -> Int
busyLoop n = go 1 0
  where
    go acc i
      | i < n = go (i `xor` acc) (i + 1)
      | otherwise = acc

The outcome is exactly the same, except the runtime is slightly different.

Setting IORef to busy for 10000000 μs
Below threads are totally blocked on a busy IORef
readIORef: Wating ...
Unblocked with value: 100000
readIORef: Wating ...
Finished blocking the IORef, returned with value: 10
Elapsed: 8.545412986s

Edit 2

It turns out that sayString was the reason for no output not appearing. Here is what the out is when sayString is swapped for putStrLn:

Below threads are totally blocked on a busy IORef
Setting IORef to busy for 10000000 μs
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
readIORef: Wating ...
Finished blocking the IORef, returned with value: 10
Unblocked with value: 100000
Elapsed: 10.002272691s

That still does not answer the question, why readIORef block. In fact I just stumbled upon a quote from a book "Haskell High Performance" By Samuli Thomasson that tells us that blocking should not happen:

@WillemVanOnsem what about other threads? Why does it affect them? — lehins, May 22 '20 at 15:34
@WillemVanOnsem Also ReadWrite lock mode should not allow multiple writes at the same time, then why `writeIORef` completely cancels the busy `atomicModifyIORef` and unblocks the threads? — lehins, May 22 '20 at 15:37
You say "threadDelay is not required..., a simple busy loop... works just as well". What busy loop did you try? Specifically: did you try one which allocates memory, and therefore enters the runtime, or did both of your tests (the unshown busy loop and this `threadDelay`-based one) use pure computations which never enter the runtime? I would bet that 1. you're using the non-threaded runtime and 2. you've written a pure computation that never yields. — Daniel Wagner, May 22 '20 at 15:57
@DanielWagner > I would bet that 1. you're using the non-threaded runtime You'd loose that bet ;) See the compilation flags in the question it does contain `-threaded`. Behavior is the same with and without it. See the edit with the busy loop I tried. — lehins, May 22 '20 at 16:05
@DanielWagner > 2. you've written a pure computation that never yields You'd loose that bet as well, I know about that peculiarity of GHC run time and I have tried with manual `yield`s, so the fact there are no allocations is not the driver behind this behavior. — lehins, May 22 '20 at 16:07
@DanielWagner I think this is the behavior you are referring to: https://github.com/simonmar/async/issues/93 which is not the case here, because all threads in the example respond to async exceptions just fine and `-fomit-yields` has no affect on the behavior — lehins, May 22 '20 at 16:10
Are you sure you're diagnosing the problem correctly? As far as I know, `atomicModifyIORef'` is an atomic compare and exchange followed by evaluating the returned thunk. The actual blocking portion is the atomic compare and exchange, which should be basically nothing without contention. But [the docs for `sayString`](https://hackage.haskell.org/package/say-0.1.0.1/docs/Say.html#v:sayString) indicate it forces the entire argument before producing any output. *That* will block until evaluation of the thunk completes. — Carl, May 22 '20 at 16:19
@Carl interesting, switching to `print` does solve the printing problem. It still does not answer the question about blocking on `readIORef`. I'll update the question with new info. — lehins, May 22 '20 at 16:23
In additional to replacing `sayString` with `putStrLn`, also break up the last line of `threadBlock` into `putStrLn "Unlblocked"; putStrLn $ "Got value " ++ show x`. That might help you understand what's going on. — K. A. Buhr, May 22 '20 at 16:30
@K.A.Buhr Are you trying to say that it blocks on a thunk `x`, rather than the `readIORef` operation itself? — lehins, May 22 '20 at 16:35

score 1 · Accepted Answer · edited May 22 '20 at 21:28

I think I understand what happens now. TLDR, readIORef is not a blocking operation! Big thanks to everyone who commented on the question.

The way I break down the logic mentally is (same as in question, but with added Thread names):


threadBlock :: IORef Int -> IO ()
threadBlock ref = do
  race_ ({- Thread C -} forever $ sayString "readIORef: Wating ..." >> threadDelay 500_000) $ do
    {- Thread B -}
    threadDelay 100_000
    x <- readIORef ref
    sayString $ "Unblocked with value: " ++ show x

threadBusy :: IORef Int -> Int -> IO ()
threadBusy ref n = do {- Thread A -}
  sayString $ "Setting IORef to busy for " ++ show n ++ " μs"
  y <- atomicModifyIORef' ref (\x -> unsafePerformIO (threadDelay n) `seq` (x * 10000, x))
  sayString $ "Finished blocking the IORef, returned with value: " ++ show y

Thread A updates the content of a ref with a thunk that will be filled when this computation is done unsafePerformIO (threadDelay n) `seq` (x * 10000, x). The important part is that because atomicModifyIORef' is most likely implemented with CAS (compare-and-swap) and the swap was successful, since expected value matched and the new value was updated with the thunk that has not been evaluated yet. Because atomicModifyIORef' is a strict it has to wait until the value is computed, which will take 10 sec before returning. So thread A blocks.
Thread B reads the thunk from ref with readIORef WITHOUT blocking. Now once an attempt is made to print the new content of a thunk x it has to stop and wait until it is filled with a value, which still is in a process of being computed. Because of that it has to wait thus it looks like it is blocked.
Thread C was suppose to be printing a message every 0.5 sec with sayString, but it fails to do so and therefore behaved as it was blocked as well. From a quick look say package and GHC.IO.Handle it looks like a Handle for stdout gets blocked by thread B, because printing in say package suppose to happen without interleaving and for that reason thread C could not do any printing either, thus it looked like it was blocked as well. That is why switching to putStrLn unblocked Thread C and allowed it to print a message every 0.5 sec.

This definitely convinces me, but if anyone has a better explanation I'll be happy to accept another answer.

What happens if you break up `sayString $ "Unblocked with value: " ++ show x` into two uses of `sayString`, with the first one not depending on `x` (and you don't use `putStrLn` anywhere)? — Neil Mayhew, May 22 '20 at 17:59
To verify your assumptions, try an `IORef` that has a non-trivial WHNF like a list: `newIORef []` and ``atomicModifyIORef' ref (\_ -> ([], [1, 2, 3, (unsafePerformIO (threadDelay n) `seq` 4)]))`` or similar should get you interesting results, although you probably want to `readIORef ref >>= mapM_ print` to make sure that the last element doesn't block the former elements on printing. — Zeta, May 22 '20 at 18:31
Your diagnosis matches my understanding of how things are implemented - but I've been wrong before, so those experiments are good ideas. — Carl, May 22 '20 at 19:21
@NeilMayhew exactly what you'd expect. `"Unblocked"` gets printed and then the thread blocks until `x` is fully evaluated. In other words the same as with `putStrLn`. — lehins, May 23 '20 at 05:50

How come `readIORef` is a blocking operation

1 Answers1