
I spent quite a lot of time troubleshooting an issue in the application I am working on. It is a web app exposing REST endpoints using scotty. It holds its state in a TVar, which is updated through STM actions triggered by the front-end layer.

As this application is based on event sourcing principles, any event generated by the business layer after an STM transaction completes is stored in an EventStore (currently a simple flat file...). Here is the relevant code fragment:

    newtype (EventStore m) => WebStateM s m a = WebStateM { runWebM :: ReaderT (TVar s) m a }
      deriving (Functor, Applicative, Monad, MonadIO, MonadTrans, MonadReader (TVar s))


    applyCommand :: (EventStore m, Serializable (Event a)) =>
                    Command a                                    
                 -> TVar s
                 -> WebStateM s m (Event a) 
    applyCommand command = \ v -> do
      (e, etype :: EventType s) <- liftIO $ atomically $ actAndApply v
      storeEvent e etype
      return e
      where
        actAndApply =  \ v -> do
          s <- readTVar v
          let view = getView s
          let e  = view `act` command
          let bv = view `apply` e

          modifyTVar' v (setView bv)

          return (e, getType view)

This worked perfectly until a bug slipped into the storeEvent function. This function is responsible for serialising the event with the appropriate type, and I made a (gross) mistake in my serialisation routine for one type, which led to an infinite loop! All of a sudden, my cabal test began to hang and fail with a timeout (I use wreq as the client library to test REST services). It took me a couple of hours to pin down the actual error on the server side: tests: thread blocked indefinitely in an STM transaction. Suspecting the serialisation routine, it took me another couple of hours to nail down the culprit and fix the issue.

Although I am of course entirely responsible for the error (I should have tested my serialisation routine more thoroughly!), I found the message quite misleading. I would like to understand better where this error comes from and how to prevent it. I have read Edward Yang's post on the subject, and this mail thread, but I must confess the logical chain of events leading to this error is not entirely clear to me.

I think I understand that the thread calling applyCommand, which is spawned by scotty, dies from some exception (stack exhausted?) thrown while evaluating storeEvent, but I do not understand how this relates to the transaction being garbage.

insitu

1 Answer


The exception says that one thread tried to run a transaction and hit retry, which blocks the thread and reruns the transaction when one of the TVars it read changes. But the thing it is waiting on is no longer referenced anywhere else, so no other thread can ever change it and the retry can never happen. And that's a bug. Basically, that thread is hung now.

I would imagine that some thread somewhere was supposed to update this TVar, but it died because of an exception, thereby dropping the last reference to that TVar and provoking the exception.

That's what I think happened. Without seeing the entire application, it's difficult to be sure.
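
To make the scenario above concrete, here is a minimal sketch of it. This is not the code from the question: the names `waitForDeadWriter` and `done` are hypothetical, and it assumes the `stm` package that ships with GHC. The only thread that could update the TVar dies before writing to it, so the waiting transaction can never be woken up and the runtime raises `BlockedIndefinitelyOnSTM` in the waiting thread.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Exception (BlockedIndefinitelyOnSTM (..), try)

-- Hypothetical reproduction: the only writer dies before updating the TVar.
-- Returns True if the waiting transaction was aborted by the runtime.
waitForDeadWriter :: IO Bool
waitForDeadWriter = do
  done <- newTVarIO False
  -- The writer is supposed to set the flag, but it fails first, so its
  -- reference to `done` disappears together with the thread.
  _ <- forkIO $ do
    _ <- ioError (userError "writer failed before updating the TVar")
    atomically (writeTVar done True)
  -- `check` calls `retry` while the flag is False, so this thread blocks
  -- until some other thread writes to `done`. Nobody can any more.
  r <- try (atomically (readTVar done >>= check))
  case r of
    Left BlockedIndefinitelyOnSTM -> return True
    Right ()                      -> return False

main :: IO ()
main = waitForDeadWriter >>= print
```

The forked thread's own exception is reported on stderr by the runtime; the waiting thread only finds out later, once the garbage collector notices that nothing else can reach the TVar. That delay is probably why the error shows up far from its actual cause.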

MathematicalOrchid
  • Thanks for the insight. I can understand that if the 'main' thread dies then the `TVar` is no longer referenced, however what I do not understand is where the `retry` is supposed to be. I do not use `retry` in my code explicitly, so this must be implicit... – insitu Sep 30 '14 at 12:47
  • Do you use `TChan` or something? That will use `retry` internally to block until data is available from the channel. – MathematicalOrchid Sep 30 '14 at 13:58
  • That's what is puzzling me: I do not use `TChan`, only `readTVar`, `writeTVar`, `modifyTVar'`. – insitu Sep 30 '14 at 14:38
  • Reading http://stackoverflow.com/questions/7862372/haskell-thread-blocked-indefinitely-in-an-stm-transaction I see a link to C source code where the exception is thrown, but this does not help me much. Might this have something to do with underlying threading in warp? – insitu Sep 30 '14 at 14:55
  • I think that any STM transaction can call `retry` implicitly (but haven't actually checked the code) – John L Oct 01 '14 at 05:08