I have a series of functions (like some-operation in the example), which I send or send-off to agents:
(defn some-operation [agent-state]
  (dosync
    (let [updated (foo agent-state)] ;; derive new state from old one
      (alter bar whatev updated)     ;; reflect the new state in the world
      (send *agent* some-operation)  ;; "recur"
      updated)))                     ;; value for recur

(send (agent {}) some-operation)
This approach worked for me while I was developing my app. But after some changes in the codebase, the agents simply stop running after a while ('a while' being a few seconds, or a few thousand "recursive" calls).
Their state is valid in the domain, the agents themselves haven't FAILED, and I am certain that they are not livelocking on their dosync blocks (one can measure contention).
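For illustration, checks along these lines can tell whether an agent has failed and how many times a transaction body was attempted. This is only a minimal sketch, not the code from my app: the names probe, counter, attempts and bump! are hypothetical, and counting attempts with an atom relies on atoms not being rolled back when a transaction retries.

```clojure
;; Hypothetical names for illustration only: probe, counter, attempts, bump!
(def probe (agent {}))
(def counter (ref 0))
(def attempts (atom 0)) ;; atoms are not transactional, so retried attempts are counted too

(defn bump! []
  (dosync
    (swap! attempts inc)   ;; runs once per attempt, including retries
    (alter counter inc)))  ;; takes effect once, on commit

(bump!)
(send probe assoc :seen true)
(await probe) ;; block until the action sent above has run

(println "agent error:" (agent-error probe)) ;; nil when the agent has not failed
(println "attempts:" @attempts "commits:" @counter)
```

If attempts grows much faster than commits, the transactions are contending; if agent-error returns an exception, the agent has failed and will not run further actions until it is restarted.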
My suspicion is that the JVM/OS is preventing the underlying executor thread from running, for some reason or other. But I don't know how to check whether this assumption is right.
In general, what are some possible reasons why an agent might not get its pending "sends" executed? What can I inspect/measure?
Update - given the following modification for debugging...
(defn some-operation [agent-state]
  (let [f (future
            (dosync
              ...))] ;; the whole operation, as per the previous example
    (Thread/sleep 1000) ;; give the operation some time
    (if (realized? f)
      @f
      ;; not realized: we deem the operation as blocking indefinitely
      (do
        (println :failed)
        (send *agent* some-operation)
        agent-state))))
...the agent still gets stuck, and doesn't even print :failed.