I have a series of functions (like some-operation in the example), which I send or send-off to agents:

(defn some-operation [agent-state]
  (dosync
   (let [updated (foo agent-state)] ;; derive new state from old one
     (alter bar whatev updated) ;; reflect the new state in the world
     (send *agent* some-operation) ;; "recur"
     updated) ;; value for recur
   ))

(send (agent {}) some-operation)

This approach worked for me while I was developing my app. But after some changes in the codebase, the agents simply stop running after a while ('a while' being a few seconds, or a few thousand "recursive" calls).

Their state is valid in the domain, the agents themselves haven't FAILED, and I am certain that they are not livelocking on their dosync blocks (one can measure contention).

My suspicion is that the JVM/OS is preventing the underlying executor thread from running, for some reason or other. But I don't know how to check whether this assumption is right.

In general, what are some possible reasons why an agent might not get its pending "sends" executed? What can I inspect/measure?

Update - given the following modification for debugging...

(defn some-operation [agent-state]
  (let [f (future
            (dosync
             ...) ;; the whole operation, as per the previous example
            )]
    (Thread/sleep 1000) ;; give the operation some time
    (if (realized? f)
      @f

      ;; not realized: we deem the operation as blocking indefinitely
      (do
        (println :failed)
        (send *agent* some-operation)
        agent-state))))

...the agent still gets stuck, and doesn't even print :failed.

deprecated
  • Does the agent or ref have any [validators](http://clojuredocs.org/clojure_core/clojure.core/set-validator!) added to them? – juan.facorro Jun 24 '13 at 12:19
  • Yes, some of the refs these functions operate upon have associated validators. – deprecated Jun 24 '13 at 12:22
  • If the new state for the ref is not validated by any of its validators, then all `send` or `send-off` on agents within the transaction may be discarded. – juan.facorro Jun 24 '13 at 12:25
  • That seems likely to be the case; I will check it out. In any case, the silent failure of this behavior is pretty undesirable... – deprecated Jun 24 '13 at 12:29
  • Validation didn't happen to be the source of my problems. Any other ideas? – deprecated Jun 24 '13 at 13:12
  • In general you should do as much outside the `dosync` as possible. `(defn s-o [a-s] (let [u (foo a-s)] (send *agent* s-o) (dosync (alter bar whatever u)) u))` For debugging: you might want to bisect the mentioned changes in the code base to identify the culprit. – kotarak Jun 24 '13 at 14:27
  • How about a minimal case to reproduce the failure conditions? – John Cromartie Jun 24 '13 at 17:16
  • If you `send` inside a transaction but the transaction fails (i.e. due to a validator) then the `send` will be cancelled. – John Cromartie Jun 24 '13 at 17:29
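
The behaviour described in these comments can be reproduced in isolation. The following is a minimal sketch, not the asker's code: `counter`, `log-agent` and `try-update!` are hypothetical names, and the validator simply rejects negative numbers. It assumes no other thread touches the ref, so the transaction never retries.

```clojure
;; Hypothetical ref with a validator that rejects negative values.
(def counter (ref 0 :validator (complement neg?)))

;; Hypothetical agent that records every value sent to it.
(def log-agent (agent []))

(defn try-update!
  "Sends to log-agent inside a transaction, then sets the ref.
  Returns :rejected if the validator refuses the new value."
  [n]
  (try
    (dosync
      (send log-agent conj n)  ; held back until the transaction commits
      (ref-set counter n))     ; validated when the transaction commits
    (catch IllegalStateException _
      :rejected)))

(try-update! 1)   ; commits, so the send is dispatched
(try-update! -1)  ; validator fails, so the send is discarded

(await log-agent)
@log-agent ;; => [1]
```

Note that in this sketch the validation failure surfaces as an `IllegalStateException` thrown out of the `dosync`. Inside an agent action, an uncaught exception like that would normally leave the agent in a failed state, which `agent-error` reports, so a validator problem is unlikely to be completely silent unless something catches the exception.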

2 Answers

It's worth being aware of the way send and dosync interact. All calls to send in a dosync happen exactly once, and only once the transaction commits. This prevents messages from being delivered to an agent from a transaction that is later discarded. You could test this by shrinking the scope of the dosync.
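
One way to try the narrower `dosync` is sketched below. This is only an illustration under stated assumptions: `foo` and `bar` are hypothetical stand-ins for the question's function and ref, and the `limit` and `done` promise are added here purely so the loop terminates and can be observed.

```clojure
;; Hypothetical stand-ins for the question's `foo` and `bar`.
(def bar (ref 0))
(defn foo [state] (assoc state :n (inc (:n state 0))))

(def limit 1000)      ; assumption: stop after a fixed number of iterations
(def done (promise))  ; delivered once the last iteration has committed

(defn some-operation [agent-state]
  (let [updated (foo agent-state)]  ; pure computation, outside the transaction
    (dosync (alter bar inc))        ; only the ref update is transactional
    (if (< (:n updated) limit)
      (send *agent* some-operation) ; "recur"; no longer tied to the commit
      (deliver done (:n updated)))
    updated))

(send (agent {}) some-operation)

@done ;; blocks until the loop has run `limit` times
@bar  ;; => 1000
```

With this shape the `send` no longer depends on the transaction committing. It is still held until the current agent action returns, which is how sends made from within an action normally behave. If the loop keeps running in this form but stalls in the original one, that points at the transaction rather than at the executor.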

Arthur Ulfeldt

The send pool is limited, so only a certain number of agent actions can run at the same time (see this answer). Could this be the case?
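
To see what a saturated send pool looks like, here is a rough sketch. It assumes the default executor for `send`, which is a fixed pool of 2 + the number of available processors; the names `gate`, `blockers` and `victim` are made up for the example.

```clojure
;; Size of the default fixed pool used by `send`.
(def pool-size (+ 2 (.availableProcessors (Runtime/getRuntime))))

(def gate (promise))  ; the blocking actions wait on this
(def blockers (doall (repeatedly pool-size #(agent nil))))

;; Each action blocks on the gate, occupying one send-pool thread.
(doseq [b blockers]
  (send b (fn [_] @gate)))

;; An unrelated agent: its action is queued but no thread is free to run it.
(def victim (agent 0))
(send victim inc)

(Thread/sleep 200)
@victim ;; => 0 while the pool is saturated

(deliver gate :go)  ; release the blocked actions
(await victim)
@victim ;; => 1
```

While the pool is saturated, the victim agent has not failed and its state is valid; its action just never gets a thread, which resembles the symptom in the question. Actions that may block are usually dispatched with `send-off`, which runs on a separate, unbounded thread pool. A thread dump (for example with `jstack`) would show what the send-pool threads are waiting on.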

Niki Tonsky