
Working my way up learning Clojure, I arrived at the following problem:

Setup: a class for a graph data structure, created with deftype and definterface, which has an addNode [id data] member function. It works as expected when called directly, e.g. (.addNode graph "anItem" 14).

Idea: since string tokenizing and updating the graph both take considerable time (at least for millions of lines), I would like to read and tokenize the file serially and push the token lists to an agent, which will execute the `(.addNode graph id data)` part.

Problem: I can't find the right syntax to make the agent accept a class instance's member function as its update function.

Simplified code (namespaces dropped here; may contain typos!):

```clojure
; from graph.clj
(definterface IGraph
  (addNode [id data])
  (addNode2 [_ id data]))
(deftype Graph [^:volatile-mutable nodes] ; expects an empty map, else further calls fail horribly
  IGraph
  (addNode [this id data] (set! nodes (assoc nodes id data)) this)
  (addNode2 [this _ id data] (.addNode this id data) this))

; core.clj
(def g (Graph. {}))
(def smith (agent g))               ; agent smith shall do the dirty work

(send smith .addNode "x" 42) ; unable to resolve symbol
(send smith (.addNode @smith) "x" 42) ; IllegalArgumentException (arity?)
(send smith (.addNode2 @smith) "x" 42) ; same as above. Not arity after all?
(send smith #(.addNode @smith) "x" 42) ; ArityException of eval (3)
(send smith (partial #(.addNode @smith)) "x" 42) ; the same

; agent smith, the president is ashamed...
```

The five lines won't work for various reasons, while a simple

```clojure
(def jones (agent 0))
(send jones + 1)

; agent jones, this nation is in your debt
```

successfully executes. This should be possible, so what am I doing wrong?

Asked by waechtertroll; edited by Jaime Agudo.

2 Answers


Your direct issue is that .addNode isn't a function but syntactic sugar around the . special form. You can't pass special forms around like values, so you'll need to wrap it in a function that the agent knows how to call, such as #(.addNode %1 %2 %3) or (fn [g id data] (.addNode g id data)). send calls that function with the agent's current state followed by the extra arguments, so the wrapper has to accept the graph first and forward all three values. The special form is then only evaluated once all the arguments are there, and it can see that there is an addNode method on the graph in its first argument.
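A minimal sketch of that fix, assuming the question's Graph type. The nodeMap accessor is a hypothetical addition (not in the question) that only exists so the private mutable field can be read back after the agent has run:

```clojure
;; Same shape as the question's graph, plus a hypothetical nodeMap accessor,
;; because ^:volatile-mutable fields are private to the type.
(definterface IGraph
  (addNode [id data])
  (nodeMap []))

(deftype Graph [^:volatile-mutable nodes]
  IGraph
  (addNode [this id data]
    (set! nodes (assoc nodes id data))
    this)                     ; return the graph so it stays the agent's state
  (nodeMap [this] nodes))

(def smith (agent (Graph. {})))

;; send calls (f current-state "x" 42): the wrapper takes the graph first
;; and forwards all three values to the method call.
(send smith #(.addNode %1 %2 %3) "x" 42)
;; The same thing with named parameters:
(send smith (fn [g id data] (.addNode g id data)) "y" 7)

(await smith)                 ; wait for both queued actions to finish
(.nodeMap @smith)
```

Returning `this` from addNode matters: whatever the update function returns becomes the agent's new state.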

Still, James Sharp's answer has a good point: this is a pretty imperative and OO way to treat the problem. From your code so far it looks like you intend to feed tokens from the list serially into smith with send, who will then update his graph by assoc-ing each one into it. This is a classic reduce operation: take the empty graph and assoc the first token into it, then assoc the next token into the result, and so on until the input runs out. Having an agent queue and run each step of this process on another thread doesn't seem very necessary.
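A sketch of the reduce version, assuming (hypothetically) that each input line holds one whitespace-separated `id data` pair; the tokenize helper and the sample lines are made up for illustration:

```clojure
(require '[clojure.string :as str])

;; Hypothetical input: one "id data" pair per line.
(def lines ["x 42" "y 7" "x 43"])

;; Hypothetical tokenizer: split a line into an [id data] vector.
(defn tokenize [line]
  (let [[id data] (str/split line #"\s+")]
    [id (Long/parseLong data)]))

;; Start from the empty map and assoc each token into the previous result.
;; A later entry for the same id overwrites the earlier one.
(def nodes
  (reduce (fn [m [id data]] (assoc m id data))
          {}
          (map tokenize lines)))
```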

If you're using ^:volatile-mutable for performance reasons, you could also try transients and reduce with assoc!, or simply use (into {} ...), which handles the transients for you (although it behaves like conj rather than assoc, so for maps it expects [key value] vectors rather than separate key and value arguments).
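A sketch of both variants on a hypothetical token list of [id data] vectors:

```clojure
;; Hypothetical tokens, already shaped as [id data] vectors.
(def tokens [["x" 42] ["y" 7]])

;; Transient version: assoc! into a transient map, always using the value
;; assoc! returns, then freeze the result with persistent!.
(def nodes-transient
  (persistent!
    (reduce (fn [m [id data]] (assoc! m id data))
            (transient {})
            tokens)))

;; into uses transients internally and conj's each [key value] vector into the map.
(def nodes-into (into {} tokens))
```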

Answered by Magos.
  • Thanks for your answer. I only use that mutable setup because both tasks need about 50% of the computation time each, so I aimed for explicit concurrency. I don't know about using transients for concurrency, but I'll read up. Any suggestions for good references? – waechtertroll Jul 23 '15 at 06:35

What you are trying to do may be possible, but IMHO it is not idiomatic: you are still thinking in an OO way. As the docs say,

The state of an Agent should be itself immutable (preferably an instance of one of Clojure's persistent collections)

You can model a tree with a map; have a look at this example.
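A minimal sketch of that idea, assuming a hypothetical nested-map representation (node id -> {neighbour id -> edge count}). Because the agent's state is a plain persistent map, ordinary core functions work as update functions with send, just like + in the jones example:

```clojure
;; Hypothetical representation: node id -> {neighbour id -> edge count}.
(def smith (agent {}))

;; send calls (update-in current-map path f), so core functions can be passed directly.
;; (fnil inc 0) treats a missing count as 0 before incrementing.
(send smith update-in ["a" "b"] (fnil inc 0))
(send smith update-in ["a" "b"] (fnil inc 0))
(send smith update-in ["b" "c"] (fnil inc 0))

(await smith)   ; actions sent from one thread to one agent run in order
@smith
```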

Generally speaking, 4clojure is a good place to start writing idiomatic Clojure solutions.

Edited: more complete and idiomatic example

Answered by Jaime Agudo.
  • Thank you for your answer. You're right about idiomatic Clojure ways - I tried this setup to explicitly force concurrency. I already worked through the [koans](http://clojurekoans.com), but thanks for the 4clojure link. Internally, the graph consists of maps, although I fear those maps waste a lot of memory on unused buckets, which might prove fatal for the task... – waechtertroll Jul 23 '15 at 06:41
  • I read somewhere that Clojure's immutable maps take roughly double the space of their Java equivalents, but that's a pros-and-cons analysis you have to do for your particular scenario. For a concurrency speedup I'd split the file into chunks and fork n threads to do the job, and only in the very last step rely on the `agent` to update the tree. Alternatively, you can build separate trees and then combine them all in a _map-reduce_ style. Have a look at [`core.async`](https://github.com/clojure/core.async) if you wanna play further :) – Jaime Agudo Jul 23 '15 at 11:35
  • Well... atm the task is extracting nodes from 60-120GB input text files, so I fear that in the end memory consumption may be the application's bottleneck. Also, any line of text may contribute to any subgraph, may start a new graph, or may just increase the counter for an already existing edge, so splitting is quite difficult, as in worst-case scenarios it would multiply memory consumption. Still, you have proven quite helpful so far... My code became quite FP-style, and it expresses 600 lines of C++ code in roughly 80 lines (I'm converting a tool to Clojure as an exercise...) – waechtertroll Jul 24 '15 at 13:13
  • Woah... plus missing words and letters - bloody bad keyboard here – waechtertroll Jul 24 '15 at 13:14
  • 120GB is a size where it's worth considering an in-memory graph database such as [OrientDB](http://orientdb.com/orientdb/); there's even a Clojure [wrapper](https://github.com/eduardoejp/clj-orient). In my experience it works nicely – Jaime Agudo Jul 24 '15 at 17:17
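For the "build separate trees and then combine them" idea from the comments, clojure.core.reducers/fold is one possible tool. It is a swapped-in suggestion, not something the comments name. This sketch assumes the input has already been turned into a vector of hypothetical [from to] edge pairs. Each chunk counts repeated edges in its own map, and the partial maps are merged by adding counts:

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical edge list: each element is a [from to] pair found in the input.
(def edges (vec (take 3000 (cycle [["a" "b"] ["a" "b"] ["b" "c"]]))))

(def edge-counts
  (r/fold
    ;; combinef: merge two partial count maps by adding counts;
    ;; called with no args, it yields the empty map each chunk starts from.
    (r/monoid (partial merge-with +) hash-map)
    ;; reducef: count one edge within a chunk (update needs Clojure 1.7+).
    (fn [counts edge] (update counts edge (fnil inc 0)))
    edges))
```

fold only splits vectors and maps into parallel chunks. A lazy seq of lines is reduced serially, so the tokens have to be realized into a vector (or processed in batches) first. That matters for memory at the file sizes mentioned above.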