
I'm trying to get started with Onyx, the distributed computing platform in Clojure. In particular, I'm trying to understand how to aggregate data. If I understand the documentation correctly, a combination of a window and a :trigger/emit function should allow me to do this.

So, I modified the aggregation example (Onyx 0.13.0) in three ways (cf. gist with complete code):

  • in -main I println any segments put on the output channel; this works as expected with the original code in that it picks up all segments and prints them to stdout.
  • I add an emit function like this:

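    ;; Emit function referenced by :trigger/emit; its return value is the
    ;; segment that should flow downstream.
    (defn make-ds
      [event window trigger {:keys [lower-bound upper-bound event-type] :as state-event} extent-state]
      (println "make-ds called")
      {:ds window})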
    
  • I add a trigger configuration (original dump-words trigger omitted for brevity):

    (def triggers
     [{:trigger/window-id :word-counter
       :trigger/id :make-ds
       :trigger/on :onyx.triggers/segment
       :trigger/fire-all-extents? true
       :trigger/threshold [5 :elements]
       :trigger/emit ::make-ds}])
    
  • I change the :count-words task from calling the identity function to the reduce type, so that it doesn't hand all input segments over to the output (and add config options so that Onyx handles this as a batch):

        {:onyx/name :count-words
         ;:onyx/fn :clojure.core/identity
         :onyx/type :reduce ; :function
         :onyx/group-by-key :word
         :onyx/flux-policy :kill
         :onyx/min-peers 1
         :onyx/max-peers 1
         :onyx/batch-size 1000
         :onyx/batch-fn? true}  
    

When I run this now, I can see in the output that the emit function (i.e. make-ds) gets called for each input segment (first output coming from the dump-words trigger of the original code):

     > lein run
     [....]
     Om -> 1
     name -> 1
     My -> 2
     a -> 1
     gone -> 1
     Coffee -> 1
     to -> 1
     get -> 1
     Time -> 1
     make-ds called
     make-ds called
     make-ds called
     make-ds called
     [....]

However, the segments built by make-ds don't make it through to the output channel; they are never printed. If I revert the :count-words task to the identity function, this works just fine. Also, it looks as if the emit function is called for each input segment, whereas I would expect it to be called only when the threshold condition is true (i.e. whenever 5 elements have been aggregated in the window).

As the test for this functionality within the Onyx code base (onyx.windowing.emit-aggregate-test) is passing just fine, I guess I'm making a stupid mistake somewhere, but I'm at a loss figuring out what.

schaueho

1 Answer


I finally noticed a warning like this in the log file onyx.log:

[clojure.lang.ExceptionInfo: Windows cannot be checkpointed with ZooKeeper unless 
  :onyx.peer/storage.zk.insanely-allow-windowing? is set to true in the peer config.
  This should only be turned on as a development convenience.
[clojure.lang.ExceptionInfo: Handling uncaught exception thrown inside task 
  lifecycle :lifecycle/checkpoint-state. Killing the job. -> Exception type: 
  clojure.lang.ExceptionInfo. Exception message: Windows cannot be checkpointed with
  ZooKeeper unless :onyx.peer/storage.zk.insanely-allow-windowing? is set to true   in
  the peer config. This should only be turned on as a development convenience.  

As soon as I set this, I finally got some segments handed over to the next task. I.e., I had to change the peer config to:

(def peer-config
  {:zookeeper/address "127.0.0.1:2189"
   :onyx/tenancy-id id
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.peer/storage.zk.insanely-allow-windowing? true
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"})

Now, :onyx.peer/storage.zk.insanely-allow-windowing? doesn't sound like a good thing to do. Lucas Bradstreet recommended on the Clojurians Slack channel switching to S3 checkpointing.
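For reference, a peer config along those lines might look roughly like the sketch below. I have not run this: the :onyx.peer/storage keys are my reading of the Onyx peer-config options and should be checked against the documentation for your Onyx version, and the bucket name and region are hypothetical placeholders.

```clojure
;; Sketch only, not tested. `id` stands in for the tenancy id used in the example.
(def id (java.util.UUID/randomUUID))

(def peer-config
  {:zookeeper/address "127.0.0.1:2189"
   :onyx/tenancy-id id
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   ;; Assumed keys: checkpoint window state to S3 instead of ZooKeeper.
   :onyx.peer/storage :s3
   :onyx.peer/storage.s3.bucket "my-onyx-checkpoints" ; hypothetical bucket
   :onyx.peer/storage.s3.region "us-east-1"           ; hypothetical region
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"})
```

With a setup like this, :onyx.peer/storage.zk.insanely-allow-windowing? would no longer be needed, since window state is not checkpointed to ZooKeeper.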

schaueho