A stateful transducer is expected to flush its state in its completion step by calling the nested "step" arity (i.e. ([result input] ...)) as many times as needed, before calling the nested "complete" arity (i.e. ([result] ...)).

My question is how early termination (with reduced) is to be handled here.

On https://clojure.org/reference/transducers#_early_termination it says

In the completion step, a transducer with reduction state should flush state prior to calling the nested transformer’s completion function, unless it has previously seen a reduced value from the nested step in which case pending state should be discarded.

but what does "has previously seen" mean?

There are three possible interpretations of the above quote: If the "step" arity of the nested transformer has returned a reduced value, the "complete" step should

  1. still flush state by calling the nested "step" arity,
  2. ignore its state and just call the "complete" arity, or
  3. not call any arity of the nested transformer at all

Looking at the implementation of partition-by it seems that the first option applies:

([result]
   (let [result (if (.isEmpty a)
                  result
                  (let [v (vec (.toArray a))]
                    ;;clear first!
                    (.clear a)
                    (unreduced (rf result v))))]
     (rf result)))

This also removes the "reducedness" from the result of calling the "step" arity of the nested transform with the remaining state, before passing this unreduced value to the "complete" arity of the nested transform.
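To make the effect of that unreduced call concrete, here is a small sketch using only clojure.core functions. The var names are made up for illustration. In the first expression the last partition is only emitted from the completion arity of partition-by, and that flush is exactly the step on which the downstream (take 2) signals early termination:

```clojure
;; The partition [2 2] is still pending when the input runs out, so it is
;; flushed from the completion arity. That flush is the second item seen
;; by (take 2), which therefore answers with a reduced value.
(def flushed-at-completion
  (into [] (comp (partition-by odd?) (take 2)) [1 1 2 2]))

;; unreduced strips one Reduced wrapper and leaves plain values alone,
;; so the nested completion arity always receives an unwrapped result.
(def stripped  (unreduced (reduced :done)))
(def untouched (unreduced :done))
```

Without the unreduced call, the completion arity of take would be handed a Reduced wrapper instead of the accumulated result.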

The question is then: if I have a stateful transducer that, in its completion step, would call the "step" arity of its nested transform more than once, how should it react if one of those calls returns a reduced value?

Trying to combine the above quote and code, I think that it is supposed to stop calling the "step" arity and directly call the "complete" arity with the unreduced value like partition-by.

But then, looking at the code for cat:

(defn ^:private preserving-reduced
  [rf]
  #(let [ret (rf %1 %2)]
     (if (reduced? ret)
       (reduced ret)
       ret)))

(defn cat
  "A transducer which concatenates the contents of each input, which must be a
  collection, into the reduction."
  {:added "1.7"}
  [rf]
  (let [rrf (preserving-reduced rf)]  
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result input]
         (reduce rrf result input)))))

it wraps any reduced results in another "layer" of reducedness in the "step" arity. (To later be removed by reduce?)
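As far as I can tell, the extra layer is there because the inner reduce call removes exactly one layer of reduced before returning. A small sketch of that behaviour, where stop-at-2 and the var names are hypothetical helpers made up for this illustration:

```clojure
;; Hypothetical step function that terminates early on the value 2.
(defn stop-at-2 [acc x]
  (if (= x 2) (reduced acc) (conj acc x)))

;; reduce unwraps the reduced value, so the caller sees a plain vector
;; and cannot tell that the reduction stopped early.
(def single-wrapped (reduce stop-at-2 [] [1 2 3]))

;; Wrapping a second time (what preserving-reduced does) leaves one
;; layer in place after reduce has removed the outer one.
(def double-wrapped
  (reduce (fn [acc x]
            (let [ret (stop-at-2 acc x)]
              (if (reduced? ret) (reduced ret) ret)))
          []
          [1 2 3]))

;; With cat, this is what lets a downstream take stop the outer process.
(def cat-then-take
  (into [] (comp cat (take 3)) [[1 2] [3 4] [5 6]]))
```

Here single-wrapped is a plain value, while double-wrapped is still reduced, which is what the process driving cat needs to see in order to stop.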

Is it just the fact that I happened to find two special cases in the implementations of partition-by and cat, or is there some general rule about when to call the nested "step" function during early termination, and how "reducedness" is to be conveyed (or not)?

As far as I understand it, a transducer should only care about its own operation and not how it is used or composed, but how does this affect early termination?

Also, are the "step" and "complete" arities to be treated as completely separate where early termination in the former does not affect the latter?

Looking at the implementation of take-while it seems that they indeed are separate:

(fn
  ([] (rf))
  ([result] (rf result))
  ([result input]
     (if (pred input)
       (rf result input)
       (reduced result))))

In the baggage handler example from the StrangeLoop talk [1], the completion arity of (taking-while non-ticking?) would still put any remaining bags from previous steps onto the plane, regardless of whether it had returned a reduced value in the "step" arity.

Is that a correct interpretation? And if so, is early termination to be handled by the reducing process, so that the "complete" arity is never called if a reduced value is encountered?
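A quick experiment with transduce suggests that at least this reducing process does call the "complete" arity after early termination. The names counting-rf and completion-calls below are made up for the experiment:

```clojure
;; Counts how often the completion arity of the reducing function runs.
(def completion-calls (atom 0))

(defn counting-rf
  ([] [])
  ([acc] (swap! completion-calls inc) acc)   ; "complete" arity
  ([acc x] (conj acc x)))                    ; "step" arity

;; take-while returns a reduced value when it reaches -1, so 3 is never seen.
(def early-terminated
  (transduce (take-while pos?) counting-rf [1 2 -1 3]))
```

After this runs, the counter has been incremented once, so transduce unwraps the reduced value and still calls completion exactly once. Other reducing processes may of course behave differently.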

Update:

Rich Hickey talks a bit about this in his "Inside Transducers" talk [2], where he says

If you're accumulating, as soon as the function under you -- the function you're transforming -- has told you it's terminated early you shouldn't accumulate any more. You should put yourself in a state so that when you're asked to complete you say "I don't have anything to flush", 'cause you know the function underneath you does not want to see it.

This is the crux of what I am getting at: "stop accumulating" does not necessarily mean "throw away all accumulated state". But looking at the "step" arity of partition-by, it does both: the buffer is cleared before the nested "step" arity is called, and the current input is not added if that call returns a reduced value:

([result input]
   (let [pval @pv
         val (f input)]
     (vreset! pv val)
     (if (or (identical? pval ::none)
             (= val pval))
       (do
         (.add a input)
         result)
       (let [v (vec (.toArray a))]
         (.clear a)
         (let [ret (rf result v)]
           (when-not (reduced? ret)
             (.add a input))
           ret)))))

so it will not call (rf result v) during completion and does follow "You should put yourself in a state so that when you're asked to complete you say 'I don't have anything to flush'".

It still calls the "complete" arity ((rf result)) of its nested transform.

So: should a transducer, when seeing a reduced value

  • in the "step" arity, clear its accumulated state and return the reduced value it just got, and
  • in the "complete" arity just pass on whatever value it gets without any transformation (rf result)?
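To make the question concrete, here is a sketch of how I currently imagine such a transducer, under the reading above. delay-n is a hypothetical transducer invented for this question (it is not part of clojure.core): it holds back the most recent n inputs and can therefore have several items to flush at completion. Using a PersistentQueue in a volatile for the buffer is just my choice for the sketch.

```clojure
(defn delay-n
  "Hypothetical transducer: holds back the most recent n inputs and
  emits them, in order, from the completion arity."
  [n]
  (fn [rf]
    (let [buf (volatile! clojure.lang.PersistentQueue/EMPTY)]
      (fn
        ([] (rf))
        ([result]
         (let [pending @buf]
           (vreset! buf clojure.lang.PersistentQueue/EMPTY)
           ;; reduce stops at the first reduced value from rf and unwraps
           ;; it, so the nested completion arity gets a plain result.
           (rf (reduce rf result pending))))
        ([result input]
         (vswap! buf conj input)
         (if (> (count @buf) n)
           (let [oldest (peek @buf)
                 _      (vswap! buf pop)
                 ret    (rf result oldest)]
             ;; Downstream terminated early: discard pending state so
             ;; that completion has nothing left to flush.
             (when (reduced? ret)
               (vreset! buf clojure.lang.PersistentQueue/EMPTY))
             ret)
           result))))))

(def all-items        (into [] (delay-n 2) [1 2 3 4 5]))
(def cut-during-flush (into [] (comp (delay-n 2) (take 4)) [1 2 3 4 5]))
(def cut-during-step  (into [] (comp (delay-n 2) (take 2)) [1 2 3 4 5]))
```

In cut-during-flush the downstream take terminates while the buffer is being flushed, so the remaining item is not passed on. In cut-during-step it terminates in the "step" arity, so the buffer is discarded and completion only calls (rf result). Whether this is the intended contract is exactly what I am asking.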

referenced talks:

[1] Transducers, https://youtu.be/6mTbuzafcII at 30:40

[2] Inside Transducers, https://youtu.be/4KqUvG8HPYo at 24:10
