
I'd like to use chunked cons or some other way to create a lazy-seq that blocks. Given a source:

(defn -source- [] (repeatedly (fn [] (future (Thread/sleep 100) [1 2]))))

(take 2 (-source-))
;; => (<future> <future>)

I'd like to have a function called injest where:

(take 3 (injest (-source-)))
=> [;; sleep 100
    1 2 
    ;; sleep 100
    1]

(take 6 (injest (-source-)))
=> [;; sleep 100
    1 2 
    ;; sleep 100
    1 2 
    ;; sleep 100
    1 2]

;; ... etc ...

How would I go about writing this function?

zcaudate

3 Answers


This source will naturally block as you consume it, so you don't have to do anything terribly fancy. It's almost enough to simply (mapcat deref):

(doseq [x (take 16 (mapcat deref (-source-)))]
  (println {:value x :time (System/currentTimeMillis)}))
{:value 1, :time 1597725323091}
{:value 2, :time 1597725323092}
{:value 1, :time 1597725323092}
{:value 2, :time 1597725323093}
{:value 1, :time 1597725323093}
{:value 2, :time 1597725323093}
{:value 1, :time 1597725323194}
{:value 2, :time 1597725323195}
{:value 1, :time 1597725323299}
{:value 2, :time 1597725323300}
{:value 1, :time 1597725323406}
{:value 2, :time 1597725323406}
{:value 1, :time 1597725323510}
{:value 2, :time 1597725323511}

Notice how the first few items come in all at once, and then after that each pair is staggered by about the time you'd expect? This is due to the well-known(?) fact that apply (and therefore mapcat, which is implemented with apply concat) is more eager than necessary, for performance reasons. If it is important for you to get the right delay even on the first few items, you can simply implement your own version of apply concat that doesn't optimize for short input lists.

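(defn ingest [xs]
  ((fn step [curr remaining]
     (lazy-seq
       (if-let [curr (seq curr)]
         ;; Emit the next item of the batch being unpacked.
         (cons (first curr) (step (rest curr) remaining))
         ;; Batch exhausted: only now realize and deref the next future.
         ;; `rest` (not `next`) avoids touching the element after it.
         (when-let [remaining (seq remaining)]
           (step @(first remaining) (rest remaining))))))
   nil xs))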
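The over-eagerness of apply concat can be observed directly by counting how many input elements are realized before the first output item is available. The following is only an illustrative sketch: counted-batches, lazy-flatten and realized-by are hypothetical helper names, and plain vectors stand in for the futures so that it runs instantly. The exact count for apply concat may vary between Clojure versions, so no specific number is claimed here.

```clojure
;; Hypothetical helper: an infinite lazy seq of [1 2] batches that bumps
;; `counter` every time one of its elements is realized.
(defn counted-batches [counter]
  (repeatedly (fn [] (swap! counter inc) [1 2])))

;; Hand-rolled lazy concatenation, the same idea as the hand-rolled
;; `ingest` but without deref, so it works on plain collections.
;; `rest` is used instead of `next` so the following batch is not touched.
(defn lazy-flatten [batches]
  (lazy-seq
    (when-let [s (seq batches)]
      (concat (first s) (lazy-flatten (rest s))))))

;; Hypothetical helper: how many batches does `flatten-fn` realize
;; just to produce the first output item?
(defn realized-by [flatten-fn]
  (let [counter (atom 0)]
    (first (flatten-fn (counted-batches counter)))
    @counter))

(def eager-count (realized-by #(apply concat %)))
(def lazy-count  (realized-by lazy-flatten))

;; Expect :apply-concat to be larger than :lazy-flatten.
(println {:apply-concat eager-count :lazy-flatten lazy-count})
```

With futures as input, every extra batch realized up front is an extra deref that has to complete before the first item is delivered.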

A. Webb in the comments suggests an equivalent but much simpler implementation:

(defn ingest [coll]
  (for [batch coll,
        item @batch]
    item))
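As a quick check of the for-based version against the behaviour asked for in the question, here is a small usage sketch. It repeats the definitions so it can be pasted into a REPL on its own; the finite input in the second example is made up for illustration, and the timing figure is approximate.

```clojure
(defn -source- []
  (repeatedly (fn [] (future (Thread/sleep 100) [1 2]))))

(defn ingest [coll]
  (for [batch coll,
        item @batch]
    item))

;; Three items need two batches, so this should take on the order of
;; 200 ms (exact timing will vary).
(time (doall (take 3 (ingest (-source-)))))
;; => (1 2 1)

;; A finite input is consumed completely; empty batches are skipped.
(ingest [(future [:a :b]) (future []) (future [:c])])
;; => (:a :b :c)
```

When running this as a script rather than at a REPL, calling (shutdown-agents) at the end lets the JVM exit promptly, since futures use the agent thread pool.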
amalloy
  • I think the problem is if I wanted to only take 10 items. because mapcat is lazy, it'll block. – zcaudate Aug 18 '20 at 06:22
  • I don't understand your comment at all. How is blocking a problem? It's exactly what you said you wanted. – amalloy Aug 18 '20 at 08:49
  • I just read the initial comment. I was wrong twice: 1) I meant to write `because mapcat isn't lazy`. 2) I tried out `(->> (mapcat deref (-source-)) (take 10))` and it returns 10 values without waiting for the rest so mapcat is actually lazy. Sorry. – zcaudate Aug 18 '20 at 09:01
  • Could you also just write `(take 16 (for [x (map deref (-source-)) y x] y))` for lazy concatenation here? – A. Webb Aug 20 '20 at 03:17
  • `for` does chunking as well. That's no different from `mapcat`. So it works, but doesn't have the laziness properties we were looking for. – amalloy Aug 20 '20 at 04:43
  • I mean to avoid the over eagerness of `mapcat` or `apply concat`. I didn't read chunking as an issue in the problem, but rather the desired outcome. The problem you stated for `mapcat` is it is waiting on the first few derefs before giving them all to us at once. The `for` is lazy enough to give us the first when ready; it has the expected sequence of timestamps--two together, wait, two together, etc. – A. Webb Aug 20 '20 at 05:11
  • Oh yes, I forgot what the issue was. Indeed not chunking. Your for-workaround is much better. I've incorporated it into my answer, with some tweaks to make it fit my style. – amalloy Aug 20 '20 at 06:23

I think it's enough to deref the elements of the lazy seq and force consumption of only the entries you need, like this:

(defn -source- [] (repeatedly (fn [] (future (Thread/sleep 100) [1 2]))))

(defn injest [src]
  (map deref src))

;; (time (dorun (take 3 (injest (-source-)))))
;; => "Elapsed time: 303.432003 msecs"

;; (time (dorun (take 6 (injest (-source-)))))
;; => "Elapsed time: 603.319103 msecs"

On the other hand, depending on the number of items, it might be better to avoid creating lots of futures and instead use a lazy-seq that may block for a while depending on the index of the element being realized.
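One way to read that suggestion is a source that does its blocking work on the consuming thread as each batch is realized, with no futures involved. The sketch below assumes that reading; blocking-source and ingest-values are hypothetical names, and the sleep merely stands in for whatever slow operation produces a batch.

```clojure
;; Hypothetical source: each batch is produced on demand, blocking the
;; consuming thread for `delay-ms` milliseconds before returning `batch`.
(defn blocking-source [delay-ms batch]
  (repeatedly (fn [] (Thread/sleep (long delay-ms)) batch)))

;; Flatten lazily; no deref is needed because the elements are plain values.
(defn ingest-values [batches]
  (for [batch batches,
        item batch]
    item))

(take 3 (ingest-values (blocking-source 100 [1 2])))
;; => (1 2 1)
```

The trade-off is that nothing is computed in the background: each batch only starts being produced when the consumer asks for it.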

Denis Fuenzalida
  • I suspect the futures are just an expository device here, a way to make it clear when items are being realized from the input and ensure we don't consume over-eagerly. – amalloy Aug 18 '20 at 08:51

You can solve it by iterating a state machine. I don't think this suffers from the optimizations related to apply pointed out by others, but I am not sure if there might be other issues with this approach:

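(defn step-state [[current-element-to-unpack input-seq]]
  (cond
    ;; Still unpacking a batch: drop the head, which injest emits from the current state.
    (seq current-element-to-unpack) [(rest current-element-to-unpack) input-seq]
    ;; Batch exhausted: block on the next future and unpack its value.
    (seq input-seq) [(deref (first input-seq)) (rest input-seq)]
    ;; Nothing left in either place: signal the end with nil. Checking this
    ;; last keeps the tail of the final batch when the input is finite.
    :else nil))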

(defn injest [input-seq]
  (->> [[] input-seq]
       (iterate step-state)
       (take-while some?)
       (map first)
       (filter seq)
       (map first)))
Rulle
  • That's pretty neat. I ended up using https://github.com/erdos/erdos.yield but might change to this. – zcaudate Aug 18 '20 at 07:51
  • I think your `step-state` solution is actually made more complicated by trying to do everything with built-in lazy sequence combinators. Building stuff from raw lazy-seqs and recursion often turns out simpler, as in my answer. – amalloy Aug 18 '20 at 08:49
  • @amalloy A wise person once said: "Good design is about pulling things apart". I know how my solution can be pulled apart. Not sure about yours. :-) – Rulle Aug 18 '20 at 09:19