0

Was looking at the exercises at the bottom of chapter 9 of clojure for the brave and true (in particular the last one of searching multiple engines and returning the first hit of each)

I mocked the actual search with slurp part to be this:

(defn search-for
  [query engine]
  (Thread/sleep 2000)
  (format "https://www.%s.com/search?q%%3D%s", engine query))

And implemented the behavior like this:

(defn get-first-hit-from-each
  [query engines]
  (let [futs (map (fn [engine]
                    (future (search-for query engine))) engines)]
    (doall futs)
    (map deref futs)))

(I know the return here is a list and the exercise asks for a vector but can just do an into for that...)

but when i run this in the REPL

(time (get-first-hit-from-each "gray+cat" '("google" "bing")))

it seems to take 2 seconds after I added the doall (since map returns a lazy seq i don't think any of the futures even start unless i consume the seq, (last futs) also seems to work) but when I use the time macro in the REPL it reports almost no time consumed even if it is taking 2 seconds:

(time (get-first-hit-from-each "gray+cat" '("google" "bing")))
"Elapsed time: 0.189609 msecs"
("https://www.google.com/search?q%3Dgray+cat" "https://www.bing.com/search?q%3Dgray+cat")

what is going on with the time macro here?

Palace Chan
  • 8,845
  • 11
  • 41
  • 93

2 Answers2

5

TL;DR: Lazy seqs don't play nice with the time macro, and your function get-first-hit-from-each returns a lazy seq. To make lazy seqs work with time, wrap them in a doall, as suggested by the documentation. See below for a fuller thought process:

The following is the definition of the time macro in clojure.core (source):

(defmacro time
  "Evaluates expr and prints the time it took.  Returns the value of
 expr."
  {:added "1.0"}
  [expr]
  `(let [start# (. System (nanoTime))
         ret# ~expr]
     (prn (str "Elapsed time: " (/ (double (- (. System (nanoTime)) start#)) 1000000.0) " msecs"))
     ret#))

Notice how the macro saves the return value of expr in ret#, right after which it prints the elapsed time? Only after that does ret# get returned. The key here is that your function get-first-hit-from-each returns a lazy sequence (since map returns a lazy sequence):

(type (get-first-hit-from-each "gray+cat" '("google" "bing")))
;; => clojure.lang.LazySeq

As such, when you do (time (get-first-hit-from-each "gray+cat" '("google" "bing"))), what gets saved in ret# is a lazy sequence, which doesn't actually get evaluated until we try to use its value...

We can check whether a lazy sequence has been evaluated using the realized? function. So let's tweak the time macro by adding a line to check whether ret# has been evaluated, right after the elapsed time is printed:

(defmacro my-time
  [expr]
  `(let [start# (. System (nanoTime))
         ret# ~expr]
     (prn (str "Elapsed time: " (/ (double (- (. System (nanoTime)) start#)) 1000000.0) " msecs"))
     (prn (realized? ret#)) ;; has the lazy sequence been evaluated?
     ret#))

Now testing it:

(my-time (get-first-hit-from-each "gray+cat" '("google" "bing")))
"Elapsed time: 0.223054 msecs"
false
;; => ("https://www.google.com/search?q%3Dgray+cat" "https://www.bing.com/search?q%3Dgray+cat")

Nope... so that's why time prints inaccurately. None of the computationally-long stuff actually gets to run before the printout is made.

To fix that and get an accurate time, we need to ensure evaluation of the lazy seq, which can be done by strategically placing a doall in a bunch of possible places, either within your function, wrapping the map:

(defn get-first-hit-from-each
  [query engines]
  (let [futs (map (fn [engine]
                    (future (search-for query engine))) engines)]
    (doall futs)
    (doall (map deref futs))))
;; => #'propeller.core/get-first-hit-from-each

(time (get-first-hit-from-each "gray+cat" '("google" "bing")))
"Elapsed time: 2005.478689 msecs"
;; => ("https://www.google.com/search?q%3Dgray+cat" "https://www.bing.com/search?q%3Dgray+cat")

or within time, wrapping the function call:

(time (doall (get-first-hit-from-each "gray+cat" '("google" "bing"))))
Count Orlok
  • 997
  • 4
  • 13
-1

Something weird is happening in your setup. It works as expected for me, taking 4 seconds:

(ns tst.demo.core
  (:use tupelo.core tupelo.test))

(defn search-for
      [query engine]
      (Thread/sleep 2000)
      (format "https://www.%s.com/search?q%%3D%s", engine query))

(defn get-first-hit-from-each
      [query engines]
      (let [futs (map (fn [engine]
                        (future (search-for query engine)))
                      engines)]
        ; (doall futs)
        (mapv #(println (deref %)) futs)))

(dotest
  (time
    (get-first-hit-from-each "gray+cat" '("google" "bing"))))

with result

--------------------------------------
   Clojure 1.10.2-alpha1    Java 14
--------------------------------------

Testing tst.demo.core
https://www.google.com/search?q%3Dgray+cat
https://www.bing.com/search?q%3Dgray+cat
"Elapsed time: 4001.384795 msecs"

I did not even use the doall.


Update:

I found my mistake. I had accidentally used mapv instead of map on line 15. This forces it to wait for each call to deref. If you use map here, you get a lazy seq of a lazy seq and the function ends before the timer expires (twice => 4 sec).

--------------------------------------
   Clojure 1.10.2-alpha1    Java 14
--------------------------------------

Testing tst.demo.core
"Elapsed time: 0.182797 msecs"

I always recommend to use mapv instead of map. There is also a filterv available. When in doubt, force the output into a nice, eager vector using (vec ...) to rid yourself of the headaches due to laziness.

Maybe one time in a hundred you will need the features a lazy seq provides. The other times it is a problem since you cannot predict the order of execution of statements.

P.S.

See this list of documentation, including the fabulous Clojure CheatSheet.

P.P.S.

The OP is correct that, ideally, each query should have run in parallel in a separate thread (each future uses a separate thread). The problem is, again, due to lazy map behavior.

On the println at the end, each element in the lazy list from futs is not evaluated until required by the println. So, they don't even start until later, in sequence. This defeats the intended goal of parallel execution. Again, lazy-seq behavior is the cause.

The cure: make everything explicit & eager 99+ percent of the time (i.e. mapv):

(defn search-for
      [query engine]
      (Thread/sleep 2000)
      (format "https://www.%s.com/search?q%%3D%s", engine query))

(defn get-first-hit-from-each
      [query engines]
      (let [futs (mapv (fn [engine]
                         (future (search-for query engine)))
                       engines)]
        (mapv #(println (deref %)) futs)))

with result:

Testing tst.demo.core
https://www.google.com/search?q%3Dgray+cat
https://www.bing.com/search?q%3Dgray+cat
"Elapsed time: 2003.770331 msecs"
Alan Thompson
  • 29,276
  • 6
  • 41
  • 48
  • 2
    I suggest editing your answer to include just the correct part. If you insist, you can keep your previous wrong answer as an addendum or something, but I wouldn't. Stack Overflow is a repository of curated questions and answers, not a diary: just because you were once wrong doesn't mean it needs to be shown so prominently. – amalloy Jul 06 '20 at 02:32
  • Why vec instead of doall to mitigate the lazy part? Also with mapv instead of map shouldn’t 2 seconds still be the result if you allow all futures to start dispatching via doall before the mapv? – Palace Chan Jul 06 '20 at 03:49
  • @PalaceChan: Each call to `search-for` will pause for 2 seconds. @AMalloy I considered making the same edit (i.e. erasing/hiding my mistake), but thought it might be more informative to show how easily anyone can make mistakes about the order-of-execution when lazy evaluation comes into play. This is especially important when I/O is involved, such as DB reads or file/print output. – Alan Thompson Jul 06 '20 at 06:17