Clojure beginner here, not sure if the terminology in the question is even correct.

I'm building a web scraper using the clj-webdriver taxi api. There are several sites that it needs to scrape data from. The following isn't actually code from the project, but I've tested it and verified that it illustrates my question:

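;; Assumed requires (not shown in the original snippet)
(use 'clj-webdriver.taxi)
(require '[clojure.pprint :refer [pprint]])

(def gh-un "my-username")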
(def gh-pw "my-password")

;; print the first five "starred" alerts from my github feed
(defn get-info [url]
  (to url)
  (click "a[href='/login']")
  (input-text "input#login_field" gh-un)
  (input-text "input#password" gh-pw)
  (click "input.btn")
  (pprint (map text (take 5 (find-elements {:css "div.alert.watch_started"}))))
  (click "img.avatar")
  (click "button.dropdown-signout"))

(defn github-wrapper []
  (map get-info (repeat 3 "http://www.github.com"))
  (quit))

If I call (github-wrapper) as is, the browser window closes almost immediately because of the (quit) call. Wrapping the map call with doall, i.e. (doall (map get-info (repeat 3 "http://www.github.com"))), solves this. That suggests map produces a lazy sequence that is never consumed, so the side effects of the calls to get-info never happen.

However, if I remove the (quit) call at the end of github-wrapper, github-wrapper does what I want it to.

My question is, why does the lazy sequence get consumed in the latter case, but not in the former?

Y T
  • Probably, it's because you're running your code in the REPL, so your lazy sequence is returned from `github-wrapper` and then evaluated and printed by the REPL. – Leonid Beschastny Jan 23 '16 at 18:14
  • I was indeed running the code in the REPL. I don't have this script set up to run from the command line yet, but I will be sure to test this once I do. Thank you! – Y T Jan 24 '16 at 23:46

1 Answer

It is probably because the REPL prints the value returned by github-wrapper. Printing a lazy sequence is one way (along with doall) to realize it. When (quit) is the last form in github-wrapper, its result is what gets returned; the lazy sequence produced by map is discarded without anything ever asking for its values, so get-info is never called.
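A minimal sketch of that behaviour, with the browser calls replaced by a counter so it runs without clj-webdriver. The names `visits`, `visit!`, `wrapper-with-quit` and `wrapper-without-quit` are hypothetical stand-ins, and `:quit` only plays the role of the value returned by (quit):

```clojure
;; Hypothetical stand-ins: `visits` counts side effects,
;; `visit!` plays the role of get-info.
(def visits (atom 0))

(defn visit! [url]
  (swap! visits inc)
  url)

(defn wrapper-with-quit []
  ;; The lazy seq is created here and then thrown away unrealized.
  (map visit! (repeat 3 "http://www.github.com"))
  ;; Stand-in for (quit); this is what the function returns.
  :quit)

(defn wrapper-without-quit []
  ;; The lazy seq is the return value, so the caller may realize it.
  (map visit! (repeat 3 "http://www.github.com")))

(reset! visits 0)
(wrapper-with-quit)
(def after-with-quit @visits)   ; no call to visit! has happened

(reset! visits 0)
(def result (wrapper-without-quit))
(def before-print @visits)      ; still nothing realized
(pr-str result)                 ; roughly what the REPL does when printing
(def after-print @visits)       ; printing walked the whole seq
```

Under these assumptions, the side effects only run once something walks the returned sequence, which is what printing at the REPL does.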

You can also use mapv instead of map if you want the results to be realized immediately; mapv is eager and returns a vector.
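As a rough comparison of the eager options, here is a sketch in which `record!` is a hypothetical stand-in for get-info and `log` records each call. `run!` (available since Clojure 1.7) is included as an option for when only the side effects matter and the results are not needed:

```clojure
;; Hypothetical stand-ins: `log` records calls, `record!` replaces get-info.
(def log (atom []))

(defn record! [url]
  (swap! log conj url)
  (count url))

(def urls (repeat 3 "http://www.github.com"))

;; mapv is eager and returns a vector of results.
(reset! log [])
(def lengths (mapv record! urls))
(def calls-mapv (count @log))

;; doall forces the lazy seq produced by map and returns it.
(reset! log [])
(def forced (doall (map record! urls)))
(def calls-doall (count @log))

;; run! calls the function for its side effects only and returns nil.
(reset! log [])
(def run-result (run! record! urls))
(def calls-run (count @log))
```

In each case all three calls have happened by the time the next form runs, so a following (quit) would no longer cut the work short.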

rabidpraxis