
I am trying to scrape a website using Clojure's Enlive library. The CSS selector for the rows I want is:

body > table:nth-child(2) > tbody > tr > td:nth-child(3) > table > tbody > tr > td > table > tbody > tr:nth-child(n+3)

I have tested this selector with jQuery, and it works. But I don't know how to translate it into Enlive's selector syntax. This is what I have tried:

(ns vimindex.core
  (:gen-class)
  (:require [net.cgrand.enlive-html :as html]))

(def ^:dynamic *vim-org-url* "http://www.vim.org/scripts/script_search_results.php?order_by=creation_date&direction=descending")
(defn fetch-url [url]
  (html/html-resource (java.net.URL. url)))

(defn scrape-vimorg []
  (println "Scraping vimorg")
  (println
    (html/select (fetch-url *vim-org-url*)
                 [:body :> [:table (html/nth-child 2)] :> :tbody :> :tr :> [:td (html/nth-child 3)] :> :table :> :tbody :> :tr :> :td :> :table :> :tbody :> [:tr (html/nth-child 1 3)]])))
;                  body  >   table:nth-child(2)         >  tbody  >  tr  >   td:nth-child(3)         >  table  >  tbody  >  tr  >  td  >  table  >  tbody  >   tr:nth-child(n + 3)
; Above selector works with jquery

(defn -main
  [& args]
  (scrape-vimorg))

But I get an empty result. How do I translate the CSS selector above into Enlive's syntax?

Thanks a lot.

Edited: To include the full code.

Rohit

2 Answers

0

The syntax you are missing is an additional set of brackets around elements that use pseudo-selectors. So you want something like this:

 [:body :> [:table (html/nth-child 2)] :> :tbody :> :tr :>
  [:td (html/nth-child 3)] :> :table :> :tbody :> :tr :> :td :>
  :table :> :tbody :> [:tr (html/nth-child 1 3)]]
jmargolisvt
  • Thanks a lot for the quick response. I made the changes you suggested, but I still get an empty result. I have edited my question to include the full code. – Rohit Jan 08 '16 at 23:55
  • There are lots of ways to get lost here. I would suggest working with a simpler selection first. A triple-nested table is kind of a rough way to learn Enlive. Extending the selector one at a time is a good way to find where it breaks. Also, the code you provided isn't very useful. How about posting the Enlive nodes? – jmargolisvt Jan 09 '16 at 00:01
0

Browsers (at least my version of Firefox) add a tbody element to their DOM representation of a table even when the tag is not in the actual page source. That is why the selector with tbody matches in jQuery.

Enlive does not add it, so your selector should work once you omit the tbody steps.
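
As a rough sketch of that change, assuming the page source really contains no tbody tags and that Enlive's parser leaves the rest of the nesting as the browser shows it, the question's code with the tbody steps removed could look like this. The names vim-org-url and rows-selector are only illustrative, and I have not run this against the live page:

```clojure
(ns vimindex.core
  (:require [net.cgrand.enlive-html :as html]))

(def vim-org-url
  "http://www.vim.org/scripts/script_search_results.php?order_by=creation_date&direction=descending")

(defn fetch-url [url]
  (html/html-resource (java.net.URL. url)))

;; Same path as the jQuery selector from the question, minus every tbody step.
;; (html/nth-child 1 3) is Enlive's spelling of :nth-child(n+3).
(def rows-selector
  [:body :> [:table (html/nth-child 2)] :> :tr :> [:td (html/nth-child 3)]
   :> :table :> :tr :> :td :> :table :> [:tr (html/nth-child 1 3)]])

(defn scrape-vimorg []
  (let [page (fetch-url vim-org-url)]
    ;; Sanity check: if this prints 0, the parsed tree has no tbody nodes,
    ;; so any selector that contains a :tbody step cannot match.
    (println "tbody nodes in parsed tree:" (count (html/select page [:tbody])))
    (html/select page rows-selector)))
```

If the tbody count is 0 and the result is still empty, the parsed tree probably differs from the browser's DOM in some other place as well; shortening the selector one step at a time should show where it stops matching.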

Anton Harald