The enlive tutorial at https://github.com/swannodette/enlive-tutorial/blob/master/src/tutorial/scrape1.clj

shows how to parse a page fetched from a URL, but I need to go through a SOCKS5 proxy. I can't figure out how to use a proxy inside enlive. I do know how to use one with the HTTP client (clj-http), but then how do I parse the result it returns? I have the following code, but the last line returns an empty result:

    ;; excerpt from the ns declaration
    (:require [clojure.set :as set]
              [clj-http.client :as client]
              [clj-http.conn-mgr :as conn-mgr]
              [clj-time.core :as time]
              [jsoup.soup :as soup]
              [clj-time.coerce :as tc]
              [net.cgrand.enlive-html :as html])

    (def a (client/get "https://news.ycombinator.com/"
                       {:connection-manager (conn-mgr/make-socks-proxied-conn-manager "127.0.0.1" 9150)
                        :socket-timeout 10000 :conn-timeout 10000
                        :client-params {"http.useragent" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.20 (KHTML, like Gecko) Chrome/11.0.672.2 Safari/534.20"}}))
    (def b (html/html-resource a))
    (html/select b [:td.title :a])

Daniel Wu

1 Answer

In enlive, the html-resource fn fetches a resource such as a URL itself and then converts it into a data structure it can parse. It seems that when you pass it an already fulfilled request, that is, the response map returned by client/get, it just hands the map back instead of throwing an error, which would explain why your select comes back empty.

Either way, the function you want is html-snippet, and you should pass it the body of your request, like so:

    ;; It does not matter whether you use a connection manager,
    ;; as long as the call returns a result with a body
    (def req (client/get "https://news.ycombinator.com/"))

    (def body (:body req))
    (def nodes (html/html-snippet body))
    (html/select nodes [:td.title :a])

    ;; Or you can put it all together like this

    (-> req
        :body
        html/html-snippet
        (html/select [:td.title :a]))
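
For completeness, here is a rough sketch of how this might look with the SOCKS connection manager from the question. The namespace `example.scrape`, the helpers `title-links` and `fetch-via-socks`, and the `sample` string are names made up for illustration. The proxy host, port, and timeouts are copied from the question and assume a SOCKS proxy is listening there. Keeping the parsing in its own function means the selector can be tried on a literal HTML string without any network access.

```clojure
(ns example.scrape ; hypothetical namespace, for illustration only
  (:require [clj-http.client :as client]
            [clj-http.conn-mgr :as conn-mgr]
            [net.cgrand.enlive-html :as html]))

(defn title-links
  "Parses an HTML string and returns the <a> nodes found inside td.title cells."
  [html-string]
  (html/select (html/html-snippet html-string) [:td.title :a]))

(defn fetch-via-socks
  "Fetches url through a SOCKS proxy and returns the response body as a string.
  Assumes a proxy at 127.0.0.1:9150, as in the question."
  [url]
  (:body (client/get url
                     {:connection-manager
                      (conn-mgr/make-socks-proxied-conn-manager "127.0.0.1" 9150)
                      :socket-timeout 10000
                      :conn-timeout 10000})))

;; Offline check of the selector on a hand-written fragment (made-up data)
(def sample
  "<table><tr><td class=\"title\"><a href=\"http://example.com\">Example</a></td></tr></table>")

(map html/text (title-links sample))

;; With the proxy running, the two helpers combine like this:
;; (title-links (fetch-via-socks "https://news.ycombinator.com/"))
```

If the offline check finds the link but the live call still returns nothing, the page markup has probably changed and the selector needs adjusting, rather than anything being wrong with the proxy setup.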
rabidpraxis