What is the appropriate way to save Enlive's html-resource output as JSON and reload it?

The following procedure does not preserve the data structure (note that I ask json/read-str to map keys to keywords):

(require '[net.cgrand.enlive-html :as html])
(require '[clojure.data.json :as json])


(def craig-home
  (html/html-resource (java.net.URL. "http://www.craigslist.org/about/sites")))
(spit "./data/test_json_flow.json" (json/write-str  craig-home))

(def craig-reloaded
  (json/read-str (slurp "./data/test_json_flow.json") :key-fn keyword))

(defn count-nodes [page] (count (html/select page [:div.box :h4])))
(println (count-nodes craig-home)) ;; => 140
(println (count-nodes craig-reloaded)) ;; => 0

Thanks.

UPDATE

To address Mark Fisher's comment, here is a different example that round-trips the result of html/select instead of html/html-resource:

(def craig-home
  (html/html-resource (java.net.URL. "http://www.craigslist.org/about/sites")))
(def craig-boxes (html/select craig-home [:div.box]))
(count (html/select craig-boxes [:h4])) ;; => 140

(spit "./data/test_json_flow.json" (json/write-str  craig-boxes))
(def craig-boxes-reloaded
  (json/read-str (slurp "./data/test_json_flow.json") :key-fn keyword))
(count (html/select craig-boxes-reloaded [:h4])) ;; => 0
user3639782
  • This doesn't seem right. Your craig-reloaded and craig-home are different types; the reloaded version isn't returning HTML that the count-nodes function can use, is it? – Mark Fisher Jan 23 '15 at 12:43
  • @mark You are right, I messed up my examples. My question was actually about JSON-encoding `html/select` results and parsing them further. I have updated the example. – user3639782 Jan 23 '15 at 13:13

1 Answer

A simpler approach would be to write/read using Clojure edn:

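(require '[net.cgrand.enlive-html :as html])
(require '[clojure.data.json :as json])
(require 'clojure.edn)  ; for clojure.edn/read-string
(require 'clojure.walk) ; for clojure.walk/postwalk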

(def craig-home (html/html-resource (java.net.URL. "http://www.craigslist.org/about/sites")))

(spit "./data/test_json_flow.json" (pr-str craig-home))

(def craig-reloaded
  (clojure.edn/read-string (slurp "./data/test_json_flow.json")))

(defn count-nodes [page] (count (html/select page [:div.box :h4])))
(println (count-nodes craig-home)) ;=>140
(println (count-nodes craig-reloaded)) ;=>140

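Enlive expects each node's :tag value to be a keyword and will not match a node whose tag is a string. The JSON round trip turns keyword values into strings: json/write-str writes :h4 as "h4", and the :key-fn option of json/read-str applies only to map keys, so the value stays a string.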

(json/write-str '({:tag :h4, :attrs nil, :content ("Illinois")}))
;=> "[{\"tag\":\"h4\",\"attrs\":null,\"content\":[\"Illinois\"]}]"

(json/read-str (json/write-str '({:tag :h4, :attrs nil, :content ("Illinois")})) :key-fn keyword)
;=> [{:tag "h4", :attrs nil, :content ["Illinois"]}]

(pr-str '({:tag :h4 :attrs nil :content ("Illinois")}))
;=> "({:tag :h4, :attrs nil, :content (\"Illinois\")})"

(clojure.edn/read-string (pr-str '({:tag :h4, :attrs nil, :content ("Illinois")})))
;=> ({:tag :h4, :attrs nil, :content ("Illinois")})

If you must use JSON, you can convert the :tag values back to keywords after reading:

(clojure.walk/postwalk #(if-let [v (and (map? %) (:tag %))]
                          (assoc % :tag (keyword v)) %)
                       craig-reloaded)
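
The walk above can also be wrapped in a small helper so the conversion happens once at load time. This is only a minimal sketch: `restore-tags` is a hypothetical name (it is not part of Enlive or data.json), and the literal `reloaded` is a hand-written stand-in for what json/read-str with :key-fn keyword returns, so the snippet runs without any network access or extra libraries.

```clojure
(require '[clojure.walk :as walk])

(defn restore-tags
  "Returns nodes with every string :tag value converted back to a keyword.
  Hypothetical helper name, used here for illustration only."
  [nodes]
  (walk/postwalk
    (fn [x]
      ;; Only touch maps whose :tag is a string; leave everything else alone.
      (if (and (map? x) (string? (:tag x)))
        (update x :tag keyword)
        x))
    nodes))

;; Stand-in for (json/read-str ... :key-fn keyword): keys are keywords,
;; but the :tag values are still strings.
(def reloaded
  [{:tag "div"
    :attrs {:class "box"}
    :content [{:tag "h4" :attrs nil :content ["Illinois"]}]}])

(def restored (restore-tags reloaded))
```

The restored structure has keyword tags again, so it can be handed to html/select in place of craig-reloaded. This assumes that :tag values are the only keywords the JSON round trip turned into strings in your data.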
Symfrog
  • Thanks for the edn suggestion. It solves my problem in the Clojure world. Now suppose I want to export the results and parse them further in Python. Is there a way to produce and export them in a 'universal' format? – user3639782 Jan 23 '15 at 13:19
  • I have updated my answer to include a way to read json while converting tag values to keywords before flowing through Enlive. – Symfrog Jan 23 '15 at 13:21