I'm finding the usage of xml-> extremely confusing. I've read the docs and the examples but can't figure out how to get the nested nodes of an xml doc.

Assume the following xml is in a zipper (as from xml-zip):

<html>
 <body>
  <div class='one'>
    <div class='two'></div>
  </div>
 </body>
</html>

I am trying to return the div with class='two'.

I was expecting this to work:

(xml-> z :html :body :div :div)

Or this:

(xml-> z :html :body :div (attr= :class "two"))

Kind of like css selectors.

But it returns only the first level, and it doesn't search down through the tree.

The only way I can make it work is:

(xml-> z :html :body :div children leftmost?)

Is that what I'm supposed to do?

The whole reason I started using xml-> was for convenience, to avoid navigating the zipper up and down and left and right. If xml-> cannot get nested nodes, then I don't see its value over clojure.zip.

Thanks.

Scott Klarenbach

2 Answers

Two consecutive :div steps match the same node. You need to step down a level first. I also believe you forgot to extract the node with zip/node.

(ns reagenttest.sample
  (:require [clojure.xml :as xml]   ; needed for xml/parse below
            [clojure.zip :as zip]
            [clojure.data.zip.xml :as data-zip]))

(let [s   "..."
      doc (xml/parse (java.io.ByteArrayInputStream. (.getBytes s)))]
  ;; zip/down steps into the children of the outer div before attr= is applied
  (prn (data-zip/xml-> (zip/xml-zip doc)
                       :html :body :div zip/down
                       (data-zip/attr= :class "two")
                       zip/node)))

Or you could write a custom abstraction if you are not happy with xml->:

(defn xml->find [loc & path]
  (let [new-path (conj (vec (butlast (interleave path (repeat zip/down)))) zip/node)]
    (apply data-zip/xml-> loc new-path)))

Now you can do this:

(xml->find z :html :body :div :div)
(xml->find z :html :body :div (data-zip/attr= :class "two"))
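
To see what each zip/down step does on its own, here is a minimal sketch that walks the sample document one level at a time using only clojure.xml and clojure.zip. The names sample-xml, z and inner are made up for illustration, and the markup is written without whitespace to keep the tree small.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip])

;; Illustrative copy of the document from the question, without whitespace.
(def sample-xml
  "<html><body><div class='one'><div class='two'></div></div></body></html>")

;; Zipper whose root location is the <html> element itself.
(def z
  (zip/xml-zip
    (xml/parse (java.io.ByteArrayInputStream. (.getBytes sample-xml "UTF-8")))))

(-> z zip/node :tag)                      ; => :html
(-> z zip/down zip/node :tag)             ; => :body
(-> z zip/down zip/down zip/node :attrs)  ; => {:class "one"}

;; One more zip/down reaches the nested div.
(def inner (-> z zip/down zip/down zip/down zip/node))
(:attrs inner)                            ; => {:class "two"}
```

Each zip/down moves to the leftmost child of the current location, so this only reaches the inner div because it is the first child at every level.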
akond

    The problem I have with zip/down is the same problem I'd have with using children :div. It's a leaky abstraction. I want to be able to express the semantics of the tags, i.e., (xml-> z :html :body :div[2] text) or something similar. Navigating the zipper SOMETIMES and using the tags at other times is confusing and doesn't read well. In the same way, I wouldn't want to use (xml-> :html children leftmost?) in place of (xml-> :html :body) – Scott Klarenbach Jun 20 '17 at 21:18

    I previously had code using xml-seq that just filtered and took the :content of tags, and I thought xml-> would be much simpler. But it's actually more complicated, and with this inconsistency I think I'll just go back to xml-seq, since that's easier to grok. – Scott Klarenbach Jun 20 '17 at 21:30
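
For comparison, the xml-seq approach mentioned in the comment above could look roughly like the sketch below. It uses only clojure.xml and clojure.core; the helper names parse-str and elements-with-attr are hypothetical and not part of any library, and the original poster's actual code is not shown in the thread.

```clojure
(require '[clojure.xml :as xml])

;; Illustrative copy of the document from the question, without whitespace.
(def sample-xml
  "<html><body><div class='one'><div class='two'></div></div></body></html>")

(defn parse-str
  "Hypothetical helper: parse an XML string into clojure.xml element maps."
  [s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn elements-with-attr
  "Hypothetical helper: every element anywhere under root (root included)
  with the given tag and attribute value. xml-seq walks the tree depth first."
  [root tag attr-name attr-val]
  (filter #(and (= tag (:tag %))
                (= attr-val (get-in % [:attrs attr-name])))
          (xml-seq root)))

(def matches
  (elements-with-attr (parse-str sample-xml) :div :class "two"))

(count matches)           ; => 1
(:attrs (first matches))  ; => {:class "two"}
```

This searches the whole tree regardless of depth, which is closer to the CSS-selector behaviour the question asks for, but it returns plain element maps rather than zipper locations.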

You can solve this problem using tupelo.forest from the Tupelo library. The forest contains functions for searching and manipulating trees of data. It is like Enlive on steroids. Here is a solution for your data:

(dotest
  (with-forest (new-forest)
    (let [xml-str         "<html>
                             <body>
                               <div class='one'>
                                 <div class='two'></div>
                               </div>
                             </body>
                           </html>"

          enlive-tree     (->> xml-str
                            java.io.StringReader.
                            en-html/xml-resource
                            only)
          root-hid        (add-tree-enlive enlive-tree)

          ; Removing whitespace nodes is optional; just done to keep things neat
          blank-leaf-hid? (fn [hid] (ts/whitespace? (hid->value hid))) ; whitespace pred fn
          blank-leaf-hids (keep-if blank-leaf-hid? (all-leaf-hids)) ; find whitespace nodes
          >>              (apply remove-hid blank-leaf-hids) ; delete whitespace nodes found

          ; Can search for inner `div` 2 ways
          result-1        (find-paths root-hid [:html :body :div :div]) ; explicit path from root
          result-2        (find-paths root-hid [:** {:class "two"}]) ; wildcard path that ends in :class "two"
    ]
       (is= result-1 result-2) ; both searches return the same path
       (is= (hid->bush root-hid)
         [{:tag :html}
          [{:tag :body}
           [{:class "one", :tag :div}
            [{:class "two", :tag :div}]]]])
      (is=
        (format-paths result-1)
        (format-paths result-2)
        [[{:tag :html}
          [{:tag :body}
           [{:class "one", :tag :div}
            [{:class "two", :tag :div}]]]]])

       (is (val= (hid->elem (last (only result-1)))
             {:attrs {:class "two", :tag :div}, :kids []})))))

There are many examples in the unit tests and in the forest-examples demo file.

Alan Thompson