Clojure (Enlive) How to use html/but (negation)

Question

Hi all, I'm trying to parse and extract HTML data with Clojure and Enlive (are there better choices?).

I am trying to get all the ul > li tags that are NOT inside the <nav> tag. I think I should use the html/but function from Enlive, but I can't seem to make it work.

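;; test-enlive.clj
(require '[net.cgrand.enlive-html :as html])

;; Runs each selector in tag-list against dom and returns
;; a vector of result vectors, one per selector.
(defn get-tags [dom tag-list]
  (mapv #(vec (html/select dom %)) tag-list))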

;;Gives NO tags
(get-tags test-dom [[[(html/but :nav) :ul :> :li]]])

;;Gives ALL the LI-tags
(get-tags test-dom [[:ul :> :li]])

<!-- test.html -->
<html>
<head><title>Test page</title>  </head>
<body>
    <div>
        <nav>
            <ul>
                <li>
                    skip these navs-li
                </li>
                
            </ul>
        </nav>
        <h1>Hello World<h1>                 
        <ul><li>get only these li's</li>                
        </ul>           
    </div>  
</body></html>

score 1 · Accepted Answer · answered Aug 05 '22 at 07:15

If you had valid XHTML, you could use XPath from the sigel library:

(require '[sigel.xpath.core :as xpath])
(let [data "<html><head><title>Test page</title></head>
                <body><div><nav><ul><li>skip these navs-li</li></ul></nav>
                <h1>Hello World</h1>
                <ul><li>get only these li's</li></ul>
                </div></body></html>"]
        (xpath/select data "//li[not(ancestor::nav)]"))
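
For anyone who wants to stay with Enlive: as far as I can tell, a selector step built with html/but negates the element being matched, not that element's ancestors, so it has no direct counterpart to not(ancestor::nav). A minimal sketch of a workaround, assuming that a nil transformation in html/at deletes the matched nodes, is to drop every nav subtree first and then run the plain selector. The names sample-html, sample-dom, li-outside-nav and kept-texts are made up for this illustration, and the markup is a compacted, well-formed version of the test page.

```clojure
(require '[net.cgrand.enlive-html :as html])

;; Hypothetical sample input: a compact, well-formed version of test.html.
(def sample-html
  (str "<html><head><title>Test page</title></head><body><div>"
       "<nav><ul><li>skip these navs-li</li></ul></nav>"
       "<h1>Hello World</h1>"
       "<ul><li>get only these li's</li></ul>"
       "</div></body></html>"))

;; Assumption: html-resource accepts a java.io.Reader, so the string is parsed directly.
(def sample-dom
  (html/html-resource (java.io.StringReader. sample-html)))

(defn li-outside-nav
  "Removes every nav subtree from dom, then selects the remaining ul > li nodes."
  [dom]
  (-> dom
      (html/at [:nav] nil)           ; assumption: a nil transformation deletes matches
      (html/select [:ul :> :li])))

;; Text content of each li that survived the nav removal.
(def kept-texts (mapv html/text (li-outside-nav sample-dom)))
```

With this input, kept-texts should hold only the text of the list item outside nav. The trade-off is that the DOM is transformed before selecting, so the returned nodes come from the pruned copy rather than the original tree.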

akond

This seems to be the shortest and clearest solution. Thank you. – user914584 Aug 07 '22 at 08:12

Alan Thompson · Answer 2 · 2022-08-04T17:45:55.520

You could do this with the Tupelo Forest library. Watch the video and see the examples in the unit tests.

Here is one way to solve your problem:

(ns tst.tupelo.forest-examples
  (:use tupelo.core tupelo.forest tupelo.test)
  (:require ... ))

<snip>

(verify
  (let [html-data "<html>
                      <head><title>Test page</title>  </head>
                      <body>
                          <div>
                              <nav>
                                  <ul>
                                      <li>
                                          skip these navs-li
                                      </li>

                                  </ul>
                              </nav>
                              <h1>Hello World<h1>
                              <ul><li>get only these li's</li>
                              </ul>
                          </div>
                      </body>
                  </html> "]

The interesting part comes next:

    (hid-count-reset)
    (with-forest (new-forest)
      (let [root-hid   (add-tree-html html-data)
            out-hiccup (hid->hiccup root-hid)
            result-1   (find-paths root-hid [:html :body :div :ul :li])
            li-hid     (last (only result-1))
            li-hiccup  (hid->hiccup li-hid)]
        (is= out-hiccup [:html
                         [:head [:title "Test page"]]
                         [:body
                          [:div
                           [:nav
                            [:ul
                             [:li
                              "\n                                          skip these navs-li\n                                      "]]]
                           [:h1 "Hello World"]
                           [:ul [:li "get only these li's"]]]]])
        (is= result-1 [[1011 1010 1009 1008 1007]])
        (is= li-hid 1007)
        (is= li-hiccup [:li "get only these li's"])))))

The above code can be seen live in the examples.

score 0 · Answer 3 · answered Aug 04 '22 at 17:41

I was able to select the target li with Hickory, if you don't mind changing your library:

Dependency: [hickory "0.7.1"]

Require: [hickory.core :as h] [hickory.select :as s]

(s/select (s/and
            (s/descendant (s/tag :ul)
                          (s/tag :li))
            (s/not (s/descendant (s/tag :nav)
                                 (s/tag :li))))
          (h/as-hickory (h/parse (slurp "resources/site.html"))))

=> [{:type :element, :attrs nil, :tag :li, :content ["get only these li's"]}]
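
The result above shows that an element is just a map with :tag and :content (Enlive nodes carry the same two keys), so the same filter can also be sketched as a plain recursive walk that needs no selector library. This is only an illustration under that assumption: walk-li-outside-nav and sample-tree are hypothetical names, and the tree is hand-written rather than parsed from the page.

```clojure
;; Hand-written stand-in for a parsed page; text nodes are plain strings.
(def sample-tree
  {:tag :div
   :content [{:tag :nav
              :content [{:tag :ul
                         :content [{:tag :li :content ["skip these navs-li"]}]}]}
             {:tag :h1 :content ["Hello World"]}
             {:tag :ul
              :content [{:tag :li :content ["get only these li's"]}]}]})

(defn walk-li-outside-nav
  "Returns the :li maps that are direct children of a :ul and have no :nav ancestor."
  ([node] (walk-li-outside-nav nil node))
  ([parent-tag node]
   (cond
     (not (map? node))    []   ; text node: nothing to collect
     (= :nav (:tag node)) []   ; prune the whole nav subtree
     :else
     (let [found (mapcat #(walk-li-outside-nav (:tag node) %) (:content node))]
       (if (and (= :li (:tag node)) (= :ul parent-tag))
         (cons node found)     ; keep this li plus any matches nested inside it
         found)))))

(walk-li-outside-nav sample-tree)
;; => ({:tag :li, :content ["get only these li's"]})
```

Pruning at :nav is what plays the role of the ancestor test: once the walk refuses to descend into a nav element, nothing below it can be returned.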
