I need to detect strictly adjacent elements with jsoup. I would like to follow the approach described in "How to detect strictly adjacent siblings", but I need a working example for jsoup (Java).

Input

<div id="container">
    <span class="highlighted">Paragraph 1</span>
    <span class="highlighted">Paragraph 2</span>
    This is just loose text.
    <p class="highlighted">Paragraph 3</p>
</div>

What I'm trying to accomplish is to build a single element that contains the text of all adjacent sibling elements of the same kind.

private String removeSimilarTags(String htmlContent) {
    org.jsoup.nodes.Document doc = Jsoup.parse(htmlContent);

    // Select all spans with class "highlighted"
    Elements highlightedSpanElements = doc.select("span.highlighted");
    for (Element span : highlightedSpanElements) {
        // Note: beforeEl is null when span is the first child element
        Element beforeEl = span.previousElementSibling();
        // I need another check here to verify whether the element has already been removed
        if (span != null) {
            beforeEl.after("<span class='" + HIGHLIGHT + "'>" + mergeAdjacentSpans(span) + "</span>");
        }
    }
    return doc.outerHtml();
}

private String mergeAdjacentSpans(Element span) {
    Element nextEl = span.nextElementSibling();

    String text = span.text();
    if (nextEl != null && nextEl.tagName().equalsIgnoreCase(SPAN_TAG)
                       && nextEl.classNames().contains(HIGHLIGHT)) {
        // The next element is also a highlighted span, so merge it recursively
        text = text.concat(" " + mergeAdjacentSpans(nextEl));
    }
    span.remove();
    return text;
}

I would also like some insight into how to verify that an element has already been removed. I cannot find a clear answer online.

Thank you guys!

1 Answer

To detect whether elements are strictly adjacent, you need to know the difference between Node and Element in jsoup; see https://stackoverflow.com/questions/47881838/difference-between-jsoup-element-and-jsoup-node#:~:text=A%20node%20is%20the%20generic,Node for details. In my case I used Node, because nextSibling() returns whatever comes next, whether it is plain text or an actual element, while nextElementSibling() only looks at tagged elements.

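private boolean isNexSiblingAdjacent(Element span) {
    Node informationAfterNode = span.nextSibling();
    Element nextTaggedElement = span.nextElementSibling();
    // Without a following node or a following element there is nothing to be adjacent to
    if (informationAfterNode == null || nextTaggedElement == null) {
        return false;
    }
    String html = informationAfterNode.outerHtml();
    // Adjacent if only whitespace follows, or if the next node is the next element itself
    return html.trim().isEmpty() || html.equalsIgnoreCase(nextTaggedElement.outerHtml());
}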

The first condition verifies that the next node contains only whitespace. You can additionally check whether it starts with <!-- and ends with -->, that is, whether it is a comment, because in both cases the elements still count as adjacent. Last but not least, the second condition checks whether the HTML of the next node is identical (ignoring case) to the HTML of the next element, which means that nothing at all sits between the two.
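
Putting this together, here is a minimal sketch of how the whole merge could look. It is one possible arrangement under a few assumptions, not the only way to do it: the class name AdjacentSpanMerger and the helper nextAdjacentHighlightedSpan are made up for illustration, the class name "highlighted" is taken from the sample input, and merged spans are assumed to need only their plain text (any inner markup is flattened by text()). It also touches the second part of the question: once remove() has been called on an element, its parent() returns null, so that can serve as the "already removed" check.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

class AdjacentSpanMerger {

    // Assumption: the CSS class used in the question's sample input.
    private static final String HIGHLIGHT = "highlighted";

    /** Returns the next strictly adjacent span.highlighted after el, or null if there is none. */
    static Element nextAdjacentHighlightedSpan(Element el) {
        Node node = el.nextSibling();
        // Whitespace-only text nodes and comments do not break adjacency, so skip them.
        while (node != null && (node instanceof Comment
                || (node instanceof TextNode && ((TextNode) node).isBlank()))) {
            node = node.nextSibling();
        }
        if (node instanceof Element) {
            Element next = (Element) node;
            if (next.tagName().equals("span") && next.hasClass(HIGHLIGHT)) {
                return next;
            }
        }
        return null; // loose text, a different element, or no sibling at all
    }

    /** Merges every run of strictly adjacent highlighted spans into the first span of the run. */
    static String mergeAdjacentSpans(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        // select() returns a snapshot list, so removing nodes while iterating is safe.
        for (Element span : doc.select("span." + HIGHLIGHT)) {
            // A removed element has no parent: it was already merged into an earlier span.
            if (span.parent() == null) {
                continue;
            }
            Element next;
            while ((next = nextAdjacentHighlightedSpan(span)) != null) {
                span.appendText(" " + next.text());
                next.remove();
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<div id=\"container\">"
                + "<span class=\"highlighted\">Paragraph 1</span> "
                + "<span class=\"highlighted\">Paragraph 2</span>"
                + " This is just loose text. "
                + "<p class=\"highlighted\">Paragraph 3</p></div>";
        System.out.println(mergeAdjacentSpans(html));
    }
}
```

With the sample input, the two spans should collapse into a single span, while the loose text and the p element stay where they are. Checking node types with instanceof instead of comparing outerHtml() strings avoids false matches when a text node happens to look like markup.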