
I connect to a URL through jsoup and fetch all of its contents, but if I select something like

doc.select("body")

it returns a single element. I want to get all the elements in the page and iterate over them one by one. For example, given this page:

<html>
<head><title>Test</title></head>
<body>
<p>Hello All</p>
<a href="test.html">Second Page</a>
<div>Test</div>
</body>
</html>

If I select body, I get the result on a single line:

Test Hello All Second Page Test

Instead, I want to select all elements, iterate over them one by one, and produce results like this:

Test
Hello All
Second Page
Test

Is that possible using jsoup?

Thanks,
Karthik


3 Answers


You can select all elements of the document using the * selector and then get the text of each one individually using Element#ownText().

Elements elements = document.body().select("*");

for (Element element : elements) {
    System.out.println(element.ownText());
}
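
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It assumes jsoup is on the classpath and inlines the page from the question instead of connecting to a URL. The class name OwnTextDemo and the helper ownTexts are made up for illustration. It selects from the document rather than from body() so that the title is included, and it skips elements whose own text is empty (html, head, body).

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class OwnTextDemo {
    // The page from the question, inlined so no network access is needed.
    static final String HTML =
            "<html>"
            + "<head><title>Test</title></head>"
            + "<body>"
            + "<p>Hello All</p>"
            + "<a href=\"test.html\">Second Page</a>"
            + "<div>Test</div>"
            + "</body>"
            + "</html>";

    // Hypothetical helper: collects the non-empty own text of every
    // element in the document, in document order.
    static List<String> ownTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element element : doc.select("*")) {
            // ownText() covers only this element's direct text, not its children's.
            String text = element.ownText().trim();
            if (!text.isEmpty()) {
                texts.add(text);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        for (String text : ownTexts(HTML)) {
            System.out.println(text);
        }
        // Prints:
        // Test
        // Hello All
        // Second Page
        // Test
    }
}
```

Using element.text() instead of ownText() here would repeat the children's text for every ancestor, which is why the single-line output shows up when only body is selected.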
BalusC
  • No, this also produces the same output. Any idea? – Karthik Aug 16 '11 at 09:26
  • Then they are not direct children of the body, as you demonstrated in your question. I'll update the answer. – BalusC Aug 16 '11 at 10:46
  • You should use document.getAllElements() instead of that selector. See https://jsoup.org/apidocs/org/jsoup/nodes/Element.html#getAllElements-- – Snackaholic Apr 25 '19 at 13:02
  • I'm getting the error "Can only iterate over an array or an instance of java.lang.Iterable" – ReZ Oct 16 '20 at 21:13

To get all of the elements within the body of the document using the jsoup library:

doc.body().children().select("*");

To get just the first level of elements in the document's body:

doc.body().children();
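
The difference only shows up when elements are nested, which the page in the question is not. Here is a small sketch, assuming jsoup is on the classpath; the nested HTML, the class name ChildrenDemo, and the helper methods are invented for illustration and do not come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class ChildrenDemo {
    // Hypothetical nested markup: the <p> is a grandchild of <body>.
    static final String HTML =
            "<body><div><p>Inner</p></div><span>Outer</span></body>";

    // Turns a list of elements into their tag names, keeping the order.
    static List<String> tagNames(Elements elements) {
        List<String> names = new ArrayList<>();
        for (Element element : elements) {
            names.add(element.tagName());
        }
        return names;
    }

    // Direct children of <body> only.
    static List<String> firstLevel(String html) {
        return tagNames(Jsoup.parse(html).body().children());
    }

    // Every element inside <body>, at any depth, as a flat list.
    static List<String> allLevels(String html) {
        return tagNames(Jsoup.parse(html).body().children().select("*"));
    }

    public static void main(String[] args) {
        System.out.println(firstLevel(HTML)); // prints [div, span]
        System.out.println(allLevels(HTML));  // prints [div, p, span]
    }
}
```

Neither call includes the body element itself, whereas doc.body().select("*") would return body as its first match.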

  • That is an important distinction that is not immediately obvious from the other answers. Thanks. To get the first level elements of the document body and *their* children (second example), as opposed to a flat list of all the elements within the body tag (first example). – Murrah Feb 18 '17 at 05:19

You can use XPath, or any library that supports XPath.

The expression is //text()

Test the expression with your XML here
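
As a rough sketch of what that could look like without any third-party library, the JDK's built-in javax.xml.xpath API can evaluate //text(). This only works because the sample page in the question happens to be well-formed XML; real-world HTML often is not, and the JDK parser would then reject it. The class name XPathTextDemo and the helper textNodes are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathTextDemo {
    // The page from the question; it is valid XML as written.
    static final String PAGE =
            "<html>\n"
            + "<head><title>Test</title></head>\n"
            + "<body>\n"
            + "<p>Hello All</p>\n"
            + "<a href=\"test.html\">Second Page</a>\n"
            + "<div>Test</div>\n"
            + "</body>\n"
            + "</html>";

    // Hypothetical helper: returns every non-blank text node, in document order.
    static List<String> textNodes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("//text()", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // //text() also matches the whitespace between tags, so skip blanks.
                String text = nodes.item(i).getNodeValue().trim();
                if (!text.isEmpty()) {
                    texts.add(text);
                }
            }
            return texts;
        } catch (Exception e) {
            // Parsing fails for markup that is not well-formed XML.
            throw new IllegalArgumentException("Could not parse or query the input", e);
        }
    }

    public static void main(String[] args) {
        for (String text : textNodes(PAGE)) {
            System.out.println(text);
        }
        // Prints:
        // Test
        // Hello All
        // Second Page
        // Test
    }
}
```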

zawhtut