I have this HTML:

   <table width="100%" cellpadding="0" cellspacing="0" border="0">
    <tr>
    <td width="27%" align="left" valign="top">
    <span class="param">Text0</span> 23<br />
    <span class="param">Text1</span> 173<br />
    <span class="param">Text2</span> 54<br />
    <span class="param">Text3</span> 2<br /><br />
    </td>
    <td width="27%" align="left" valign="top">
    <span class="param">Text4</span><br />
    one <br />
    two <br />
    three <br />
    </td>
    <td width="46%" align="left" valign="top">
    <span class="param">Text5</span><br /> 
    one -<br />
    two -<br />
    three -<br />
    </td>
    </tr>
    </table>

I can get the values for Text0 to Text3 by changing the index from get(0) to get(3), but I can't get the values for Text4 and Text5:

Document doc = Jsoup.connect("text.html").get();

Element param = doc.select("span[class=param]").get(0);

Node node = param.nextSibling();

System.out.println(node.toString());

How do I get the values for Text4 and Text5? get(4) and get(5) currently return a br element, but I need "one, two, three".

Now I use this code:

Document doc = Jsoup.connect("text.html").get();

        Elements params = doc.select("span[class=param]");
        int i;
        for (i=0; i<6; i++) {
        Element param = params.get(i);

        Node node = param.nextSibling();

        System.out.println(node.toString());

        }

This prints:

 23
 173
 54
 2
<br>
<br>

I need:

 23
 173
 54
 2
 one two three
 one two three

A crazy workaround:

Document doc = Jsoup.connect("text.html").get();

        Elements params = doc.select("span[class=param]");
        int i;
        for (i=0; i<4; i++) { // indices 0..3 cover Text0 to Text3
        Element param = params.get(i);

        Node node = param.nextSibling();

        System.out.println(node.toString());
        }

        for (i=4; i<5; i++){

            Element apar = params.get(i);

            Node apan = apar.nextSibling();

            System.out.println("apar: "+apan.nextSibling().toString());
            System.out.println("apar: "+apan.nextSibling().nextSibling().nextSibling().toString());
            System.out.println("apar: "+apan.nextSibling().nextSibling().nextSibling().nextSibling().nextSibling().toString());
            //System.out.println(apan.nextSibling().toString());


        }
        for (i=5; i<6; i++){

            Element vih = params.get(i);

            Node vihn = vih.nextSibling();

            System.out.println("vih: "+vihn.nextSibling().toString());
            System.out.println("vih: "+vihn.nextSibling().nextSibling().nextSibling().toString());
            System.out.println("vih: "+vihn.nextSibling().nextSibling().nextSibling().nextSibling().nextSibling().toString());
            //System.out.println(apan.nextSibling().toString());


        }

    }

This crazy(?) code prints what I want.

yar1k
  • What does your `node.toString()` print? – pczeus Mar 25 '16 at 15:59
  • With get(0) it prints 23. With get(5) it prints `<br>`, but I need "one two three". – yar1k Mar 25 '16 at 17:08
  • Now that you have updated the question, it makes less sense. You initially were looking to get the data within the elements. But now you are trying to get random data that has nothing to do with the span elements, but are actual data within the elements. Maybe you can just describe what it is you are trying to accomplish. – pczeus Mar 25 '16 at 22:29

1 Answer

When you call doc.select("span[class=param]") you get back an Elements collection, not a single element. You need to iterate over it to process each <span>. In your code you only grab one of them with Element param = doc.select("span[class=param]").get(0);

Document doc = Jsoup.connect("text.html").get();
Elements params = doc.select("span[class=param]");
for (Element element : params) {
    // Prints the text contained directly within each <span>...</span>
    System.out.println(element.ownText());
}

params = doc.select("td");
for (Element element : params) {
    // Prints the text of each <td>'s direct child text nodes
    // (text inside nested elements such as <span> is excluded)
    System.out.println(element.ownText());
    // element.text() would also include the text of nested elements
    //System.out.println(element.text());
}

The above code will print out:

Text0
Text1
Text2
Text3
Text4
Text5
23 173 54 2
one two three
one - two - three -

This should be enough to get you where you are going. Good Luck!
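
If you want each value paired with its own label instead of one line per <td>, the chained nextSibling() calls can be replaced by a loop. The following is only a sketch under some assumptions: the class name ParamValues and the helpers valueAfter and extract are made up for illustration, the HTML is parsed from a string with Jsoup.parse so the example is self-contained, and a value is assumed to end at the next element that is not a <br> (or at the end of the parent).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of jsoup.
class ParamValues {

    // Walks the siblings after a label <span> and joins the non-blank text
    // nodes with single spaces. <br> elements are skipped; any other element
    // (for example the next label) ends the value.
    static String valueAfter(Element label) {
        StringBuilder sb = new StringBuilder();
        for (Node n = label.nextSibling(); n != null; n = n.nextSibling()) {
            if (n instanceof TextNode) {
                String text = ((TextNode) n).text().trim();
                if (!text.isEmpty()) {
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append(text);
                }
            } else if (n instanceof Element
                    && !((Element) n).tagName().equals("br")) {
                break;
            }
        }
        return sb.toString();
    }

    // Returns one value per span.param, in document order.
    static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> values = new ArrayList<>();
        for (Element label : doc.select("span.param")) {
            values.add(valueAfter(label));
        }
        return values;
    }

    public static void main(String[] args) {
        // Shortened version of the table from the question.
        String html = "<table><tr>"
                + "<td><span class=\"param\">Text0</span> 23<br />"
                + "<span class=\"param\">Text3</span> 2<br /><br /></td>"
                + "<td><span class=\"param\">Text4</span><br />"
                + "one <br />two <br />three <br /></td>"
                + "</tr></table>";
        for (String value : extract(html)) {
            System.out.println(value);
        }
    }
}
```

For the shortened table in main this should give 23, 2 and "one two three", one per line. The loop stops by itself at the end of each <td>, so it does not matter how many <br>-separated lines follow a label.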

pczeus
  • If I use a for-each loop, I don't know how to get the nextSibling value; Node doesn't work with Elements. – yar1k Mar 25 '16 at 16:51
  • Document doc = Jsoup.connect("text.hml").get(); Elements params = doc.select("span[class=param]"); int i; for (i=0; i<6; i++) { Element param = params.get(i); Node node = param.nextSibling(); System.out.println(node.toString()); } – yar1k Mar 25 '16 at 17:17
  • I will update the answer with @yar1k comment to make it more readable. – pczeus Mar 25 '16 at 17:29
  • Yes! Thanks, I have updated my question. Is a construction like nextSibling().nextSibling().nextSibling().nextSibling().nextSibling().toString() normal? Can I use simpler code? – yar1k Mar 25 '16 at 17:55
  • I have updated the answer to show you how to get the text contained within the <span> tags as well as the text in all child text nodes of <td>, which appears to be what you are after. – pczeus Mar 25 '16 at 22:34