In org.htmlparser, I want to get the tbody node by its id:

Parser htmlParser =  Parser.createParser("<table id='_table' border='0' cellspacing='0' cellpadding='0' class='tableRegion' width='100%' ><thead><tr><td>1</td><td>2</td></tr></thead><tbody id='_table_body' ><tr><td>4</td><td>5</td></tr></tbody></table>","gbk"); 
NodeFilter filter = new HasAttributeFilter("id", "_table_body"); 
NodeFilter f = new AndFilter(new TagNameFilter("tr"), new HasParentFilter(filter)); 
NodeList nodelist1 = htmlParser.parse(filter); //Tag (144[0,144],173[0,173]): tbody id='_table_body' 
NodeList nodelist2 = htmlParser.parse(f); //

Why doesn't nodelist1 contain <tr><td>4</td><td>5</td></tr>?

Yi Jiang
idleman

1 Answer

If you get the <tbody> node, you should expect to have:

<tbody id='_table_body' ><tr><td>4</td><td>5</td></tr></tbody>

rather than

<tr><td>4</td><td>5</td></tr>

The latter is a child node of the <tbody> element, not the element itself. Basically, your code (using filter) looks like it is giving you the right result: a filter on id='_table_body' matches the <tbody> tag, so that tag is what ends up in nodelist1.
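To make the element-versus-child distinction concrete, here is a minimal sketch. It deliberately uses the JDK's built-in DOM parser instead of org.htmlparser, so it does not show how org.htmlparser itself builds its tree. The class name TbodyChildren and the helpers findById and childElementNames are made up for this illustration, and the markup is a trimmed copy of the table from the question, which happens to be well-formed XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: JDK DOM, not org.htmlparser. All names here are hypothetical.
final class TbodyChildren {
    // Trimmed copy of the table from the question (well-formed, so an XML parser accepts it).
    static final String HTML =
        "<table id='_table'><thead><tr><td>1</td><td>2</td></tr></thead>"
        + "<tbody id='_table_body'><tr><td>4</td><td>5</td></tr></tbody></table>";

    // Returns the first element whose id attribute equals the given value, or null if none.
    static Element findById(String markup, String id) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
            NodeList all = doc.getElementsByTagName("*"); // every element, in document order
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                if (id.equals(e.getAttribute("id"))) {
                    return e;
                }
            }
            return null;
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse markup", ex);
        }
    }

    // Collects the tag names of the direct element children of the given parent.
    static List<String> childElementNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Element tbody = findById(HTML, "_table_body");
        System.out.println(tbody.getNodeName());      // the matched element itself
        System.out.println(childElementNames(tbody)); // its children, reached by a separate step
    }
}
```

Looking up by id yields the tbody element; the tr row only appears once you explicitly walk down to that element's children.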

Stephen C