
I want to get all the child nodes of a parent node when one of those children contains a certain text. In other words: I start by searching for a child node that I know contains the string I need. Once I've found it, instead of getting the strings from every node that matches the same XPath expression, I need the other nodes at the same level (its siblings). I'm using Java. For example:

     <table width="575" border="0" cellspacing="1" cellpadding="0">
                <tr> 
                  <td width="39" class="back1"><b class="texto4">CRN</b></td>
                  <td width="60" class="back1"><b class="texto4">Materia</b></td>
                  <td width="53" class="back1"><b class="texto4">Secci&oacute;n</b></td>
                  <td width="55" class="back1"><b class="texto4">Cr&eacute;ditos</b></td>
                  <td width="156" class="back1"><b class="texto4">T&iacute;tulo</b></td>
                  <td width="69" class="back1"><b class="texto4">Cupo</b></td>
                  <td width="57" class="back1"><b class="texto4">Inscritos</b></td>
                  <td width="77" class="back1"><b class="texto4">Disponible</b></td>
                </tr>
                <tr> 
                  <td width="39"><font class="texto4"> 
                    10110                        </font></td>
                  <td width="60"><font class="texto4"> 
                    IIND1000                        </font></td>
                  <td width="53"><font class="texto4"> 
                  <div align="center">
                    1                        </div></font></td>
                  <td width="55"><font class="texto4"> 
                    <div align="center">
                    3                       </div>
                    </font></td>
                  <td width="156"><font class="texto4"> 
                    INTROD. INGEN. INDUSTRIAL                        </font></td>
                  <td width="69"><font class="texto4"> 
                    100                        </font></td>
                  <td width="57"><font class="texto4"> 
                    100                        </font></td>
                  <td width="77"><font class="texto4"> 
                    0                        </font></td>
                </tr>
              </table>

If I look for IIND1000, I want to get every td element within that tr tag (10110, IIND1000, 1, 3, INTROD. INGEN. INDUSTRIAL, 100, 100, 0). Is this possible with JTidy? Any tips or recommendations? Thanks.

Jens
Hugo M. Zuleta
  • Can you add the code you have tried and an example XML? – Jens Jul 11 '14 at 05:57
  • Sorry. I just updated the question with an example of the HTML document I get. So far I've tried this snippet: `XPath xpath = XPathFactory.newInstance().newXPath(); XPathExpression expr = xpath.compile("//td[@width='39']/font/text()"); NodeList crn = (NodeList)expr.evaluate(doc, XPathConstants.NODESET);` This gets the text of EVERY td whose width is 39 in that table. What I need is all the nodes at the same level as the found text (where the text equals something the user inputs). – Hugo M. Zuleta Jul 11 '14 at 06:03
  • Please also add your code attempts to your question; do not include them as a comment. – Marcus Rickert Jul 11 '14 at 06:24

1 Answer


You probably want this:

XPathExpression expr = 
     xpath.compile("//tr[td[normalize-space(font) = 'IIND1000']]/td/font/text()"); 

The predicate in brackets keeps only those <tr> elements that have a <td> child whose <font> child, after whitespace normalization, equals 'IIND1000'. The rest of the path then returns the text nodes directly under <font> in every <td> of each matching row.
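A variation that may be useful, offered as a minimal sketch rather than a definitive implementation: select the <td> elements of the matching row instead of their text nodes, then read each cell with getTextContent(). That also picks up values wrapped in a further element such as <div>, which font/text() skips because those text nodes are not direct children of <font>. The class name RowCells, the helper names parse and cellsOfRow, the variable name $code, and the shortened SAMPLE markup are all made up for illustration. The sketch parses a well-formed string with the standard DocumentBuilder and assumes that in the real program the Document would come from JTidy instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RowCells {
    // Hypothetical, shortened and well-formed stand-in for the table in the question.
    static final String SAMPLE =
          "<table>"
        + "<tr><td><b>CRN</b></td><td><b>Materia</b></td></tr>"
        + "<tr>"
        + "<td><font> 10110 </font></td>"
        + "<td><font> IIND1000 </font></td>"
        + "<td><font><div align=\"center\"> 1 </div></font></td>"
        + "<td><font> INTROD. INGEN. INDUSTRIAL </font></td>"
        + "</tr>"
        + "</table>";

    // Assumption: the real Document comes from JTidy; here a plain XML parse stands in.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the trimmed text of every <td> in each row that has a cell equal to `code`.
    static List<String> cellsOfRow(Document doc, String code) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Pass the user's input as an XPath variable instead of pasting it into the expression.
        xpath.setXPathVariableResolver(
                name -> "code".equals(name.getLocalPart()) ? code : null);
        NodeList cells = (NodeList) xpath.evaluate(
                "//tr[td[normalize-space(.) = $code]]/td", doc, XPathConstants.NODESET);

        List<String> values = new ArrayList<>();
        for (int i = 0; i < cells.getLength(); i++) {
            // getTextContent() concatenates all descendant text, including text inside <div>.
            values.add(cells.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // prints [10110, IIND1000, 1, INTROD. INGEN. INDUSTRIAL]
        System.out.println(cellsOfRow(parse(SAMPLE), "IIND1000"));
    }
}
```

Since the result holds one entry per <td>, rows with the same code could also be kept apart by selecting the <tr> elements first and reading each row's cells separately, rather than splitting a flat list afterwards.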

Marcus Rickert
  • I will try this asap. Thanks! EDIT: Worked. I really appreciate it. It outputs these items: [10110, IIND1000, INTROD. INGEN. INDUSTRIAL, 100, 100, 0, 16886, IIND1000, INTROD. INGEN. INDUSTRIAL, 100, 100, 0]. I'm only going to have to somehow split the results in blocks of 6 so I can separate classes with the same code. – Hugo M. Zuleta Jul 11 '14 at 15:50
  • Would you happen to know why that expression can't get the text from the elements that have div after font? Third and fourth elements. I posted a different question for this, but maybe you know http://stackoverflow.com/questions/24668436/xpath-nodes-come-after-new-line – Hugo M. Zuleta Jul 11 '14 at 17:03