Is there a way to download and parse an XML document only up to the point where an XPathExpression finds a given element? I'm using Java:
URL url = new URL("http://registroapps.uniandes.edu.co/scripts/adm_con_horario1_joomla.php?depto="+params[0]);
Tidy tidy = new Tidy();
tidy.setQuiet(true);
tidy.setXHTML(true);
tidy.setShowWarnings(false);
Document doc = tidy.parseDOM(url.openStream(), System.out);
// Use XPath to obtain whatever you want from the (X)HTML
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//tr[td[normalize-space(font) = '"+params[1]+"']]/td/font/text()");
NodeList result = (NodeList)expr.evaluate(doc, XPathConstants.NODESET);
I'm getting the text from HTML documents like this one:
<table width="575" border="0" cellspacing="1" cellpadding="0">
<tr>
<td width="39" class="back1"><b class="texto4">CRN</b></td>
<td width="60" class="back1"><b class="texto4">Materia</b></td>
<td width="53" class="back1"><b class="texto4">Sección</b></td>
<td width="55" class="back1"><b class="texto4">Créditos</b></td>
<td width="156" class="back1"><b class="texto4">Título</b></td>
<td width="69" class="back1"><b class="texto4">Cupo</b></td>
<td width="57" class="back1"><b class="texto4">Inscritos</b></td>
<td width="77" class="back1"><b class="texto4">Disponible</b></td>
</tr>
<tr>
<td width="39"><font class="texto4">
10110 </font></td>
<td width="60"><font class="texto4">
IIND1000 </font></td>
<td width="53"><font class="texto4">
<div align="center">
1 </div></font></td>
<td width="55"><font class="texto4">
<div align="center">
3 </div>
</font></td>
<td width="156"><font class="texto4">
INTROD. INGEN. INDUSTRIAL </font></td>
<td width="69"><font class="texto4">
100 </font></td>
<td width="57"><font class="texto4">
100 </font></td>
<td width="77"><font class="texto4">
0 </font></td>
</tr>
</table>
<tr>
<td>
<table width="550" border="0" cellspacing="1" cellpadding="0">
<tr>
<td width="81" > </td>
<td width="172" class="back3" height="17"><b class="texto4">Días</b></td>
<td width="171" class="back3" height="17"><b class="texto4">Horas</b></td>
<td width="171" class="back3" height="17"><b class="texto4">Salón</b></td>
<td width="171" class="back3"><b class="texto4">F. Inicial</b></td>
<td width="171" class="back3"><b class="texto4">F. Final</b></td>
</tr>
<tr>
<td width="81" > </td>
<td width="172" height="17"><font class="texto4">
I </font></td>
<td width="171" height="17"><font class="texto4" >
0700 - 0820 </font></td>
<td width="171" height="17"><font class="texto4">
- - </font></td>
<td width="171"><font class="texto4" >28-JUL-14</font></td>
<td width="171"><font class="texto4" >15-NOV-14</font></td>
</tr>
<tr>
<td width="81" ><div align="right"><span class="back3"><font class="texto4"><strong>Instructor(es)</strong>:</font></span></div></td>
<td width="172" class="back3" height="17"><font class="texto4"><font class="texto4">
ALDANA VALDES EDUARDO </font></font></td>
<td width="171" class="back3" height="17"><font class="texto4">
</font></td>
<td width="171" class="back3" height="17"><font class="texto4"></font></td>
<td width="171" class="back3"> </td>
<td width="171" class="back3"> </td>
</tr>
</table> </td>
</tr>
So, for instance, as soon as the XPathExpression finds code 10110 (params[1] = "10110") in the first table, I need it to stop there and not download the next table. I only want the text of all the sibling cells at the same level, in the same row. A typical document is over 10k lines long, so downloading and parsing all of it is wasteful when the element I'm searching for is at the very beginning.
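
To make the goal concrete, here is a minimal sketch of the kind of early stop I have in mind. It does not use XPath at all: it reads the response line by line, keeps only the current `<tr>` block, and stops reading as soon as the row containing the code is closed. The class and method names (`EarlyStopReader`, `readRowContaining`) are hypothetical, and the sketch assumes that each cell value sits on its own line, as in the sample above, and that tags are not split across lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of JTidy or the JDK.
class EarlyStopReader {

    /**
     * Reads the source line by line and returns the <tr>...</tr> block whose
     * text contains a line equal to the given code, or null if none is found.
     * Reading stops as soon as the matching row is closed.
     */
    static String readRowContaining(Reader source, String code) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            StringBuilder row = new StringBuilder();
            boolean found = false;
            String line;
            while ((line = in.readLine()) != null) {
                if (line.contains("<tr")) {
                    // A new row starts: discard whatever was buffered before.
                    row.setLength(0);
                    found = false;
                }
                row.append(line).append('\n');

                // Naive tag stripping, so "10110 </font></td>" becomes "10110".
                String text = line.replaceAll("<[^>]*>", " ").trim();
                if (text.equals(code)) {
                    found = true;
                }
                if (found && line.contains("</tr>")) {
                    // Returning here closes the reader, so nothing more is read.
                    return row.toString();
                }
            }
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        String html = "<table>\n<tr>\n<td><font>\n10110 </font></td>\n"
                + "<td><font>\nIIND1000 </font></td>\n</tr>\n"
                + "<tr>\n<td><font>\n20220 </font></td>\n</tr>\n</table>\n";
        System.out.println(readRowContaining(new StringReader(html), "10110"));
    }
}
```

With the real page, the `StringReader` would be replaced by an `InputStreamReader` wrapped around `url.openStream()`, and the returned fragment could then be handed to `tidy.parseDOM` and the XPath expression above. Note that closing the reader only stops my code from reading further; the `BufferedReader` and the HTTP layer may already have fetched somewhat more than the matched row, so this limits the download rather than cutting it off at an exact byte.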