I am using WebHarvest to scrape data from Woot.com and I'm getting a few different errors. The first processor fetches the page without problems, but when I try to test an XPath expression in the variable window I get: org.xml.sax.SAXParseException; lineNumber: 86; columnNumber: 99; The reference to entity "pt2" must end with the ';' delimiter. If I try the pretty print function, it returns: XML is not well-formed: the reference to entity "pt2" must end with the ';' delimiter. {line: 86, col:99]. Lastly, if I put an xpath tag with an expression into the script I am writing, I get: element type "xpath" must be followed by either attribute specifications, ">" or "/>". Can someone tell me what I am doing wrong? I am very new to WebHarvest and have no experience with this kind of program.

My code is:

<?xml version="1.0" encoding="UTF-8"?><config>
<xpath expression="(//div[@class="overview"])[1]//h2/text()">
<html-to-xml>
<http url="http://www.woot.com/"/>
</html-to-xml>
</xpath>
</config>
  • Please share the configuration file you created, and let us know what exactly you want to extract from the Woot.com URL. – Navin Rawat Apr 29 '13 at 04:33

1 Answer

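To make the XML well-formed, you have to use single quotes (') instead of double quotes (") inside the expression attribute, because the attribute value itself is already delimited by double quotes. Here is the corrected configuration: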

<?xml version="1.0" encoding="UTF-8"?>
<config>
    <xpath expression="(//div[@class='overview'])[1]//h2/text()">
        <html-to-xml>
            <http url="http://www.woot.com/"/>
        </html-to-xml>
    </xpath>
</config>

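An attribute value can be wrapped in either ' or ", but the wrapping character cannot appear unescaped inside the value. Use the other quote character, or the &apos; / &quot; entity, instead. Here are a few examples: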

 <xpath expression='(//div[@class="overview"])[1]//h2/text()'>           --- valid
 <xpath expression='(//div[@class='overview'])[1]//h2/text()'>           --- invalid
 <xpath expression="(//div[@class="overview"])[1]//h2/text()">           --- invalid
 <xpath expression='(//div[@class=&apos;overview&apos;])[1]//h2/text()'> --- valid
 <xpath expression="(//div[@class=&apos;overview&apos;])[1]//h2/text()"> --- valid
 <xpath expression="(//div[@class=&quot;overview&quot;])[1]//h2/text()"> --- valid
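If you want to check these cases yourself outside of WebHarvest, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. It is not part of WebHarvest; the helper name is_well_formed is made up for illustration, and each tag is self-closed (an assumption) so that it forms a complete document by itself.

```python
import xml.etree.ElementTree as ET


def is_well_formed(snippet: str) -> bool:
    """Return True if the snippet parses as well-formed XML."""
    try:
        ET.fromstring(snippet)
        return True
    except ET.ParseError:
        return False


# The six variants from the list above, self-closed so each one is a
# complete document on its own (an assumption made for this check).
variants = [
    """<xpath expression='(//div[@class="overview"])[1]//h2/text()'/>""",
    """<xpath expression='(//div[@class='overview'])[1]//h2/text()'/>""",
    """<xpath expression="(//div[@class="overview"])[1]//h2/text()"/>""",
    """<xpath expression='(//div[@class=&apos;overview&apos;])[1]//h2/text()'/>""",
    """<xpath expression="(//div[@class=&apos;overview&apos;])[1]//h2/text()"/>""",
    """<xpath expression="(//div[@class=&quot;overview&quot;])[1]//h2/text()"/>""",
]

results = [is_well_formed(v) for v in variants]
for snippet, ok in zip(variants, results):
    print("valid  " if ok else "invalid", snippet)
```

The parser resolves &apos; and &quot; back to plain quote characters, so the XPath engine should see the same expression whichever valid form you pick.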

Hope this helps.

Cylian