So I'm trying to scrape prices of a product on a website, and their HTML looks like this:

<div class="pricing_price">€12.99</div>

Now I've written an XPath query that gets the price, and it returns a string like this:

€ 12.99.

If possible, I would like to just get the 12.99. What are my options? Should I use regular expressions? Or are there better/easier solutions?
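On the regular-expression option, here is a minimal sketch, assuming Python and a price that is written as digits with an optional `.` or `,` decimal part. The function name `extract_price` is hypothetical, and thousands separators are not handled.

```python
import re
from decimal import Decimal


def extract_price(text):
    """Return the first number found in text as a Decimal, or None.

    Assumption: the price looks like "12.99" or "12,99"; thousands
    separators such as "1.299,00" are NOT handled by this sketch.
    """
    # One or more digits, optionally followed by a separator and more digits.
    match = re.search(r"\d+(?:[.,]\d+)?", text)
    if match is None:
        return None
    # Normalise a decimal comma to a decimal point before converting.
    return Decimal(match.group(0).replace(",", "."))


if __name__ == "__main__":
    print(extract_price("EUR 12.99."))
```

This only strips the unwanted characters after the fact. It does not explain why they appear in the first place, which is what the comments below address.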

user1333327
  • Looks like a charset problem. Make sure you're parsing in the same charset as the document is displayed in. – DCoder Jul 28 '12 at 16:19
  • I can't find any mention of `utf8` or anything similar on the page – user1333327 Jul 28 '12 at 16:26
  • How do you parse the document? XPath works on a tree of Unicode characters built by an XML or HTML parser, so changing the XPath expression won't fix this. You need to make sure the parser reads the document with the encoding/character set it was written in. – Martin Honnen Jul 28 '12 at 17:46
  • So if the page has `` I should do the same and it works? – user1333327 Aug 29 '12 at 17:47
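
The charset point made in the comments can be reproduced with a short sketch. It assumes Python, assumes the page is served as UTF-8, and assumes the scraper decoded it as Windows-1252; neither encoding is confirmed by the question. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, as the bytes a server would send.
# Assumption: the page is encoded as UTF-8.
raw = '<div class="pricing_price">€12.99</div>'.encode("utf-8")

# Decoding with the wrong charset turns the 3-byte euro sign into
# three unrelated characters (assumption: cp1252 was used by mistake).
wrong_text = ET.fromstring(raw.decode("cp1252")).text

# Decoding with the charset the document was written in keeps it intact.
right_text = ET.fromstring(raw.decode("utf-8")).text

# With correct decoding, the currency symbol can be stripped directly.
price = right_text.lstrip("€").strip()

if __name__ == "__main__":
    print(repr(wrong_text))
    print(repr(right_text))
    print(price)
```

Once the text is decoded correctly, removing the currency symbol is a plain string operation and no XPath change is needed.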

0 Answers