I can't use XPath on this response because the encoding gets weird. I hope you can help me out.

require "nokogiri"
require "open-uri"
link = "http://www.arla.dk/Services/SearchService.asmx/RecipeResult?q=allRecipe&paging=6&include=&exclude=&area=recipeSearch&languageBranch=da"
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
doc = Nokogiri::HTML(URI.open(link))
doc.xpath("//h2")

The xpath method returns an empty array. It looks like the document has not been parsed correctly. I think this is because the file being parsed contains encoded characters:

<strong>Frokost til 8</strong>
<ul><li class='ingHeading'><strong><b>Flade

2 Answers

The response is XML so first parse it with Nokogiri::XML:

xml = Nokogiri::XML(URI.open(link))

Then, the first string element contains some HTML, so parse that with Nokogiri::HTML:

doc = Nokogiri::HTML xml.at('string').text

Now you can do your search:

doc.xpath '//h2'

As stated above, the issue is that the HTML is encoded, which is why you are seeing escape sequences, for example &lt; instead of <. To get around it, unescape the HTML before parsing it.

"How do I encode/decode HTML entities in Ruby?" basically suggests using htmlentities.
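
As a rough illustration of unescaping, here is a sketch that uses CGI.unescapeHTML from Ruby's standard library instead of the htmlentities gem, which is a substitution on my part. CGI.unescapeHTML only decodes the basic entities (&amp;lt;, &amp;gt;, &amp;amp;, &amp;quot;, &amp;#39;) and numeric references, so named entities beyond those would still need htmlentities or a parser. The escaped string is a made-up sample.

```ruby
require "cgi"

# Made-up sample of escaped markup, similar to what the service returns
escaped = "&lt;strong&gt;Frokost til 8&lt;/strong&gt;"

# Turn the escape sequences back into real markup
unescaped = CGI.unescapeHTML(escaped)

puts unescaped
```

The unescaped string can then be handed to Nokogiri::HTML as usual.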
