
What would be the correct XPath query to fetch data from Wikipedia into Google Sheets?

Here is an example I would like to test this with:

Wikipedia page: http://en.wikipedia.org/wiki/12_Angry_Men_(1957_film)

Data to pull: the "Running time" value of "96 minutes", located in the table on the right-hand side of the page

Method: using the Google Sheets ImportXML function

I've tried the following, but it returns #N/A:

=IMPORTXML("http://en.wikipedia.org/wiki/12_Angry_Men_(1957_film)", "//div[normalize-space() = 'Running time']/following-sibling::td")

Thanks!

Steve

1 Answer


There are a couple of problems with your XPath.

The following-sibling axis won't work on that page's markup: the 'Running time' div sits inside a th, so the td you want is a sibling of that parent th, not of the div itself. Use the following axis with a name test instead: following::td. That selects every td that comes after the div in document order, so add a positional predicate, [1], to keep only the first one.

Complete function with XPath:

=IMPORTXML("http://en.wikipedia.org/wiki/12_Angry_Men_%281957_film%29", "//div[normalize-space()='Running time']/following::td[1]")
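
To see why the two axes behave differently, here is a minimal Python sketch that emulates them with the standard library only. The SNIPPET markup is a hypothetical, simplified stand-in for the infobox rows (the real Wikipedia markup may differ), and the helper names are made up for illustration; this is not how IMPORTXML evaluates XPath internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for two infobox rows (assumption:
# the label div is wrapped in a th, and the value td follows that th).
SNIPPET = """
<table>
  <tr>
    <th><div>Running time</div></th>
    <td>96 minutes</td>
  </tr>
  <tr>
    <th><div>Country</div></th>
    <td>United States</td>
  </tr>
</table>
"""


def normalize_space(text):
    """Rough equivalent of XPath normalize-space(): trim and collapse whitespace."""
    return " ".join((text or "").split())


def following_sibling_tds(root, node):
    """Emulate following-sibling::td: td elements after `node` under the same parent."""
    parent = next(p for p in root.iter() if node in list(p))
    children = list(parent)
    return [c for c in children[children.index(node) + 1:] if c.tag == "td"]


def following_tds(root, node):
    """Emulate following::td: td elements after `node` in document order,
    excluding the node's own descendants."""
    order = list(root.iter())                    # document order
    own_subtree = {id(e) for e in node.iter()}   # node and its descendants
    start = order.index(node)
    return [e for e in order[start + 1:]
            if e.tag == "td" and id(e) not in own_subtree]


root = ET.fromstring(SNIPPET)
label = next(d for d in root.iter("div")
             if normalize_space("".join(d.itertext())) == "Running time")

siblings = following_sibling_tds(root, label)    # the div has no td siblings
first_following = following_tds(root, label)[0]  # same idea as following::td[1]

print(len(siblings))         # 0
print(first_following.text)  # 96 minutes
```

Without the [1] step, the emulated following axis here returns both td elements ("96 minutes" and "United States"), which is why the positional predicate is needed.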
dartonw