parsing the result of ImportXML with XPATH in a google docs spread sheet

Question

I have the following function in a cell of my google doc spreadsheet:

=IMPORTXML(J15, "//meta[@name='description']/@content")

I need to capture values from the following line from this facebook page: https://www.facebook.com/pages/Christine-Simpson/135960496435727

<meta name="description" content="Christine Simpson. 68 likes · 3 talking about this. Christine Laura Simpson 
    is a Canadian television personality, best known as a hockey reporter, 
    host...">

This function works fine and has the following output into my google docs cell:

"Christine Simpson. 68 likes · 3 talking about this. Christine Laura Simpson 
is a Canadian television personality, best known as a hockey reporter, 
host..."

But, I need to capture only the "number of likes" from this string and the number for "talking about this" into my cell and get rid of all the other text.

I tried the following way in order to use the tokenize() function of XPATH on the returned string.

=IMPORTXML(J2, "tokenize('//meta[@name='description']/@content', '.')")

However, I get an error: "Imported XML cannot be parsed"

Any help would be appreciated.

Thx

`substring-after(substring-before(//meta[@name='description']/@content , "talking about this" ), ".")` returns `68 likes · 3` You can use translate function to remove letter :) — splash58, Jul 24 '15 at 15:47
It would help if you provided the actual URL, so we can test. Please also bear in mind that a large number of xpath functions are NOT implemented in google sheets importXML. You are better off running the importXML as described at the top of your post and then processing the string using standard worksheet functions, such as SPLIT() or INDEX() or MID() — user3616725, Jul 27 '15 at 09:42

score 0 · Answer 1 · answered Aug 01 '15 at 21:31

This is because IMPORTXML is expecting an Xpath expression (something like //meta[@name='description']/@content) as second argument.

You can use regular expressions to extract the data you need with

=REGEXEXTRACT(A1,"[0-9]+\s+likes")

=REGEXEXTRACT(A1,"[0-9]+\s+talking\s+about\s+this")

parsing the result of ImportXML with XPATH in a google docs spread sheet

1 Answers1