1

I have the following function in a cell of my google doc spreadsheet:

=IMPORTXML(J15, "//meta[@name='description']/@content")

I need to capture values from the following line from this facebook page: https://www.facebook.com/pages/Christine-Simpson/135960496435727

<meta name="description" content="Christine Simpson. 68 likes · 3 talking about this. Christine Laura Simpson 
    is a Canadian television personality, best known as a hockey reporter, 
    host...">

This function works fine and has the following output into my google docs cell:

"Christine Simpson. 68 likes · 3 talking about this. Christine Laura Simpson 
is a Canadian television personality, best known as a hockey reporter, 
host..."

But, I need to capture only the "number of likes" from this string and the number for "talking about this" into my cell and get rid of all the other text.

I tried the following way in order to use the tokenize() function of XPATH on the returned string.

=IMPORTXML(J2, "tokenize('//meta[@name='description']/@content', '.')")

However, I get an error: "Imported XML cannot be parsed"

Any help would be appreciated.

Thx

am1234
  • 59
  • 2
  • 6
  • `substring-after(substring-before(//meta[@name='description']/@content , "talking about this" ), ".")` returns `68 likes · 3` You can use translate function to remove letter :) – splash58 Jul 24 '15 at 15:47
  • It would help if you provided the actual URL, so we can test. Please also bear in mind that a large number of xpath functions are NOT implemented in google sheets importXML. You are better off running the importXML as described at the top of your post and then processing the string using standard worksheet functions, such as SPLIT() or INDEX() or MID() – user3616725 Jul 27 '15 at 09:42
  • @user3616725 thanks for the comment, I added the url – am1234 Jul 27 '15 at 13:42

1 Answers1

0

This is because IMPORTXML is expecting an Xpath expression (something like //meta[@name='description']/@content) as second argument.

You can use regular expressions to extract the data you need with

=REGEXEXTRACT(A1,"[0-9]+\s+likes")

=REGEXEXTRACT(A1,"[0-9]+\s+talking\s+about\s+this")
user2314737
  • 27,088
  • 20
  • 102
  • 114