
I'm having trouble using XPath in RapidMiner to retrieve reviews from the Google Play store. The problem seems to be that the review text appears in double quotes, and I can't get RapidMiner to output it; I only get blanks. I have a number of other XPath queries that work fine where I target divs, spans, etc. I can get this query to work in a Google spreadsheet through =importXML, but not in RapidMiner. This is the XPath I'm using:

//*[@class='review-text']

So I added /text() to the end and still got nothing. I have also tried //div instead of //*, and I have used the h: prefix as well (e.g. h:span). I'm hoping there's a special syntax for retrieving quoted text that I'm unaware of.
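One thing worth ruling out, since the h: prefix comes up above: if the page is processed as XHTML (RapidMiner is often described as doing this, which is why its XPath examples use `h:`, but I haven't verified its internals), every element sits in the XHTML namespace, and unprefixed names like `//div` match nothing. The sketch below shows that effect with Python's standard-library ElementTree on a hypothetical review snippet. The markup and the `find_reviews` helper are assumptions for illustration, not the real Google Play page or any RapidMiner API.

```python
import xml.etree.ElementTree as ET

# Hypothetical review markup (the real page is only available as a screenshot).
# It is written as XHTML, so every element lives in the XHTML namespace.
XHTML_SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="review-text"><span class="review-title">Great game</span> Really fun, no ads. <div class="review-link">Full Review</div></div>
</body></html>"""

# Bind the prefix "h" to the XHTML namespace URI.
XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}


def find_reviews(markup, path, namespaces=None):
    """Hypothetical helper: parse markup and return elements matching an ElementTree path."""
    root = ET.fromstring(markup)
    return root.findall(path, namespaces or {})


# Unprefixed names only match elements in no namespace, so this finds nothing.
unprefixed = find_reviews(XHTML_SAMPLE, ".//div[@class='review-text']")
# With the "h" prefix bound to the XHTML namespace, the review div is found.
prefixed = find_reviews(XHTML_SAMPLE, ".//h:div[@class='review-text']", XHTML_NS)

print(len(unprefixed), len(prefixed))  # 0 1
```

If a namespace problem were the cause, though, you would expect no match at all rather than blank values, so this is mainly useful for checking that the `h:` form of the query is the one that matches.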

The HTML I'm trying to scrape is shown in this screenshot: https://i.stack.imgur.com/dl6I8.png

Please see my comment below on further failed tests. Thanks.

aheavey
  • The XPath seems correct, but I don't know RapidMiner and don't see any relation between the text having quotation marks and the inability to scrape it. Try adding `/descendant-or-self::text()` at the end. I'm not sure whether having text mixed with tags in the div you want to scrape could be an issue for RapidMiner. – derloopkat Dec 14 '17 at 22:51
  • Thanks for the suggestion. I've tried that with no luck. It's weird: games that have review content come up blank, whereas games without content come up with a question mark. So I believe the query is matching; it's just not returning the content. This is what I tried: `//h:div[@class='review-text']/descendant::text()`, and also `//h:*` at the start. Any more ideas? – aheavey Dec 15 '17 at 10:29
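A possible explanation for the blanks, which I haven't confirmed for RapidMiner, is the mixed content mentioned above: `div/text()` returns several text nodes, and the first one is often just the whitespace before the title `<span>`. If a tool keeps only the first matched node, the result looks empty, whereas the string value of the whole div (what XPath's `string()` or `normalize-space()` return) contains the review. ElementTree has no `text()` step, so the sketch below emulates both on a hypothetical snippet; the markup and the helpers `direct_text_nodes` and `normalized_string` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content review markup: the title sits in a <span>, the
# review body is loose text after it, and the markup is indented with whitespace.
SAMPLE = (
    '<div class="review-text">\n'
    '  <span class="review-title">Great game</span>\n'
    '  Really fun, no ads.\n'
    '  <div class="review-link">Full Review</div>\n'
    '</div>'
)


def direct_text_nodes(elem):
    """Emulate XPath div/text(): only the text nodes that are direct children.

    In ElementTree, elem.text is the text before the first child, and each
    child's .tail is the text that follows that child inside elem.
    """
    nodes = [elem.text] + [child.tail for child in elem]
    return [n for n in nodes if n is not None]


def normalized_string(elem):
    """Emulate XPath normalize-space(div): all descendant text, whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


review = ET.fromstring(SAMPLE)
nodes = direct_text_nodes(review)
print(repr(nodes[0]))             # '\n  '  (whitespace only, so it looks blank)
print(normalized_string(review))  # Great game Really fun, no ads. Full Review
```

In XPath terms, this suggests trying `normalize-space(//h:div[@class='review-text'])` instead of a `text()` step, provided RapidMiner accepts expressions that return strings rather than nodes; I haven't checked that it does.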
