
I've been scouring the internet for a solution to this, so I'm finally creating my own post in the hope that someone can help:

I'm looking to scrape Newegg in the same way I've already done for Best Buy. I know that IMPORTXML cannot scrape content loaded by JavaScript, but I don't think that's the case here, as I can still see the content I'm looking for when JavaScript is disabled.

Here is the formula I'm currently running:

=IMPORTXML("https://www.newegg.com/black-msi-creator-15-a11ue-491-creating-designing/p/N82E16834156059","/html/body/div[8]/div[4]/div/div/div/div[1]/div[1]/div[2]/div[2]/ul/li[3]/strong")

I am able to scrape the title and images on the site with:

=IMPORTXML("https://www.newegg.com/black-msi-creator-15-a11ue-491-creating-designing/p/N82E16834156059","/html/head/title | //img/@src")

so I know at least some content can be scraped.

Any assistance would be greatly appreciated as I can't find anybody else with the same problem.

  • So is your question how to scrape JSON from that website? – Ricardo Aranguren Apr 11 '22 at 21:23
  • @RicardoAranguren No, what I'm looking to scrape isn't (to my knowledge) loaded via JavaScript. It still loads on the page with JavaScript disabled. – Aram Howard Apr 11 '22 at 21:27
  • Your first formula returns the price of the item. What else do you need? I understand that you can "still see the content" you are looking for, but you didn't say what it is – Ricardo Aranguren Apr 11 '22 at 21:37
  • Ah, I actually see what may be causing the issue now. My sheet doesn't use that exact formula: the URL is referenced from another cell, and it isn't *exactly* that URL but leads to the same webpage. Is there a chance that IMPORTXML isn't following through to the final page? – Aram Howard Apr 11 '22 at 21:47
  • That could certainly be the issue because IMPORTXML doesn't know what a "final page" is, and only parses the page in that url. – Ricardo Aranguren Apr 11 '22 at 21:54
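
If the redirect theory from the comments is right, one way to check it outside of Sheets is to resolve the URL stored in the cell to the address it finally lands on, and then paste that address into the formula. The following is only a minimal sketch in Python using the standard library; the function name `resolve_final_url`, the `opener` parameter, and the User-Agent header are assumptions made for illustration and do not come from the original post.

```python
import urllib.request
from contextlib import closing


def resolve_final_url(url, opener=urllib.request.urlopen):
    """Return the address that `url` ends up at once HTTP redirects are followed.

    `opener` is a hypothetical hook (defaulting to urllib's urlopen) so the
    function can be exercised without network access.
    """
    # Assumption: some sites reject requests without a browser-like User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with closing(opener(request)) as response:
        # geturl() reports the URL of the resource actually retrieved,
        # which differs from `url` when a redirect was followed.
        return response.geturl()


# Usage (requires network access), with the URL from the referenced cell:
#   print(resolve_final_url("<URL stored in the other cell>"))
```

If the printed address differs from the one in the cell, using the printed one directly in IMPORTXML would be worth trying. Note that this only follows HTTP-level redirects; a redirect done via JavaScript or a meta refresh tag would not show up here.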
