I need to scrape (public) HTML data after the JavaScript on the page has run. After some research, I found that PhantomJS can be useful for this. However, while I can install PhantomJS on my local computer, I don't know how to add it to my Chrome extension. Does anybody know how I can accomplish this?

Manny
  • Possible duplicate of [Integrate chrome extensions with phantomjs](https://stackoverflow.com/questions/23603708/integrate-chrome-extensions-with-phantomjs) – Luka Čelebić Jun 14 '18 at 15:28
  • I also think using PhantomJS for some probably simple scraping is quite an overkill. – Luka Čelebić Jun 14 '18 at 15:30
  • @PredatorIWD Is there another (perhaps simpler) way to scrape data generated by JavaScript? I was using the Fetch API before but it only returns the raw HTML (before the JavaScript on the page is loaded). – Manny Jun 14 '18 at 17:25

1 Answer

You cannot. PhantomJS is a standalone headless browser (a native executable), not a JavaScript library you can include in an extension.

Alternative 1. Scrape from the Chrome Extension

You can use the Chrome extension APIs to do the following:

  1. Create a tab containing the page you want to scrape
  2. Load a content script into the tab that:
    1. Waits for the page to finish loading
    2. Scrapes the data you want
    3. Messages the scraped data to wherever you need it
  3. Close the tab
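
The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation, and it makes several assumptions: a Manifest V3 extension with the `tabs` and `scripting` permissions plus host permissions for the target site; `chrome.scripting.executeScript` with a `func` swapped in for a separate content script file, so the function's return value plays the role of the message carrying the scraped data; and the hypothetical names `scrapeInTab`, `ChromeLike`, `TabLike`, and `UpdateListener`, which are not part of any Chrome API. The API object is passed in as a parameter so the sketch compiles without `@types/chrome`.

```typescript
// Minimal slice of the Chrome extension API used below, declared locally
// (hypothetical type names) so the sketch compiles without @types/chrome.
type UpdateListener = (tabId: number, changeInfo: { status?: string }) => void;
interface TabLike { id?: number }
interface ChromeLike {
  tabs: {
    create(props: { url: string; active: boolean }): Promise<TabLike>;
    remove(tabId: number): Promise<void>;
    onUpdated: {
      addListener(cb: UpdateListener): void;
      removeListener(cb: UpdateListener): void;
    };
  };
  scripting: {
    executeScript<T>(details: {
      target: { tabId: number };
      func: () => T;
    }): Promise<{ result?: T }[]>;
  };
}

// Opens `url` in a background tab, waits for the tab to report "complete",
// runs `extract` inside the page, closes the tab, and returns the result.
// `extract` is serialized and run in the page, so it must not rely on
// variables from the surrounding scope.
async function scrapeInTab<T>(
  api: ChromeLike,
  url: string,
  extract: () => T
): Promise<T | undefined> {
  const completed = new Set<number>();
  let notify: (() => void) | undefined;
  // Registered before the tab is created so a fast "complete" event is not missed.
  const listener: UpdateListener = (tabId, changeInfo) => {
    if (changeInfo.status === "complete") {
      completed.add(tabId);
      notify?.();
    }
  };
  api.tabs.onUpdated.addListener(listener);

  let tabId: number | undefined;
  try {
    // Step 1: create a tab containing the page to scrape.
    const tab = await api.tabs.create({ url, active: false });
    tabId = tab.id;
    if (tabId === undefined) throw new Error("created tab has no id");
    const id = tabId;

    // Step 2.1: wait for the page to finish loading.
    if (!completed.has(id)) {
      await new Promise<void>((resolve) => {
        notify = () => {
          if (completed.has(id)) resolve();
        };
      });
    }

    // Steps 2.2 and 2.3: scrape in the page and hand the data back.
    const [injection] = await api.scripting.executeScript({
      target: { tabId: id },
      func: extract,
    });
    return injection?.result;
  } finally {
    // Step 3: clean up the listener and close the tab.
    api.tabs.onUpdated.removeListener(listener);
    if (tabId !== undefined) await api.tabs.remove(tabId);
  }
}

// Possible usage from an extension's background script (may need a cast,
// depending on the installed Chrome typings):
//   const title = await scrapeInTab(chrome, "https://example.com/", () => document.title);
```

Note that a tab status of "complete" only means the page's load event has fired. If the site renders its data later (for example after an XHR), `extract` would additionally need to wait or poll for the element it is looking for.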

Alternative 2. Scrape with a Headless Browser Running on Your Own Server

Use Puppeteer, Google's own library for driving headless Chrome, to scrape the data you want. An easy way to get started for free is with a Node.js instance on the Google App Engine standard environment.
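
A server-side scrape with Puppeteer might look roughly like the sketch below. It is only an illustration under stated assumptions: `fetchRenderedHtml`, `BrowserLike`, and `PageLike` are hypothetical names, and the browser is passed in through a `launch` callback so the sketch compiles without the `puppeteer` package installed. The calls it relies on (`newPage`, `goto` with `waitUntil: "networkidle0"`, `content`, `close`) are standard Puppeteer methods.

```typescript
// Structural subset of Puppeteer's Browser and Page (hypothetical type names),
// declared locally so the sketch compiles without the puppeteer package.
interface PageLike {
  goto(url: string, options: { waitUntil: "networkidle0" }): Promise<unknown>;
  content(): Promise<string>;
}
interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

// Loads `url` in a headless browser and returns the HTML after scripts have run.
async function fetchRenderedHtml(
  launch: () => Promise<BrowserLike>,
  url: string
): Promise<string> {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    // "networkidle0" waits until the page has had no open network
    // connections for a short period, a rough proxy for "JavaScript is done".
    await page.goto(url, { waitUntil: "networkidle0" });
    // Serialized DOM, including content inserted by JavaScript.
    return await page.content();
  } finally {
    // Always shut the browser down, even if navigation fails.
    await browser.close();
  }
}

// Possible usage with Puppeteer installed:
//   import puppeteer from "puppeteer";
//   const html = await fetchRenderedHtml(() => puppeteer.launch(), "https://example.com/");
```

The Chrome extension could then request the scraped data from this server over HTTP instead of rendering the page itself.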

Eejdoowad