I need to scrape (public) HTML data after the JavaScript on the page has run. After doing some research, I found that PhantomJS can be useful for this. However, while I can install PhantomJS on my local computer, I don't know how to add it to my Chrome extension. Does anybody know how I can accomplish this?
- Possible duplicate of [Integrate chrome extensions with phantomjs](https://stackoverflow.com/questions/23603708/integrate-chrome-extensions-with-phantomjs) – Luka Čelebić Jun 14 '18 at 15:28
- I also think using PhantomJS for some probably simple scraping is quite an overkill. – Luka Čelebić Jun 14 '18 at 15:30
- @PredatorIWD Is there another (perhaps simpler) way to scrape data generated by JavaScript? I was using the Fetch API before but it only returns the raw HTML (before the JavaScript on the page is loaded). – Manny Jun 14 '18 at 17:25
1 Answer
You cannot. PhantomJS is a standalone headless web browser, not a JavaScript library, so it cannot be bundled into an extension.
Alternative 1. Scrape from the Chrome Extension
You can use the Chrome extension APIs to do the following:
- Create a tab containing the page you want to scrape
- Load a content script into the tab that:
  - Waits for the page to finish loading
  - Scrapes the data you want
  - Messages the scraped data to wherever you need it
- Close the tab
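Here is a minimal sketch of those steps, assuming a Manifest V3 extension whose manifest grants the `scripting` permission and host permissions for the target site. The names `scrapeInTab`, `waitForTabLoad`, and `extractText` are made up for illustration. Instead of a separate content script that sends a message, it injects a function with `chrome.scripting.executeScript` and uses that function's return value as the "message" back to the background script. Note that a tab status of `complete` only means the load event fired; a page that renders data later may need extra waiting inside the injected function.

```javascript
// Background (service worker) sketch for a Manifest V3 extension.

// Runs inside the page: returns the trimmed text of every matching element.
function extractText(selector) {
  return Array.from(document.querySelectorAll(selector), (el) =>
    el.textContent.trim()
  );
}

// Resolves once the given tab reports status "complete".
function waitForTabLoad(api, tabId) {
  return new Promise((resolve) => {
    function onUpdated(updatedTabId, changeInfo) {
      if (updatedTabId === tabId && changeInfo.status === "complete") {
        api.tabs.onUpdated.removeListener(onUpdated);
        resolve();
      }
    }
    api.tabs.onUpdated.addListener(onUpdated);
  });
}

// Opens the URL in a background tab, scrapes it, and always closes the tab.
// `api` defaults to the global `chrome` object; it is a parameter only so
// the function can be exercised with a stub outside a browser.
async function scrapeInTab(url, selector, api = globalThis.chrome) {
  const tab = await api.tabs.create({ url, active: false });
  try {
    await waitForTabLoad(api, tab.id);
    const [injection] = await api.scripting.executeScript({
      target: { tabId: tab.id },
      func: extractText,
      args: [selector],
    });
    return injection.result;
  } finally {
    await api.tabs.remove(tab.id);
  }
}
```

From the background script you would call something like `scrapeInTab("https://example.com", "h1")` and handle the resolved array of strings. On the older Manifest V2 platform the equivalent injection call is `chrome.tabs.executeScript`, which takes a callback.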
Alternative 2. Scrape with a Headless Browser Running on Your Own Server
Use Puppeteer, Google's own Node.js library for driving headless Chrome, to scrape the data you want. An easy way to get started for free is with a Node.js instance on the Google App Engine standard environment.
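As a rough sketch of the Puppeteer route, assuming Node.js with `npm install puppeteer` already done: the function name `scrapeRendered`, the URL, and the selector are placeholders, not anything from the question. It loads the page in headless Chrome, waits for network activity to settle and for the selector to appear, then reads the rendered text.

```javascript
// Node.js sketch. `puppeteerLib` defaults to the installed puppeteer package;
// it is a parameter only so the function can be tested with a stub.
async function scrapeRendered(url, selector, puppeteerLib = require("puppeteer")) {
  const browser = await puppeteerLib.launch();
  try {
    const page = await browser.newPage();
    // "networkidle0" resolves once the page has had no open network
    // connections for a short period, a common proxy for "JS has finished".
    await page.goto(url, { waitUntil: "networkidle0" });
    // Also wait for the element itself, in case it is rendered late.
    await page.waitForSelector(selector);
    // Runs in the page: collect the trimmed text of every match.
    return await page.$$eval(selector, (els) =>
      els.map((el) => el.textContent.trim())
    );
  } finally {
    // Always shut the browser down, even if scraping throws.
    await browser.close();
  }
}
```

You could call it as `scrapeRendered("https://example.com", "h1").then(console.log)` and expose the result to your extension through a small HTTP endpoint on the same server.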

Eejdoowad