I am trying to use scrapy + splash to scrape this site: https://www.teammitsubishihartford.com/new-inventory/index.htm?compositeType=new. However, I am unable to extract any data from it. When I render the page through the splash API (in the browser), I can see that the site is not fully loaded: splash returns an image of a partially loaded page. How can I render the site completely?

1 Answer

@Vinu Abraham, if your requirement is not specific to scrapy + splash, you can use selenium. This issue occurs when scraping a dynamic site, where the content is filled in by JavaScript after the initial page load. Below is a code snippet for reference.

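import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# url of the page we want to scrape
url = 'https://www.*******/drugs-all-medicines'

# Selenium 4 takes the driver path via Service; on Selenium 3,
# webdriver.Chrome('./chromedriver') works instead.
driver = webdriver.Chrome(service=Service('./chromedriver'))
try:
    driver.get(url)
    # crude fixed wait so the page's JavaScript can finish rendering
    time.sleep(5)
    html = driver.page_source
finally:
    # always close the browser, even if loading fails
    driver.quit()

soup = BeautifulSoup(html, "html.parser")
# find() returns the first matching div, or None if the class is absent
container = soup.find('div', {'class': 'style__container___1i8GI'})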

Also, let me know if you find a solution for this using scrapy.

  • I was able to scrape the site using selenium, but my primary intention was to scrape it using splash + scrapy. Also, isn't splash meant for rendering and scraping dynamic pages? – Vinu Abraham Jun 15 '21 at 11:24
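
On the splash side, a partially rendered page often means the snapshot was taken before the page's JavaScript finished. Splash's HTTP API accepts a `wait` argument (seconds to pause after the page loads) and a `timeout` argument on its `render.html` endpoint. The sketch below only shows how such a request could be built with the standard library. It assumes a Splash instance running locally on the default port 8050; the helper names `build_render_url` and `fetch_rendered_html` are hypothetical, and whether a longer wait is enough for this particular site is untested.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Splash is running locally on its default port.
SPLASH_URL = "http://localhost:8050"


def build_render_url(page_url, wait=5.0, timeout=60.0, splash_url=SPLASH_URL):
    """Build a render.html URL that asks Splash to pause `wait` seconds
    after the page loads before returning the HTML."""
    params = {"url": page_url, "wait": wait, "timeout": timeout}
    return f"{splash_url}/render.html?{urlencode(params)}"


def fetch_rendered_html(page_url, **kwargs):
    """Fetch the rendered HTML from Splash (needs a running Splash server)."""
    with urlopen(build_render_url(page_url, **kwargs)) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The returned HTML can then be passed to BeautifulSoup or a scrapy `Selector` as usual. With the scrapy-splash plugin, the same `wait` value is normally passed through the request's `args` dictionary. If a fixed wait is still not enough, a Lua script sent to Splash's `execute` endpoint is the usual next step.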