I am trying to scrape the table data from this URL: https://covid19criticalcare.com/pharmacies/

On my previous scrape I used the following Python packages: bs4 (BeautifulSoup), requests, mysql.connector, pandas, and sqlalchemy (create_engine).

But this URL's HTML doesn't contain the table data; instead, the page seems to be drawing the data from an external database.

Could someone please point me in the right direction for scraping table data from a page with this sort of HTML setup using a Python script?

I tried a blind scrape, using the same method as on my previous scrape:

from bs4 import BeautifulSoup
import requests
import mysql.connector
import pandas as pd
from sqlalchemy import create_engine

url = "https://covid19criticalcare.com/pharmacies/"

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
result = requests.get(url, headers = headers)
doc = BeautifulSoup(result.text, "html.parser")

# The pharmacy names are expected in the first table column
name = doc.find_all("td", class_="column-1")

td_pharmacy_name = []

for td in name:
    names = td.text
    td_pharmacy_name.append(names)
print(td_pharmacy_name)
HedgeHog
K_Tech
  • Use your browser's developer tools and look at the traffic... you'll probably find the data's coming from XHR requests which you can then emulate. – Jon Clements Apr 02 '22 at 21:11

2 Answers

The content you are trying to scrape only becomes available after the JavaScript on the page has run. The simplest options are either to replicate the underlying request against the same REST API, or to use a tool that renders the page for you, such as Selenium (Scrapy can also do this, but only when paired with a rendering add-on such as Splash).
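
A quick way to check this, as a rough sketch: count the `<td>` cells in the raw HTML that `requests` receives. The helper `count_table_cells` and both sample strings are made up for illustration; a count of zero on the real `result.text` would suggest that the rows are injected later by JavaScript.

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Counts <td> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.cells = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag == "td":
            self.cells += 1


def count_table_cells(html: str) -> int:
    """Return the number of <td> cells found in the given HTML text."""
    counter = CellCounter()
    counter.feed(html)
    counter.close()
    return counter.cells


# Hypothetical samples: a table shell as it might be served before any
# JavaScript runs, and the same table after rows have been filled in.
static_html = "<table><thead><tr><th>Name</th></tr></thead><tbody></tbody></table>"
rendered_html = "<table><tbody><tr><td class='column-1'>Example</td></tr></tbody></table>"

print(count_table_cells(static_html))
print(count_table_cells(rendered_html))
```

In the question's script, the same check would be `count_table_cells(result.text)`.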

For more details on how to scrape JS-rendered content, see the thread "Web-scraping JavaScript page with Python".

For basic troubleshooting, you can view the requests and responses in Chrome Developer Tools: right-click on the page > click "Inspect" > open the "Network" tab > click "Fetch/XHR" > press Command + Shift + R to reload the page once.

If you are unsure which request contains the data you are looking for, press Command + F, type in a keyword, and Chrome will list the requests that match your search.

Following the steps above shows that the table data is sent using AJAX.
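
Once the request has been identified in the Network tab, it can usually be replayed from Python. The following is only a sketch under stated assumptions: the endpoint URL is a placeholder, the `X-Requested-With` header may or may not be needed, and the `{"data": [...]}` response shape is invented for the demonstration. The real URL, method, parameters, and response layout have to be copied from the developer tools. It uses only the standard library; `requests` would work the same way.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the request URL seen in the Network tab.
XHR_URL = "https://example.com/path/to/ajax-endpoint"


def build_xhr_request(url: str) -> urllib.request.Request:
    """Build a GET request carrying headers a browser XHR typically sends."""
    headers = {
        "User-Agent": "Mozilla/5.0",
        "Accept": "application/json",
        "X-Requested-With": "XMLHttpRequest",
    }
    return urllib.request.Request(url, headers=headers)


def rows_from_payload(text: str) -> list:
    """Extract the rows from a JSON body, assuming a {"data": [...]} shape."""
    payload = json.loads(text)
    return payload.get("data", [])


def fetch_rows(url: str) -> list:
    """Send the request and return the parsed rows (performs network I/O)."""
    with urllib.request.urlopen(build_xhr_request(url), timeout=30) as response:
        return rows_from_payload(response.read().decode("utf-8"))


# Offline demonstration with a made-up response body.
sample_body = '{"data": [["Pharmacy A", "a@example.com"], ["Pharmacy B", "b@example.com"]]}'
rows = rows_from_payload(sample_body)
print(len(rows), rows[0][0])
```

With the real endpoint filled in, `fetch_rows(XHR_URL)` would return the rows, which can then be passed to `pd.DataFrame(rows)`.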

EDIT 1

If you want to use Selenium to avoid the hassle of mimicking the web request, your code should look something like this:

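from io import StringIO
import time

import pandas
from selenium import webdriver

if __name__ == "__main__":
    driver = webdriver.Chrome()
    driver.get("https://covid19criticalcare.com/pharmacies/")
    # Crude fixed wait so the JavaScript has time to populate the table
    time.sleep(7)
    # Parse the first table in the rendered HTML; StringIO avoids the
    # deprecation warning newer pandas versions raise for raw HTML strings
    df = pandas.read_html(StringIO(driver.page_source))[0]
    driver.quit()
    print(df)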
anonymous
  • Could you help direct me in the next step? – K_Tech Apr 03 '22 at 06:50
  • Well, if you would like to go for a hassle-free approach, use Selenium. It will load the website for you and you don't have to crack the website. Your Selenium code should look something like the one in my edit. – anonymous Apr 03 '22 at 18:51
Just as an alternative to @Naphat Theerawat's answer: since I noticed that you started with a Selenium-based solution, you could reach your goal much more easily by combining it with pandas.

Load the website and extract the table from driver.page_source with pd.read_html(). To avoid iterating over each page, just select the "All" option in the "Show entries" dropdown.

Example

from io import StringIO

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
import pandas as pd

url = 'https://covid19criticalcare.com/pharmacies/'

# Selenium 4 expects the driver path to be wrapped in a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
wait = WebDriverWait(driver, 5)

# Switch the page-length dropdown to "All" (value -1) so every row is rendered
select = Select(wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[name="DataTables_Table_0_length"]'))))
select.select_by_value('-1')
# A disabled "next" button means all entries now sit on a single page
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'a.paginate_button.next.disabled')))

# The pharmacy table is the second table found in the rendered HTML
df = pd.read_html(StringIO(driver.page_source), displayed_only=False)[1]
driver.quit()

df

Output

Pharmacy Name Email Phone Website Requires prescription? Pharmacy Address Based in the United States? Overnight shipping to the United States? Overnight International shipping? Ships to the following States/Provinces
0 Covid Pharmacy sales@0covidpharmacy.com (785) 672 9222 0covidpharmacy.com NO 245 Krishna Market Channi RoadNagpur, Maharashtra 440001India NO YES YES AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaHawaiiIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyomingGuamPuerto RicoVirgin IslandsArmed Forces AmericasArmed Forces EuropeArmed Forces Pacific
1 Ivermectin Service ask24@1ivermectin.com (888) 290 0964 (US), +91 22509 72606 (IN) 1ivermectin.com NO 1/16, First Floor, Tardeo Air Conditioned Market Building, TardeoMumbai, Tardeo 400034India NO YES YES AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyomingPuerto RicoVirgin Islands
1 Life Pharmacy sales@1lifepharmacy.net (888) 560-0430 (US); +91 (807 ) 127-9990 (India) 1lifepharmacy.net NO 302, Pride Plaza, Rajkot, 360002Rajkot, Gujarat 360002; 84118India NO YES YES AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaHawaiiIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyoming
1-2-3 RX Global Pharmacy doctor@123rx.net (516) 758-2630 123rx.net NO 2967 Dundas St. W.Toronto, Ontario M6P 1Z2Canada NO YES YES AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaHawaiiIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyoming
12 Angel Pharmacy Store 12angel.store@gmail.com (908) 866-4260 12angel.store NO 1050 Bharat Diamond BourseBandra Kurla ComplexMumbai, Maharashtra 400051India NO YES YES AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaHawaiiIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyomingGuamPuerto RicoVirgin IslandsArmed Forces AmericasArmed Forces EuropeArmed Forces Pacific
24 x 7 Pharma contact@24x7pharma.com (851) 127-5721 24x7pharma.com NO Mahek IconSumul Diary Road, KatargamSurat, Gujarat 395003India NO YES YES AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaHawaiiIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyomingGuamPuerto RicoVirgin IslandsArmed Forces AmericasArmed Forces EuropeArmed Forces Pacific

...

HedgeHog