I'm trying to scrape the price information for the products on this site using Beautiful Soup: 'https://www.compracerta.com.br/celulares-e-smartphones?page=1'. However, the price comes back as an empty list, i.e. [].

My code is below:

    from pandas_datareader import data as pdr
    import numpy as np
    import pandas as pd
    from selenium import webdriver
    import matplotlib.pyplot as plt
    from datetime import datetime,timedelta
    from bs4 import BeautifulSoup
    import requests
    import math
    import re
    from requests_html import HTMLSession, AsyncHTMLSession
    from lxml import etree
    import xlwt

    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By

    url_pag='https://www.compracerta.com.br/celulares-e-smartphones?page=1'
    headers = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"}
    executable_path = r'C:\Users\vinig\Downloads\chromedriver_win32\chromedriver.exe'
    browser = webdriver.Chrome(executable_path=executable_path)
    browser.get(url_pag)
    html = browser.page_source
    soup = BeautifulSoup(html, 'lxml')
    produtos = soup.find_all('div', class_="gallery-container shelf prateleira default n12colunas")

    for produto in produtos:
        marca = produto.find('a', attrs={'class':"image"})
        preco = produto.find('div', attrs={'class':"price"})

The result is:

[Image: Price is empty]

The page's elements are:

[Image: Element Page]

undetected Selenium
Davi Riani

1 Answer

To scrape the image and price information of the products from the website, you can use list comprehensions with the following locator strategy:

  • Using CSS_SELECTOR:

    # Assumes `from selenium import webdriver` and a chromedriver that Selenium can locate
    driver = webdriver.Chrome()
    driver.get('https://www.compracerta.com.br/celulares-e-smartphones?page=1')
    # Accept the cookie banner once it becomes clickable
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
    # Collect the product links and the displayed prices
    images = [my_elem.get_attribute("href") for my_elem in driver.find_elements(By.CSS_SELECTOR, "article.box-produto a.prod-info")]
    prices = [my_elem.text for my_elem in driver.find_elements(By.CSS_SELECTOR, "article.box-produto a.prod-info p span.por > span")]
    for i, j in zip(images, prices):
        print(f"Image: {i} Price: {j}")
    driver.quit()
    
  • Note: You have to add the following imports:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console Output:

    Image: https://www.compracerta.com.br/iphone-12-64gb-azul-2081266/p Price: R$ 3.849,00
    Image: https://www.compracerta.com.br/smartphone-samsung-galaxy-a03s-64gb-4gb-ram-4g-wi-fi-dual-chip-camera-tripla---selfie-5mp-6-5--preto-2074884/p Price: R$ 779,00
    Image: https://www.compracerta.com.br/iphone-11-128gb---preto-2049241/p Price: R$ 3.299,00
    Image: https://www.compracerta.com.br/iphone-11-128gb---branco-2049240/p Price: R$ 3.299,00
    Image: https://www.compracerta.com.br/smartphone-motorola-moto-g32-128gb-4gb-ram-65---preto-2110182/p Price: R$ 999,00
    Image: https://www.compracerta.com.br/smartphone-motorola-moto-g32-128gb-4gb-ram-65---rose-2110183/p Price: R$ 999,00
    Image: https://www.compracerta.com.br/smartphone-motorola-moto-edge-30-ultra-256gb-12gb-ram-camera-tripla-200-mp-ois-50-mp-12-mp-tela-6-7--white-2103032/p Price: R$ 4.999,00
    Image: https://www.compracerta.com.br/smartphone-motorola-moto-g52-128gb-4gb-ram-6-6%E2%80%9D-cam-tripla-50mp-8mp-2mp-selfie-16mp---branco-2096318/p Price: R$ 1.349,00
    Image: https://www.compracerta.com.br/moto-g82-5g-2096006/p Price: R$ 1.999,00
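
The prices in the output above are strings in Brazilian format, such as "R$ 3.849,00". If you want to sort or sum them, they need to be converted to numbers first. Here is a minimal sketch of one way to do that; the helper name `parse_brl` is hypothetical, and it assumes every price follows the "R$ 1.234,56" pattern shown in the console output:

```python
from decimal import Decimal


def parse_brl(price_text: str) -> Decimal:
    """Convert a Brazilian-format price such as 'R$ 3.849,00' to a Decimal."""
    # Drop the currency symbol and any surrounding whitespace
    number = price_text.replace("R$", "").strip()
    # '.' is the thousands separator and ',' is the decimal separator
    number = number.replace(".", "").replace(",", ".")
    return Decimal(number)


# Sample values copied from the console output above
prices = ["R$ 3.849,00", "R$ 779,00", "R$ 999,00"]
values = [parse_brl(p) for p in prices]
print(values)
```

`Decimal` is used instead of `float` so the cents are kept exactly. A price that does not match the assumed pattern will raise `decimal.InvalidOperation`, so you may want to catch that for products without a listed price.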
    
undetected Selenium
    Great! I just added a time.sleep(5) after the button click to give the page time to load the information. Thanks! – Davi Riani Mar 02 '23 at 03:31
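
As an alternative to the fixed time.sleep(5) mentioned in the comment, you could poll until the prices actually appear, so the script continues as soon as the data is there and fails clearly if it never arrives. The following is only a plain-Python sketch of that idea. The names `wait_for`, `fetch`, `timeout` and `poll` are hypothetical and not part of Selenium; Selenium's own `WebDriverWait`, used in the answer, applies the same principle with expected conditions.

```python
import time


def wait_for(fetch, timeout=20.0, poll=0.5):
    """Call fetch() repeatedly until it returns a non-empty result.

    Raises TimeoutError if nothing is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing found within {timeout} seconds")
        # Pause briefly before trying again
        time.sleep(poll)


# Hypothetical usage with the driver and selector from the answer:
# prices = wait_for(lambda: [e.text for e in driver.find_elements(
#     By.CSS_SELECTOR, "article.box-produto a.prod-info p span.por > span")])
```

Because `fetch` is any callable, the same helper can wrap the link lookup as well as the price lookup.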