I have written a Python web scraper that uses Selenium to extract MacBook listings from Amazon. Now I want to store the values in Excel or MySQL. Each product row contains various HTML/CSS classes, plus one parent class that contains all the parameters of the product. To be precise, the code is:
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from xlwt import Workbook  # for writing the results to an .xls file (not used yet)

option = webdriver.ChromeOptions()
option.add_argument("--incognito")
# Selenium 4 API: the chromedriver path goes into a Service object
service = Service('/home/mukesh/Desktop/backup/Programminghub/whatsapp_python_scripts/chromedriver_linux64/chromedriver')
browser = webdriver.Chrome(service=service, options=option)

# go to website of interest
browser.get("https://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=macbook")

# wait up to 10 seconds for the product images to become visible
timeout = 10
try:
    WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH, "//img[@class='s-access-image cfMarker']")))
except TimeoutException:
    print("Timed out waiting for page to load")
    browser.quit()
    raise SystemExit(1)  # stop here: the browser is closed, so nothing below can run

# each s-item-container div holds one complete product row
titles_element = browser.find_elements(By.XPATH, "//div[@class='s-item-container']")
titles = []
for x in titles_element:
    # x.text is already a str in Python 3; encoding to ASCII drops characters
    # such as the ₹ sign, and decoding turns the result back into a str
    titles.append(x.text.encode('ascii', 'ignore').decode('ascii'))
print(titles)
```
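Because every product row is wrapped in one parent `div` (the `s-item-container` used above), one possible way to keep each price attached to its own title is to look fields up *relative to* that row (an XPath starting with `.//`) instead of across the whole page. This is a minimal sketch, assuming the row XPath from the code above still matches the page; the title and price XPaths (`.//h2`, `.//span[contains(@class, 'price')]`) and the helper names `first_text` and `scrape_rows` are hypothetical placeholders to check in the browser's inspector:

```python
# Selenium's By.XPATH constant is the string "xpath"; the plain string is used here
# so the helpers can be imported without Selenium. Passing By.XPATH works the same.
XPATH = "xpath"


def first_text(row, relative_xpath):
    """Return the stripped text of the first match *inside* row, or None if nothing matches."""
    matches = row.find_elements(XPATH, relative_xpath)
    return matches[0].text.strip() if matches else None


def scrape_rows(browser,
                row_xpath="//div[@class='s-item-container']",
                title_xpath=".//h2",                                  # hypothetical
                price_xpath=".//span[contains(@class, 'price')]"):    # hypothetical
    """Read the title and price of each product row separately so they cannot get mismatched."""
    products = []
    for row in browser.find_elements(XPATH, row_xpath):
        title = first_text(row, title_xpath)
        if title is None:
            continue  # no title inside this container, so it is not treated as a product
        products.append({
            "title": title,
            "price": first_text(row, price_xpath),  # None if this row has no price
            "row_text": row.text,                   # full row text, for the remaining fields
        })
    return products
```

Calling `scrape_rows(browser)` after the `WebDriverWait` step returns one dictionary per row. Because each search starts from `row`, a row without a price gets `None` rather than a neighbour's price.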
Now the output I get is highly unstructured and contains some parameters that appear only on certain products. For instance, "Maximum Resolution" and "CPU model manufacture" are present on some laptops but not on all, and I don't want them. I want only the parameters that are present on every laptop: product name (the title of the row), price, operating system, CPU model family, computer memory size and display size.

I am unable to split the `titles` list into these sub-lists. I also tried a naive approach in which I split the products by accessing the individual class of each parameter, but the values did not line up: the price of one laptop was shown against another, and sponsored ads caused further problems.

Link to the website: Amazon Macbook Scraping

I just want these six columns in a list, an Excel sheet or a MySQL database.
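Since all parameters of a product are inside that product's row text, the unwanted ones can be dropped by whitelisting the six columns while parsing each row on its own. The sketch below rests on assumptions not verified against the live page: specs appear as `Label: value` lines, the first line of a row is the product name, the first line consisting only of a number (optionally prefixed with `₹`) is the price, and sponsored rows contain a line reading `Sponsored` or `[Sponsored]`. It uses the standard-library `sqlite3` module as a stand-in for MySQL; `parse_row`, `save` and the `laptops` table are hypothetical names:

```python
import re
import sqlite3

# Hypothetical labels, assumed to appear in the row text as "Label: value" lines.
SPEC_COLUMNS = {
    "operating system": "operating_system",
    "cpu model family": "cpu_model_family",
    "computer memory size": "memory_size",
    "display size": "display_size",
}
COLUMNS = ["title", "price", *SPEC_COLUMNS.values()]
# A line holding only an optional currency mark and a number such as "64,990".
PRICE_RE = re.compile(r"^(?:₹|Rs\.?)?\s*(\d[\d,]*(?:\.\d+)?)$")


def parse_row(text):
    """Return a dict with exactly the six wanted columns (None where missing),
    or None for rows assumed to be sponsored ads."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or any(ln.strip("[]").lower() == "sponsored" for ln in lines):
        return None
    record = dict.fromkeys(COLUMNS)
    record["title"] = lines[0]  # assumption: the first line is the product name
    for ln in lines[1:]:
        label, sep, value = ln.partition(":")
        column = SPEC_COLUMNS.get(label.strip().lower()) if sep else None
        if column:
            record[column] = value.strip()  # one of the wanted parameters
        elif record["price"] is None:
            match = PRICE_RE.match(ln)
            if match:
                record["price"] = match.group(1).replace(",", "")
        # any other line (e.g. "Maximum Resolution: ...") is ignored
    return record


def save(records, conn, placeholder="?"):
    """Write the records into a six-column table through a DB-API connection."""
    cur = conn.cursor()
    # Column names come from the fixed COLUMNS list; values go through placeholders.
    cur.execute(f"CREATE TABLE IF NOT EXISTS laptops ({', '.join(c + ' TEXT' for c in COLUMNS)})")
    marks = ", ".join(placeholder for _ in COLUMNS)
    cur.executemany(
        f"INSERT INTO laptops ({', '.join(COLUMNS)}) VALUES ({marks})",
        [tuple(r[c] for c in COLUMNS) for r in records],
    )
    conn.commit()
```

With the question's loop this could be used as `records = [r for r in (parse_row(x.text) for x in titles_element) if r is not None]` followed by `save(records, sqlite3.connect("laptops.db"))`. For MySQL, a connection from `mysql.connector` or PyMySQL could be passed with `placeholder="%s"`, since both drivers use `%s` parameter markers. For Excel, writing `records` with `csv.DictWriter` and `fieldnames=COLUMNS` gives a CSV file that Excel opens.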