
Here is the code I have so far. My next step is extracting the right elements from the page, i.e. the titles of the most recent articles, and putting them in a list.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

PATH = r"C:\webdrivers"  # raw string so "\w" is not treated as an escape sequence
driver = webdriver.Chrome()  # assumes chromedriver is on the system PATH; otherwise point Selenium at PATH

driver.get("https://www.cnbc.com/business/")
Razor262

Comments:
  • Also add the code that you have tried to extract the news. – pmadhu Sep 27 '21 at 03:52
  • My next step is taking the right elements from the website ie. the names of the most recent articles and putting them in a list. - we need to see the code for this. – cruisepandey Sep 27 '21 at 06:32

1 Answer


This is what you should do:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

PATH = "/Users/Samuel/PycharmProjects/MoneyMachine/drivers/chromedriver"
driver = webdriver.Chrome(service=Service(PATH))
driver.get("https://www.cnbc.com/business/")

headlines = []  # don't name this "list" -- that shadows the built-in

# The top stories sit in two adjacent columns: two cards in the first,
# three in the second. find_element(By.XPATH, ...) is the current
# spelling of the old find_element_by_xpath, which Selenium 4 removed.
for i in range(2):
    element = driver.find_element(By.XPATH, f"/html/body/div[2]/div/div[1]/div[3]/div/div/div/div[3]/div[1]/div[1]/section/div/div[1]/div[{i+1}]/div/div/div/div[1]/div/a/div").text
    headlines.append(element)

for i in range(3):
    element = driver.find_element(By.XPATH, f"/html/body/div[2]/div/div[1]/div[3]/div/div/div/div[3]/div[1]/div[1]/section/div/div[2]/div[{i+1}]/div/div/div/div[1]/div/a/div").text
    headlines.append(element)

driver.close()

print(headlines)

driver.find_element(By.XPATH, "XPATH") (the older driver.find_element_by_xpath spelling was removed in Selenium 4) finds an element for you. To know what to put in the quotes, right-click the element you want in the browser and select Inspect; then right-click the highlighted node in the inspector and choose Copy > Copy full XPath. Be aware that a full (absolute) XPath is brittle: any change to the page layout above the element breaks it.
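To see why a copied full XPath is fragile, here is a small self-contained sketch using Python's standard-library ElementTree on two made-up snippets standing in for the CNBC markup (the Card-title class name is hypothetical, for illustration only). A relative path keyed on an attribute survives a layout change that breaks the absolute path:

```python
import xml.etree.ElementTree as ET

# Made-up markup; the Card-title class name is an assumption.
before = """<body><div><section>
  <div><a href="/a1"><div class="Card-title">Headline one</div></a></div>
  <div><a href="/a2"><div class="Card-title">Headline two</div></a></div>
</section></div></body>"""

# Same content after a redesign adds one wrapper div around the section.
after = """<body><div><div><section>
  <div><a href="/a1"><div class="Card-title">Headline one</div></a></div>
  <div><a href="/a2"><div class="Card-title">Headline two</div></a></div>
</section></div></div></body>"""

absolute = "./div/section/div/a/div"      # copied "full xpath" style
relative = ".//div[@class='Card-title']"  # keyed on a stable attribute

def titles(html, path):
    # Collect the text of every div the path matches.
    return [d.text for d in ET.fromstring(html).findall(path)]

print(titles(before, absolute))  # ['Headline one', 'Headline two']
print(titles(after, absolute))   # [] -- the absolute path broke
print(titles(after, relative))   # ['Headline one', 'Headline two']
```

The same idea applies in Selenium: a By.XPATH selector like `//div[@class='Card-title']` outlives redesigns that a copied full path does not.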

I think you should check out BeautifulSoup (bs4) for this sort of project; it is more user-friendly and likely a better fit for your case. Here are some more reasons to use bs4 here:

"
  • Bandwidth, and time to run your script. Using Selenium means fetching all the resources that would normally be fetched when you visit a page in a browser - stylesheets, scripts, images, and so on. This is probably unnecessary.
  • Stability and ease of error recovery. Selenium can be a little fragile, in my experience - even with PhantomJS - and creating the architecture to kill a hung Selenium instance and create a new one is a little more irritating than setting up simple retry-on-exception logic when using requests.
  • Potentially, CPU and memory usage - depending upon the site you're crawling, and how many spider threads you're trying to run in parallel, it's conceivable that either DOM layout logic or JavaScript execution could get pretty expensive.
" From - Selenium versus BeautifulSoup for web scraping
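For a taste of the BeautifulSoup route, here is a minimal sketch. It parses a small inline snippet standing in for the CNBC page (the Card-title class name is again a hypothetical); in the real script you would fetch the live HTML with requests first, as shown in the comment:

```python
from bs4 import BeautifulSoup

# In the real script, fetch the page first:
#   import requests
#   html = requests.get("https://www.cnbc.com/business/").text
# Here an inline snippet stands in; Card-title is an assumed class name.
html = """
<section>
  <div class="Card"><a href="/a1"><div class="Card-title">Headline one</div></a></div>
  <div class="Card"><a href="/a2"><div class="Card-title">Headline two</div></a></div>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all collects every matching tag; get_text pulls the headline text.
headlines = [div.get_text(strip=True)
             for div in soup.find_all("div", class_="Card-title")]

print(headlines)  # ['Headline one', 'Headline two']
```

No browser starts, nothing but the HTML is downloaded, and a failed request can simply be retried, which is exactly the trade-off the quote above describes.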

realFishSam