4

I'm new to web scraping and Python. I wrote a script before that worked just fine. I'm doing basically the same thing in this one, but it runs much slower. This is my code:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import selenium
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
import time

start = time.time()
opp = Options()
opp.add_argument('-headless')
browser = webdriver.Firefox(executable_path = "/Users/0581279/Desktop/L&S/Watchlist/geckodriver", options=opp)
browser.delete_all_cookies()
browser.get("https://www.bloomberg.com/quote/MSGFINA:LX")

c = browser.page_source
soup = BeautifulSoup(c, "html.parser")
values = soup.find_all("span", {"class": "fieldValue__2d582aa7"})
price = values[6].text
browser.quit()
print(price)
end = time.time()
print(end-start)

Sometimes a single page can take up to 2 minutes to load. Also, I'm just scraping Bloomberg. Any help would be appreciated :)

user11121374
    Have you tried timing sections of your code to see which particular line(s) are making it so slow? This would help us see if it's an issue with your network or when you're parsing the result. – Nathan Feb 26 '19 at 19:45
  • The problem seems to be with this browser.get("https://www.bloomberg.com/quote/MSGFINA:LX") – user11121374 Feb 26 '19 at 19:53
  • Maybe it's the way you create the webdriver. Try using Chrome, or remove the options, as they are not needed. –  Feb 26 '19 at 19:57
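
Following the timing suggestion in the first comment, here is a minimal stdlib-only sketch (the stage names and helper are mine, not from the question) for timing each part of the script separately:

```python
import time

def timed(label, func, *args, **kwargs):
    # Run func, print how long it took, and return its result.
    start = time.perf_counter()
    result = func(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return result

# Hypothetical usage against the script above:
# source = timed("browser.get + page_source", fetch_page)
# price = timed("parse", parse_price, source)
```

Timing the `browser.get` call and the parsing step separately shows whether the bottleneck is the network/page load or the HTML parsing.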

3 Answers

2

Using requests and BeautifulSoup you can scrape the information easily and fast. Here is code to get the Key Statistics for Bloomberg's MSGFINA:LX:

import requests
from bs4 import BeautifulSoup

headers = {
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/72.0.3626.119 Safari/537.36',
    'DNT': '1'
}

response = requests.get('https://www.bloomberg.com/quote/MSGFINA:LX', headers=headers)
page = BeautifulSoup(response.text, "html.parser")

key_statistics = page.select("div[class^='module keyStatistics'] div[class^='rowListItemWrap']")
for key_statistic in key_statistics:
    fieldLabel = key_statistic.select_one("span[class^='fieldLabel']")
    fieldValue = key_statistic.select_one("span[class^='fieldValue']")
    print("%s: %s" % (fieldLabel.text, fieldValue.text))
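
The `^=` attribute selectors above match the class attribute by prefix, which is what lets them survive the hashed suffixes (like `__2d582aa7`) on Bloomberg's class names. A self-contained illustration with made-up markup mimicking that structure:

```python
from bs4 import BeautifulSoup

# Made-up HTML imitating Bloomberg's hashed class names
html = """
<div class="module keyStatistics__abc123">
  <div class="rowListItemWrap__def456">
    <span class="fieldLabel__2d582aa7">Price</span>
    <span class="fieldValue__2d582aa7">34.54</span>
  </div>
</div>
"""
page = BeautifulSoup(html, "html.parser")
# [class^='...'] matches any class attribute that starts with the prefix
for row in page.select("div[class^='module keyStatistics'] div[class^='rowListItemWrap']"):
    label = row.select_one("span[class^='fieldLabel']")
    value = row.select_one("span[class^='fieldValue']")
    print("%s: %s" % (label.text, value.text))  # Price: 34.54
```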
Sers
1

Selenium's speed is affected by several factors:

If the site is slow, the Selenium script is slow.

If the internet connection is poor, the Selenium script is slow.

If the computer running the script is underpowered, the Selenium script is slow.

These factors are usually out of our hands, but the program is not. One way to increase speed is to block image loading (if we don't need the images); skipping images reduces the runtime. This is the way to block it:

opp.add_argument('--blink-settings=imagesEnabled=false')

And once you drive the page with Selenium, you don't need BeautifulSoup to extract the data; Selenium's own functions provide it. Try the code below; Selenium will be faster:

from selenium import webdriver

from selenium.webdriver.chrome.options import Options
import time

start = time.time()
opp = Options()
opp.add_argument('--blink-settings=imagesEnabled=false')

driver_path = r'Your driver path'
browser = webdriver.Chrome(executable_path=driver_path , options=opp)

browser.delete_all_cookies()
browser.get("https://www.bloomberg.com/quote/MSGFINA:LX")

get_element = browser.find_elements_by_css_selector("span[class='fieldValue__2d582aa7']")


print(get_element[6].text)
browser.quit()

end = time.time()
print(end-start)
Omer Tekbiyik
0

So I made some alterations to your code and could load it almost instantly. I used the Chrome driver, which I had installed, and then ran the following code.

from bs4 import BeautifulSoup
from selenium import webdriver
import time

start = time.time()
browser = webdriver.Chrome("/Users/XXXXXXXX/Desktop/Programming/FacebookControl/package/chromedriver")
browser.get("https://www.bloomberg.com/quote/MSGFINA:LX")

c = browser.page_source
soup = BeautifulSoup(c, "html.parser")
values = soup.find_all("span", {"class": "fieldValue__2d582aa7"})
price = values[6].text
browser.quit()
print(price)
end = time.time()
print(end-start)

While testing they did block me, lol; you might want to change headers every once in a while. It printed the price as well.
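
To vary headers as suggested, here is a minimal sketch that picks a random User-Agent per request (the UA strings below are just examples, and the helper name is mine):

```python
import random

# A small pool of example desktop User-Agent strings
USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0',
]

def random_headers():
    # Build a fresh header dict with a randomly chosen User-Agent
    return {'User-Agent': random.choice(USER_AGENTS), 'DNT': '1'}

# Hypothetical usage with requests:
# response = requests.get(url, headers=random_headers())
```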

Chromedriver link: http://chromedriver.chromium.org/

Hope this helps.

output was this:

34.54
7.527994871139526