4

I'm new to web scraping and Python. I wrote a script before that worked just fine. I'm doing basically the same thing in this one, but it runs much slower. This is my code:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import selenium
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
import time

start = time.time()
opp = Options()
opp.add_argument('-headless')
browser = webdriver.Firefox(executable_path = "/Users/0581279/Desktop/L&S/Watchlist/geckodriver", options=opp)
browser.delete_all_cookies()
browser.get("https://www.bloomberg.com/quote/MSGFINA:LX")

c = browser.page_source
soup = BeautifulSoup(c, "html.parser")
values = soup.find_all("span", {"class": "fieldValue__2d582aa7"})
price = values[6].text
browser.quit()
print(price)
end = time.time()
print(end-start)

Sometimes a single page can take up to 2 minutes to load. Also, I'm just scraping Bloomberg. Any help would be appreciated :)

user11121374
    Have you tried timing sections of your code to see which particular line(s) are making it so slow? This would help us see if it's an issue with your network or when you're parsing the result. – Nathan Feb 26 '19 at 19:45
  • The problem seems to be with this browser.get("https://www.bloomberg.com/quote/MSGFINA:LX") – user11121374 Feb 26 '19 at 19:53
  • Maybe it's the way you create the webdriver. Try using Chrome, or remove the options, as they are not needed. –  Feb 26 '19 at 19:57
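
Following the timing suggestion in the first comment, here is a minimal stdlib-only sketch (the stage names and helper are mine, not from the question) for timing each part of the script separately:

```python
import time

def timed(label, func, *args, **kwargs):
    # Run func, print how long it took, and return its result.
    start = time.perf_counter()
    result = func(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return result

# Hypothetical usage against the script above:
# source = timed("browser.get + page_source", fetch_page)
# price = timed("parse", parse_price, source)
```

Timing the `browser.get` call and the parsing step separately shows whether the bottleneck is the network/page load or the HTML parsing.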

3 Answers

2

Using requests and BeautifulSoup you can scrape the information easily and fast. Here is code to get the Key Statistics for Bloomberg's MSGFINA:LX:

import requests
from bs4 import BeautifulSoup

headers = {
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/72.0.3626.119 Safari/537.36',
    'DNT': '1'
}

response = requests.get('https://www.bloomberg.com/quote/MSGFINA:LX', headers=headers)
page = BeautifulSoup(response.text, "html.parser")

key_statistics = page.select("div[class^='module keyStatistics'] div[class^='rowListItemWrap']")
for key_statistic in key_statistics:
    fieldLabel = key_statistic.select_one("span[class^='fieldLabel']")
    fieldValue = key_statistic.select_one("span[class^='fieldValue']")
    print("%s: %s" % (fieldLabel.text, fieldValue.text))
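
The `^=` attribute selectors above match the class attribute by prefix, which is what lets them survive the hashed suffixes (like `__2d582aa7`) on Bloomberg's class names. A self-contained illustration with made-up markup mimicking that structure:

```python
from bs4 import BeautifulSoup

# Made-up HTML imitating Bloomberg's hashed class names
html = """
<div class="module keyStatistics__abc123">
  <div class="rowListItemWrap__def456">
    <span class="fieldLabel__2d582aa7">Price</span>
    <span class="fieldValue__2d582aa7">34.54</span>
  </div>
</div>
"""
page = BeautifulSoup(html, "html.parser")
# [class^='...'] matches any class attribute that starts with the prefix
for row in page.select("div[class^='module keyStatistics'] div[class^='rowListItemWrap']"):
    label = row.select_one("span[class^='fieldLabel']")
    value = row.select_one("span[class^='fieldValue']")
    print("%s: %s" % (label.text, value.text))  # Price: 34.54
```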
Sers
1

Selenium's speed is affected by several factors:

If the site is slow, the Selenium script is slow.

If the internet connection is poor, the Selenium script is slow.

If the computer running the script is underpowered, the Selenium script is slow.

These factors are usually out of our hands, but the program is not. One way to increase speed is to block image loading (if we don't need the images); skipping images reduces the runtime. This is the way to block it:

opp.add_argument('--blink-settings=imagesEnabled=false')

And once you drive the page with Selenium, you don't need BeautifulSoup to extract the data; Selenium's own functions provide it. Try the code below; Selenium will be faster:

from selenium import webdriver

from selenium.webdriver.chrome.options import Options
import time

start = time.time()
opp = Options()
opp.add_argument('--blink-settings=imagesEnabled=false')

driver_path = r'Your driver path'
browser = webdriver.Chrome(executable_path=driver_path , options=opp)

browser.delete_all_cookies()
browser.get("https://www.bloomberg.com/quote/MSGFINA:LX")

get_element = browser.find_elements_by_css_selector("span[class='fieldValue__2d582aa7']")


print(get_element[6].text)
browser.quit()

end = time.time()
print(end-start)
Omer Tekbiyik
0

So I made some alterations to your code and could load it almost instantly. I used the Chrome driver, which I had installed, and then ran the following code.

from bs4 import BeautifulSoup
from selenium import webdriver
import time

start = time.time()
browser = webdriver.Chrome("/Users/XXXXXXXX/Desktop/Programming/FacebookControl/package/chromedriver")
browser.get("https://www.bloomberg.com/quote/MSGFINA:LX")

c = browser.page_source
soup = BeautifulSoup(c, "html.parser")
values = soup.find_all("span", {"class": "fieldValue__2d582aa7"})
price = values[6].text
browser.quit()
print(price)
end = time.time()
print(end-start)

While testing they did block me, lol; you might want to change headers every once in a while. It printed the price as well.
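
To vary headers as suggested, here is a minimal sketch that picks a random User-Agent per request (the UA strings below are just examples, and the helper name is mine):

```python
import random

# A small pool of example desktop User-Agent strings
USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0',
]

def random_headers():
    # Build a fresh header dict with a randomly chosen User-Agent
    return {'User-Agent': random.choice(USER_AGENTS), 'DNT': '1'}

# Hypothetical usage with requests:
# response = requests.get(url, headers=random_headers())
```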

Chromedriver link: http://chromedriver.chromium.org/

Hope this helps.

output was this:

34.54
7.527994871139526