
I use the following code to try to scrape the tickers of the firms listed on eToro. My only issue is that this code only returns the first 50, as that is how many are displayed when the URL first loads. You can show more on the site, but this is not reflected in the request, so I'm wondering whether it's possible to get all of them.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
driver = webdriver.Chrome("chromedriver.exe", options=options)
driver.get("https://www.etoro.com/discover/markets/stocks/exchange/london")

soup = BeautifulSoup(driver.page_source, 'html.parser')
section = soup.find_all("a", attrs={"class": "card-avatar-wrap"})

firms = []
for link in section:
    firms.append(link["href"].split("/")[2])
43zombiegit
  • Yes, you can get all of them. You will have to figure out a way to click to move to the next page and then wrap everything in a loop or something. – DaveIdito Nov 14 '20 at 22:10

1 Answer


Built on top of your code, here's a working example. Of course, it's not production-grade code, so if you want to use it in a more reusable way, you might have to add more bells and whistles.

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.options import Options

options = Options()
driver = webdriver.Chrome(options=options)
driver.get("https://www.etoro.com/discover/markets/stocks/exchange/london")

firms = []

while True:
    # Scrape the tickers on the current page.
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    section = soup.find_all("a", attrs={"class": "card-avatar-wrap"})
    for link in section:
        firms.append(link["href"].split("/")[2])

    # The only new thing: try to click the "next page" button.
    try:
        next_btn = driver.find_element_by_xpath('/html/body/ui-layout/div/div/div[2]/et-discovery-markets-results/div/et-discovery-markets-results-header/div/div[2]/div/div[2]/a[2]')
        # Check *before* clicking, so the last page still gets scraped.
        if 'disabled' in next_btn.get_attribute('class'):
            print('END - last page reached')
            break
        next_btn.click()
        time.sleep(1)  # give the next page time to load
    except NoSuchElementException:
        break
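If a page transition silently fails (for example, the click is swallowed while the list is re-rendering), the same page can be parsed twice and firms ends up with duplicates. A minimal, order-preserving dedup in plain Python; the ticker values below are purely illustrative:

```python
# Hypothetical accumulated result containing a repeated page of tickers.
firms = ['barc.l', 'hsba.l', 'barc.l', 'hsba.l', 'vod.l']

# dict.fromkeys keeps the first occurrence of each key, in insertion order,
# so this removes duplicates without reordering the list.
firms = list(dict.fromkeys(firms))
print(firms)  # ['barc.l', 'hsba.l', 'vod.l']
```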

Update: This works for me. I'm not sure what's breaking it on your end, but here's how you can debug further:

  1. Check whether the XPath is the same in your Selenium browser as well.
  2. Add a time.sleep(2) before the while loop.
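A third thing worth doing is testing the parsing step against static HTML, which separates BeautifulSoup problems from Selenium/pagination problems. A small sketch, assuming the same card-avatar-wrap anchors as in the code above (the hrefs here are made up for illustration):

```python
from bs4 import BeautifulSoup

def extract_tickers(html):
    """Return the ticker segment of each card link's href."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"].split("/")[2]
            for a in soup.find_all("a", attrs={"class": "card-avatar-wrap"})]

# Static sample mimicking the structure the scraper sees (hypothetical hrefs).
sample = """
<a class="card-avatar-wrap" href="/markets/barc.l"></a>
<a class="card-avatar-wrap" href="/markets/hsba.l"></a>
"""
print(extract_tickers(sample))  # ['barc.l', 'hsba.l']
```

If this returns the expected tickers but the full script returns an empty list, the problem is on the Selenium side (page not loaded yet, or a different XPath), not in the parsing.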
DaveIdito