
I use the following code to try to scrape the tickers of the firms listed on eToro. My only issue is that this code only returns the first 50, as that is how many are displayed when the URL first loads. You can show more on the site, but this is not reflected in the request, so I'm wondering whether it's possible to get all of them.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
driver = webdriver.Chrome("chromedriver.exe", options=options)
driver.get("https://www.etoro.com/discover/markets/stocks/exchange/london")

soup = BeautifulSoup(driver.page_source, 'html.parser')
section = soup.find_all("a", attrs={"class": "card-avatar-wrap"})

firms = []
for link in section:
    firms.append(link["href"].split("/")[2])
43zombiegit
  • Yes, you can get all of them. You will have to figure out a way to click to move to the next page and then wrap everything in a loop or something. – DaveIdito Nov 14 '20 at 22:10

1 Answer


Built on top of your code, here's a working example. Of course, it's not production-grade code, so if you want to use it in a more reusable way, you might have to add more bells and whistles.

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.options import Options

options = Options()
driver = webdriver.Chrome(options=options)
driver.get("https://www.etoro.com/discover/markets/stocks/exchange/london")

firms = []

while True:
    # Scrape the tickers on the current page.
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    section = soup.find_all("a", attrs={"class": "card-avatar-wrap"})
    for link in section:
        firms.append(link["href"].split("/")[2])

    # The only new thing: try to click the "next page" button.
    try:
        next_btn = driver.find_element_by_xpath('/html/body/ui-layout/div/div/div[2]/et-discovery-markets-results/div/et-discovery-markets-results-header/div/div[2]/div/div[2]/a[2]')
        # Check *before* clicking, so the last page still gets scraped.
        if 'disabled' in next_btn.get_attribute('class'):
            print('END - last page reached')
            break
        next_btn.click()
        time.sleep(1)  # give the next page time to load
    except NoSuchElementException:
        break
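If a page transition silently fails (for example, the click is swallowed while the list is re-rendering), the same page can be parsed twice and firms ends up with duplicates. A minimal, order-preserving dedup in plain Python; the ticker values below are purely illustrative:

```python
# Hypothetical accumulated result containing a repeated page of tickers.
firms = ['barc.l', 'hsba.l', 'barc.l', 'hsba.l', 'vod.l']

# dict.fromkeys keeps the first occurrence of each key, in insertion order,
# so this removes duplicates without reordering the list.
firms = list(dict.fromkeys(firms))
print(firms)  # ['barc.l', 'hsba.l', 'vod.l']
```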

Update: This works for me. I'm not sure what's breaking it on your end, but here's how you can debug further:

  1. Check whether the XPath is the same in your Selenium browser as well.
  2. Add a time.sleep(2) before the while loop.
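A third thing worth doing is testing the parsing step against static HTML, which separates BeautifulSoup problems from Selenium/pagination problems. A small sketch, assuming the same card-avatar-wrap anchors as in the code above (the hrefs here are made up for illustration):

```python
from bs4 import BeautifulSoup

def extract_tickers(html):
    """Return the ticker segment of each card link's href."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"].split("/")[2]
            for a in soup.find_all("a", attrs={"class": "card-avatar-wrap"})]

# Static sample mimicking the structure the scraper sees (hypothetical hrefs).
sample = """
<a class="card-avatar-wrap" href="/markets/barc.l"></a>
<a class="card-avatar-wrap" href="/markets/hsba.l"></a>
"""
print(extract_tickers(sample))  # ['barc.l', 'hsba.l']
```

If this returns the expected tickers but the full script returns an empty list, the problem is on the Selenium side (page not loaded yet, or a different XPath), not in the parsing.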
DaveIdito