The goal is to use Python and BeautifulSoup to scrape Yahoo Finance for the first and last name of a public company's owner:

from bs4 import BeautifulSoup
import requests

url = 'https://finance.yahoo.com/quote/GTVI/profile?p=GTVI'
page = requests.get(url, headers={
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
})
soup = BeautifulSoup(page.text, 'html.parser')
price = soup.find_all("tr", {"class": "C($primaryColor) BdB Bdc($seperatorColor) H(36px)"})
print(soup.select_one("td > span").text)

The single call above works perfectly, but I can't get it to loop over several URLs and print each result while still sending the browser User-Agent header. Here is my attempt (keep in mind I'm new to Python). Help :)

from bs4 import BeautifulSoup
import requests

url = ['https://finance.yahoo.com/quote/GTVI/profile?p=GTVI',
'https://finance.yahoo.com/quote/RAFA/profile?p=RAFA',
'https://finance.yahoo.com/quote/CYDX/profile?p=CYDX',
'https://finance.yahoo.com/quote/TTHG/profile?p=TTHG']
names = []
for link in url:
    w=1
    reqs2 = requests.get(link)
page = requests.get(url, headers={
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
})
soup = BeautifulSoup(page.text, 'html.parser')
for x in soup.find_all("tr", {"class": "C($primaryColor) BdB Bdc($seperatorColor) H(36px)"})
names.append(x.text)
print(names)(soup.select_one("td > span").text)

1 Answer

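Check your indentation and your requests: everything that should run once per URL has to be indented under the `for` loop, and `requests.get()` has to be called with each `link` rather than with the whole `url` list. Because the expected result is not entirely clear from your question, this is only a hint on how to fix the code and get a result.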

Example

from bs4 import BeautifulSoup
import requests

url = ['https://finance.yahoo.com/quote/GTVI/profile?p=GTVI',
'https://finance.yahoo.com/quote/RAFA/profile?p=RAFA',
'https://finance.yahoo.com/quote/CYDX/profile?p=CYDX',
'https://finance.yahoo.com/quote/TTHG/profile?p=TTHG']
# Define the browser-like User-Agent once and reuse it for every request
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"}
names = []
for link in url:
    # Request each link individually, passing the headers every time
    page = requests.get(link, headers=headers)
    soup = BeautifulSoup(page.text, 'html.parser')
    # Collect the text of every matching table row on this page
    for x in soup.find_all("tr", {"class": "C($primaryColor) BdB Bdc($seperatorColor) H(36px)"}):
        names.append(x.text)
    print(soup.select_one("td > span").text)
print(names)
HedgeHog
  • That works perfectly, but I've realized that in some cases the field does not exist. How would you set "n/a" for that line and skip to the next one when it finds a blank? Right now it raises an error and just stops. – MattHagens.com Oct 20 '22 at 14:55
  • Happy to help. This would be well suited to [asking a new question](https://stackoverflow.com/questions/ask) with exactly this focus. In addition, you may want to take a minute or two to read [ask]. Hint: simply use an `if` statement to handle these cases. – HedgeHog Oct 20 '22 at 15:17
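
The `if` statement hinted at in the last comment could look roughly like the sketch below. It relies on `select_one()` returning `None` when nothing matches, which is what makes `.text` fail on pages where the field is missing. The helper name `first_cell_text` and the two HTML snippets are made up for illustration and are not real Yahoo Finance markup.

```python
from bs4 import BeautifulSoup


def first_cell_text(html, default="n/a"):
    """Return the text of the first `td > span` in html, or default if it is missing or blank.

    Hypothetical helper for illustration; not part of the original answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    cell = soup.select_one("td > span")  # None when no element matches
    if cell is None:
        return default
    text = cell.get_text(strip=True)
    # Treat an empty or whitespace-only cell the same as a missing one
    return text if text else default


# Made-up HTML snippets standing in for downloaded pages
with_name = "<table><tr><td><span>Jane Doe</span></td></tr></table>"
without_name = "<table><tr><td>no span here</td></tr></table>"

print(first_cell_text(with_name))
print(first_cell_text(without_name))
```

In the loop from the answer, the line `print(soup.select_one("td > span").text)` could then be replaced by `print(first_cell_text(page.text))`, so a page without the field prints "n/a" and the loop continues with the next URL.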