
Goal:

The goal is to build a bot on Replit that will repeatedly scrape Yahoo Finance pages like this Amazon (AMZN) quote page and track the dynamic 'Volume' datapoint for abnormally large changes. Right now I'm just trying to pull that one datapoint down reliably, and I have been using the yahoo_fin API to do so. I have also considered using bs4, but I'm not sure whether it can extract dynamic data. (I'd greatly appreciate it if you happen to know the answer to this: can bs4 extract dynamic data?)

Problem:

The script seems to work, but it does not stay online due to what appears to be an error in yahoo_fin. Usually within about five minutes of turning the bot on, it throws the following error:

  File "/home/runner/goofy/scrape.py", line 13, in fetchCurrentVolume
    table = si.get_quote_table(ticker)
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/yahoo_fin/stock_info.py", line 293, in get_quote_table
    tables = pd.read_html(requests.get(site, headers=headers).text)
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/html.py", line 1098, in read_html
    return _parse(
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/html.py", line 926, in _parse
    raise retained
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/html.py", line 906, in _parse
    tables = p.parse_tables()
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/html.py", line 222, in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/html.py", line 552, in _parse_tables
    raise ValueError("No tables found")
ValueError: No tables found

However, this usually happens after a number of tables have already been found.

Here is the fetchCurrentVolume function:

import yahoo_fin.stock_info as si

def fetchCurrentVolume(ticker):
  # Pull the quote summary table and return its 'Volume' field
  table = si.get_quote_table(ticker)
  currentVolume = table['Volume']
  return currentVolume

and the API documentation is found above under Goal. Whenever this error message is displayed, the bot exits a @tasks.loop, and the robot goes offline. If you know of a way to fix the current use of yahoo_fin, OR any other way to obtain the dynamic data found in this XPath: '//div[@id="quote-summary"]/div/table/tbody/tr', then you will have pulled me out of a three-week-long debacle with this issue! Thank you.
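For reference, a minimal sketch of reading that XPath directly with requests and lxml; the URL pattern, the User-Agent header, and the fetchVolumeViaXpath name are assumptions, and it only works when Yahoo actually returns the server-rendered summary table, so it is subject to the same rate-limit/empty-response problem:

import requests
from lxml import html

def fetchVolumeViaXpath(ticker):
    # Assumed Yahoo Finance quote-page URL and a browser-like User-Agent header
    url = "https://finance.yahoo.com/quote/" + ticker
    headers = {"User-Agent": "Mozilla/5.0"}
    page = requests.get(url, headers=headers)
    tree = html.fromstring(page.text)
    # Walk the quote-summary rows and return the value next to the 'Volume' label
    for row in tree.xpath('//div[@id="quote-summary"]/div/table/tbody/tr'):
        cells = [c.strip() for c in row.xpath('.//td//text()') if c.strip()]
        if cells and cells[0] == 'Volume':
            return cells[-1]
    return None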

Seth Bowers

2 Answers


If you are able to retrieve some data before it cuts out, it is probably a rate limit. Try adding a sleep of a few seconds between each request.

see here for how to use sleep
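A minimal sketch of what that might look like in the polling loop; the ticker list and the sleep intervals here are just illustrative assumptions:

import time
import yahoo_fin.stock_info as si

tickers = ["AMZN", "AAPL"]   # illustrative watchlist

while True:
    for ticker in tickers:
        table = si.get_quote_table(ticker)
        print(ticker, table['Volume'])
        time.sleep(5)    # pause between requests to stay under any rate limit
    time.sleep(30)       # pause between full passes over the watchlist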

Michael P
  • I wondered this as well and should have elaborated. I have used sleeps of up to 40 seconds and the error still eventually occurs, and I believe that is far above the Yahoo Finance query threshold. – Seth Bowers Aug 18 '21 at 22:50
  • I saw that you are asking about different ways to get the data including bs4. I used bs4 in the past for this but I now use [yfinance](https://pypi.org/project/yfinance/) which seems to work pretty well. – Michael P Aug 19 '21 at 11:45
  • I love yfinance, but volume is one of the only parameters that I could not locate. – Seth Bowers Aug 19 '21 at 15:08
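For what it's worth, a minimal sketch of one place volume does show up in yfinance, via the daily history DataFrame; treating this as the bot's fetch function is an assumption about how it would slot in:

import yfinance as yf

def fetchCurrentVolume(ticker):
    # Most recent daily bar; its 'Volume' column holds that day's traded volume
    hist = yf.Ticker(ticker).history(period="1d")
    return int(hist["Volume"].iloc[-1])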

Maybe the web server bonks out every so often while the tables are being rewritten, or something like that.

If you use a try/except that waits a few seconds and then tries again before bailing out, maybe that would work if it is just a hiccup once in a while?

import yahoo_fin.stock_info as si
import time

def fetchCurrentVolume(ticker):
    try:
        table = si.get_quote_table(ticker)
        currentVolume = table['Volume']
    except Exception:
        # hopefully this was just a hiccup and it will be back up in 5 seconds
        time.sleep(5)
        table = si.get_quote_table(ticker)
        currentVolume = table['Volume']
    return currentVolume
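If a single retry is not enough, a slightly more defensive variant (an editorial sketch, not part of the original answer; the retries and delay parameters are assumptions) could try a few times before giving up:

import time
import yahoo_fin.stock_info as si

def fetchCurrentVolume(ticker, retries=3, delay=5):
    # Retry a few times so a transient "No tables found" doesn't kill the loop
    for attempt in range(retries):
        try:
            return si.get_quote_table(ticker)['Volume']
        except ValueError:
            time.sleep(delay)
    raise RuntimeError("Could not fetch volume for " + ticker + " after " + str(retries) + " attempts")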
Brian Z