0

Like many others I have been looking for an alternative source of stock prices now that the Yahoo and Google APIs are defunct. I decided to take a try at web scraping the Yahoo site from which historical prices are still available. I managed to put together the following code which almost does what I need:

import urllib.request as web
import bs4 as bs

def yahooPrice(tkr):
    tkr=tkr.upper()
    url='https://finance.yahoo.com/quote/'+tkr+'/history?p='+tkr

    sauce=web.urlopen(url)
    soup=bs.BeautifulSoup(sauce,'lxml')
    table=soup.find('table')
    table_rows=table.find_all('tr')

    allrows=[]
    for tr in table_rows:
        td=tr.find_all('td')
        row=[i.text for i in td]
        if len(row)==7:
            allrows.append(row)

    vixdf= pd.DataFrame(allrows).iloc[0:-1]
    vixdf.columns=['Date','Open','High','Low','Close','Aclose','Volume']
    vixdf.set_index('Date',inplace=True)

    return vixdf

which produces a dataframe with the information I want. Unfortunately, even though the actual web page shows a full year's worth of prices, my routine only returns 100 records (including dividend records). Any idea how I can get more?

Eugene Brown
  • 4,032
  • 6
  • 33
  • 47
dborger
  • 89
  • 2
  • 5

3 Answers3

0

The Yahoo Finance API was depreciated in May '17, I believe. Now, there are to many options for downloading time series data for free, at least that I know of. Nevertheless, there is always some kind of alternative. Check out the URL below to find a tool to download historical price.

http://investexcel.net/multiple-stock-quote-downloader-for-excel/

enter image description here

See this too.

https://blog.quandl.com/api-for-stock-data

ASH
  • 20,759
  • 19
  • 87
  • 200
0

I don't have the exact solution to your question but I have a workaround (I had the same problem and hence used this approach)....basically, you can use Bday() method - 'import pandas.tseries.offset' and look for x number of businessDays for collecting the data. In my case, i ran the loop thrice to get 300 businessDays data - knowing that 100 was maximum I was getting by default.

Basically, you run the loop thrice and set the Bday() method such that the iteration on first time grabs 100 days data from now, then the next 100 days (200 days from now) and finally the last 100 days (300 days from now). The whole point of using this is because at any given point, one can only scrape 100 days data. So basically, even if you loop through 300 days in one go, you may not get 300 days data - your original problem (possibly yahoo limits amount of data extracted in one go). I have my code here : https://github.com/ee07kkr/stock_forex_analysis/tree/dataGathering

Note, the csv files for some reason are not working with /t delimiter in my case...but basically u can use the data frame. One more issue I currently have is 'Volume' is a string instead of float....the way to get around is :

apple = pd.DataFrame.from_csv('AAPL.csv',sep ='\t') apple['Volume'] = apple['Volume'].str.replace(',','').astype(float)

Karan
  • 11
  • 4
  • Thanks for your response. This sounds like it would work for me, but I'm afraid I'm missing something. I don't understand the purpose of the pandas.tseries.offset if you already know that you want 300 days. Also wouldn't calling three times just get 3 copies of the same 100 days? Could you maybe post a short example to clarify? Thanks – dborger May 15 '18 at 22:44
0

First - Run the code below to get your 100 days. Then - Use SQL to insert the data into a small db (Sqlite3 is pretty easy to use with python). Finally - Amend code below to then get daily prices which you can add to grow your database.

from pandas import DataFrame
import bs4
import requests

def function():
    url = 'https://uk.finance.yahoo.com/quote/VOD.L/history?p=VOD.L'
    response = requests.get(url)
    soup=bs4.BeautifulSoup(response.text, 'html.parser')
    headers=soup.find_all('th')
    rows=soup.find_all('tr')
    ts=[[td.getText() for td in rows[i].find_all('td')] for i in range (len(rows))]
    date=[]
    days=(100)
    while days > 0:
        for i in ts:
            data.append (i[:-6])
        now=data[num]
        now=DataFrame(now)
        now=now[0]

        now=str(now[0])
        print now, item
        num=num-1
python_starter
  • 1,509
  • 1
  • 11
  • 18