1

How to convert that html table to pandas dataframe?

https://www.google.com/finance/getprices?q=HINDALCO&i=60&p=15d&f=d,o,h,l,c,v

Example data:

enter image description here

Eugene Lisitsky
  • 12,113
  • 5
  • 38
  • 59
  • 1
    BTW, pandas comes with a data reader for [google finance](http://pandas-datareader.readthedocs.io/en/latest/remote_data.html#remote-data-google). Maybe it does the job? – greole Apr 11 '17 at 06:25

3 Answers3

2

You can use read_csv with parameters skiprows and names for new column names:

url = 'https://www.google.com/finance/getprices?q=HINDALCO&i=60&p=15d&f=d,o,h,l,c,v'
df = pd.read_csv(url, skiprows=[0,1,2,3,5,6]).rename(columns={'COLUMNS=DATE':'DATE'})

print (df.head())
          DATE   CLOSE    HIGH     LOW    OPEN  VOLUME
0  a1490154300  194.80  194.80  194.80  194.80    2600
1            1  193.55  194.70  193.00  194.15  339142
2            2  193.80  193.95  193.55  193.60  242687
3            3  194.20  194.40  193.80  193.90  119874
4            4  193.80  194.20  193.80  194.20  121355
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • do you any idea about datetime how can i convert that date[1]mean 1min date 2 mean 2 mins so on – Lakhan Pahuja Apr 11 '17 at 08:10
  • Ok, but `a1490154300` means what exactly? It is date or time? – jezrael Apr 11 '17 at 08:13
  • can i make array of time ? date 'a1490154300' mean 9:15 date'1'mean 9:16 so on – Lakhan Pahuja Apr 11 '17 at 15:26
  • Unfortunately I am offline now, only on phone, so fastest is create new question. Dont forget add sample data, desired output and what you try. Btw, in `a1490154300` is uncoded also date like `2017-04-11`? `a149015` means 9:15? The best is ecplain it too. Thanks. – jezrael Apr 11 '17 at 16:00
0

If you are not particular about getting your data only from Google, this will work with Yahoo :

import pandas as pd
import pandas_datareader.data as web
from datetime import datetime

start = datetime(2014, 6, 2)
end = datetime(2014, 9, 5)

hindalco = web.DataReader('HINDALCO.NS', 'yahoo', start, end)

In [15]: hindalco.head(5)
Out[15]: 
             Open    High     Low   Close    Volume  Adj Close
Date                                                          
2014-06-02  147.3  151.15  146.35  150.05  13844600   146.2012
2014-06-03  150.7  155.70  149.35  155.20  23276100   151.2191
2014-06-04  156.0  161.75  155.20  160.70  15948200   156.5780
2014-06-05  160.6  171.00  159.95  169.85  20296900   165.4934
2014-06-06  172.0  172.10  165.30  169.25  13769100   164.9087

You will need to install pandas-datareader.

sudo -H pip install pandas-datareader (ubuntu)
pip install pandas-datareader (windows)
dmdip
  • 1,665
  • 14
  • 15
  • I did try using 'HINDALCO' as the company code and google instead of yahoo, but this did not work. However, US stocks like Apple worked with Google : apple = web.DataReader('AAPL', 'google', start, end) – dmdip Apr 11 '17 at 06:36
  • i want 1mins data not daily – Lakhan Pahuja Apr 11 '17 at 07:34
0

@jezrael] has answered your question for one minute data for Hindalco. Here's more or less the same answer, tweaked a bit :

import pandas as pd
period = 60 #one minute
days = 2 # 2 days of data
ticker = 'HINDALCO'
url = 'http://www.google.com/finance/getprices?i={}&p={}d&f=d,o,h,l,c,v&df=cpct&q={}'.format(period, days, ticker)
cols = ['minute', 'open', 'high', 'low', 'close', 'volume']
df = pd.read_csv(url, skiprows=8, header=None, names=cols)
df.head()

Output:

  minute    open    high     low   close  volume
0      1  194.35  194.35  193.80  193.85   25785
1      2  194.15  194.40  194.00  194.35   64580
2      3  193.95  194.25  193.85  194.15   42980
3      4  193.80  193.95  193.75  193.90   33936
4      5  193.90  193.90  193.60  193.80   57088
dmdip
  • 1,665
  • 14
  • 15
  • Yes, you can. You have to use the field value which appears as **a1490154300**. Basically 1490154300 is the start time of the data, measured as the number of seconds from January 1, 1970 5.30 AM IST. (Actually 0000 UTC). To that field value add the number of minutes which appear in that column. – dmdip Apr 11 '17 at 17:21
  • Correction to the code given earlier. df = pd.read_csv(url, skiprows=7, header=None, names=cols) – dmdip Apr 11 '17 at 17:33