Slicing pandas raw dataframe (prior to re-organizing the data)

Question

This is my very first post but I'll do my best to make it relevant.

I have a dataframe of stock prices freshly imported with the DataReader, from Morningstar. It looks like this :

print df.head()
                Close     High     Low    Open    Volume Symbol
Symbol Date                                                        
AAPL   2018-03-01  175.00  179.775  172.66  178.54  48801970   AAPL
       2018-03-02  176.21  176.300  172.45  172.80  38453950   AAPL
       2018-03-05  176.82  177.740  174.52  175.21  28401366   AAPL
       2018-03-06  176.67  178.250  176.13  177.91  23788506   AAPL
       2018-03-07  175.03  175.850  174.27  174.94  31703462   AAPL

I want to refer to specific cells in the dataframe, especially the values in the last row for a given stock. There are 255 rows.

Please note that the dataframe is a concatenation of multiple DataReader fetches. I made it from code found on StackOverflow with slight updates and changes :

import pandas as pd
import pandas_datareader.data as web

rawdata = []  # list that collects one DataFrame per ticker
for ticker in tickers:
    fetched = web.DataReader(ticker, "morningstar", start='3/1/2018', end='4/15/2018') # bloody month/day/year
    fetched['Symbol'] = ticker # add a symbol column
    rawdata.append(fetched)

stocks = pd.concat(rawdata) # concatenate the list of dfs, not the last fetched df

Now

print df[255:]

returns the last row with column names, and

print df[255:].values

returns the values of the last row. But

print df[-1]

returns an error. I will need to refer to the last row after updating the dataframe, without knowing whether there are now x or y rows. Why can't I do df[-1] ?

I've looked around and found techniques with "iloc" notably, but I'm trying to keep this very simple for the moment.

I've also looked for questions about slicing. But

print df[255:['Close']]

returns the error "unhashable type" - although there already is a column named 'Close'.

Is it because my dataframe is not properly indexed ? Or because it is not a csv yet ? I know how to work on indexes, and also how to write to csv. And I will definitely have to organize the data in a better way at some stage. But I don't understand why I cannot call the last row or slice for a specific cell with the current format of my data.

Thanks for your kind attention

ALollz · Answer 1 · 2018-04-16T13:41:36.813

2

You need to be a bit careful when slicing DataFrames with [].

When you pass a single label, [] selects a column of the DataFrame rather than a row. When you write df[-1] you're going to get KeyError: -1 because your df doesn't have any column labeled -1.

If you want to slice the last row, you need to either add a colon inside the [] (df[-1:]) or, if you want to be super safe, use .iloc.

Hopefully this illustrates this a bit more. I've included a column labeled -1 just to show you what df[-1] will actually do.

import pandas as pd
df = pd.DataFrame({'value': [-2,-1,0,1,2],
                  'name': ['a', 'b', 'c', 'd', 'e'],
                  -1: [1,2,3,4,5]})

#   value name  -1
#0     -2    a   1
#1     -1    b   2
#2      0    c   3
#3      1    d   4
#4      2    e   5

df[-1]
#0    1
#1    2
#2    3
#3    4
#4    5
#Name: -1, dtype: int64

df[-1:] # or df.iloc[-1:]
#   value name  -1
#4      2    e   5
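
For the original goal (a specific cell in the last row, without knowing how many rows there are), here is a minimal sketch using .iloc. The frame below is a hypothetical stand-in for the one in the question: the name prices, the MSFT rows and their numbers are made up for illustration, and only the (Symbol, Date) index layout follows the printed head().

```python
import pandas as pd

# Hypothetical stand-in for the frame in the question: a (Symbol, Date)
# MultiIndex with price columns. The MSFT values are illustrative only.
index = pd.MultiIndex.from_product(
    [["AAPL", "MSFT"], pd.to_datetime(["2018-03-01", "2018-03-02"])],
    names=["Symbol", "Date"],
)
prices = pd.DataFrame(
    {"Close": [175.00, 176.21, 92.85, 93.05],
     "Volume": [48801970, 38453950, 1000, 2000]},
    index=index,
)

# Last row as a Series, regardless of how many rows the frame has
last_row = prices.iloc[-1]

# A single cell: the Close value in the last row
last_close = prices["Close"].iloc[-1]

# Last row of each symbol, since the frame stacks several tickers
last_per_symbol = prices.groupby(level="Symbol").tail(1)

# Last Close for one given symbol
aapl_last_close = prices.xs("AAPL", level="Symbol")["Close"].iloc[-1]
```

Note that iloc[-1] returns a Series while iloc[-1:] returns a one-row DataFrame, which is the same difference as between df[-1:] and a single row above.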

edited Apr 16 '18 at 13:41

answered Apr 16 '18 at 13:32


Thanks, this helps me with the syntax of slicing dataframes. Actually df[-1:] calls for the last row, in my case. So now I've tried df[-1:]['Close'] and it works. I don't understand why just [-1] doesn't work - it works for strings and lists, why not for dataframes ? But I'll get used to it ! So this had nothing to do with the format of my data, right ? I could keep playing with it without further indexing and without writing it to a csv, correct ? – citizen007 Apr 16 '18 at 13:43
@citizen007 right, it's not about the format of your data. If you look at [Basics](https://pandas.pydata.org/pandas-docs/stable/indexing.html#basics), you can see that the Pandas indexing default is that `[]` slices a `DataFrame` by columns, and looks to return the series. `DataFrame -- frame[colname] -- Series corresponding to colname`. If you keep reading a bit further down, you can see that as long as you provide a RANGE, that everything works as expected, because "With Series, the syntax works exactly as with an ndarray, returning a slice of the values and the corresponding labels:" – ALollz Apr 16 '18 at 13:51
Thanks for the explanation and the link to the documentation ! All good now ! – citizen007 Apr 16 '18 at 16:47
