"ValueError: labels ['timestamp'] not contained in axis"
You don't have headers in the file, so the way you loaded it you got a df
where the column names are the first rows of the data. You tried to access colunm timestamp
which doesn't exist.
Your u.data
doesn't have headers in it
$head u.data
196 242 3 881250949
186 302 3 891717742
So working with column names isn't going to be possible unless add the headers. You can add the headers to the file u.data
, e.g. I opened it in a text editor and added the line a b c timestamp
at the top of it (this seems to be a tab-separated file, so be careful when added the header not to use spaces, else it breaks the format)
$head u.data
a b c timestamp
196 242 3 881250949
186 302 3 891717742
Now your code works and data.columns
returns
Index([u'a', u'b', u'c', u'timestamp'], dtype='object')
And the rest of the trace of your working code is now
(100000, 4) # the shape
['a', 'b', 'c', 'timestamp'] # the columns
a b c timestamp # the df
0 196 242 3 881250949
1 186 302 3 891717742
2 22 377 1 878887116
3 244 51 2 880606923
4 166 346 1 886397596
5 298 474 4 884182806
6 115 265 2 881171488
7 253 465 5 891628467
8 305 451 3 886324817
9 6 86 3 883603013
If you don't want to add headers
Or you can drop the column 'timestamp' using it's index (presumably 3), we can do this using df.ix
below it selects all rows, columns index 0 to index 2, thus dropping the column with index 3
data.ix[:, 0:2]