5

Can anyone tell me how to select a single column with 'loc' in a Dask dataframe?

As a side note, I am loading the dataframe using dd.read_csv with header=None, so the column names run from 0 to 131094. I am trying to select the last column, whose name is 131094, and I get the error shown after the code.

code:

> import dask.dataframe as dd
> df = dd.read_csv('filename.csv', header=None)
> y = df.loc['131094']

error:

File "/usr/local/dask-2018-08-22/lib/python2.7/site-packages/dask-0.5.0-py2.7.egg/dask/dataframe/core.py", line 180, in _loc
    "Can not use loc on DataFrame without known divisions")
ValueError: Can not use loc on DataFrame without known divisions

Based on the guideline at http://dask.pydata.org/en/latest/dataframe-indexing.html#positional-indexing my code should work, but I don't know what is causing the problem.

user8034918

2 Answers

1

If you have a named column, use df.loc[:, 'col_name']. If you want a column by position, as in your example where you want the last column, use the answer by @user1717828.

0

I tried this on a dummy CSV and it worked. I can't say for sure what is wrong without seeing the file that is giving you problems. That said, you might be selecting rows, not columns.

Instead, try this.

import dask.dataframe as dd
df = dd.read_csv('filename.csv', header=None)
y = df[df.columns[-1]]
user1717828
  • Well, I am getting this error: File "/usr/local/dask-2018-08-22/lib/python2.7/site-packages/dask-0.5.0-py2.7.egg/dask/dataframe/core.py", line 452, in __getitem__ raise NotImplementedError() NotImplementedError – user8034918 Aug 26 '18 at 03:13