
I'm trying to get the value of 'n' in the last row of a dask dataframe.

If I understand correctly, positional indexing isn't an option, and I don't know the index of the last row. I thought tail() would be the solution, but it returns an empty dataframe.

print( df.compute() ) # df has 47 rows

returns

       file            str          n 
11027  /Users/...      XXX...       901  
11028  /Users/...      XXX...       902  
...                                   
11099  /Users/...      XXX...       946
11100  /Users/...      XXX...       947

Then I do

tail = df.tail( n=10, compute=True )
print(tail)

This takes a minute and fifteen seconds, which is unacceptably slow since I need to do several thousand of these, and it returns

Empty DataFrame
Columns: [file, str, n]
Index: []

What am I missing here?

Note: I found a solution for head() returning an empty dataframe, but it doesn't apply to tail(). See "dask dataframe head() returns empty df".

Amanda

2 Answers


Print it with print(df.tail(10)).

zealous

Visit https://tutorial.dask.org/04_dataframe.html and find the section titled "What just happened?". It describes what can go wrong and why.

It also includes a recipe: when reading a DataFrame with read_csv, you should also pass the dtype parameter to specify the column types.

Try this approach.

Valdi_Bo