With Dask we can easily read CSV files and take the first lines with head, even across multiple partitions.
import dask.dataframe as dd
df = dd.read_csv('data.csv').head(n=100, npartitions=2)
But I would like to read the last lines of my CSV file across multiple partitions, something like this:
import dask.dataframe as dd
df = dd.read_csv('data.csv').tail(n=100, npartitions=2)
Dask DataFrame doesn't seem to support the npartitions argument on the tail method.
In pandas I could manage it with skiprows, but this option does not seem to be available in Dask.
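To illustrate what I mean by the pandas approach, here is a minimal sketch. The helper name read_last_rows is made up for this example, and it assumes the file has a single header line and no quoted fields that contain newlines.

```python
import pandas as pd


def read_last_rows(path, n=100):
    """Read only the last n data rows of a CSV file (hypothetical helper)."""
    # Count the data rows cheaply, without parsing the file.
    # Assumption: exactly one header line, one record per physical line.
    with open(path, "rb") as f:
        total_rows = sum(1 for _ in f) - 1

    # Number of leading data rows to skip; never negative.
    skip = max(total_rows - n, 0)

    # Line 0 is the header, so skip lines 1..skip and keep the rest.
    return pd.read_csv(path, skiprows=range(1, skip + 1))


if __name__ == "__main__":
    df = read_last_rows("data.csv", n=100)
    print(df)
```

This reads the file twice (once to count lines, once to parse), which is fine for a single file but is exactly the part I would like Dask to handle across partitions.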