I have a single large file with 40,955,924 lines (>13 GB). I need to split this file into individual files based on a single field. If I were using a pd.DataFrame, I would use this:
for k, v in df.groupby(['id']):
    v.to_csv(k, sep='\t', header=True, index=False)
However, I get the error KeyError: 'Column not found: 0'. There is a solution to this specific error in "Iterate over GroupBy object in dask", but it requires using pandas to hold a copy of the DataFrame in memory, which I cannot do with a file this size. Any help on splitting this file up would be greatly appreciated.
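One way to sidestep pandas and dask entirely is to stream the file once with the standard library and append each row to a per-id output file, so memory use depends on the number of distinct ids rather than the file size. Below is a minimal sketch under some assumptions not stated in the question: the file is tab-separated with a header row, and `split_by_id` and the `id_col` position are names I made up for illustration.

```python
import csv

def split_by_id(path, id_col=0, delimiter="\t"):
    """Stream a large delimited file once, writing each row to a
    per-id output file named "<id>.tsv" (assumed naming scheme).
    The header row is repeated at the top of every output file."""
    writers = {}  # id value -> open file handle
    try:
        with open(path, newline="") as src:
            reader = csv.reader(src, delimiter=delimiter)
            header = next(reader)  # keep the header for each output file
            for row in reader:
                key = row[id_col]
                fh = writers.get(key)
                if fh is None:
                    # First time we see this id: open its file and write the header
                    fh = open(f"{key}.tsv", "w", newline="")
                    fh.write(delimiter.join(header) + "\n")
                    writers[key] = fh
                fh.write(delimiter.join(row) + "\n")
    finally:
        # Close every output handle even if an error occurs mid-stream
        for fh in writers.values():
            fh.close()
```

One caveat: this keeps one file handle open per distinct id, so if the id cardinality is very high you may hit the OS open-file limit; in that case, either open each output in append mode (`"a"`) and close it after every write, or sort the file by the id field first so each output can be written and closed sequentially.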