I have a large dataframe (200+ million rows) in the following format:
DeviceID Date_Time
50135487 2018-03-01 00:00:44
50135487 2018-03-02 01:01:21
50135487 2018-03-01 02:01:58
50135484 2018-03-01 02:01:58
50135484 2018-03-01 02:50:13
50090879 2018-03-01 02:50:13
50090879 2018-03-01 02:50:13
50090860 2018-03-01 02:50:13
50090860 2018-03-01 02:50:13
Since the dataframe has about 7700 unique 'DeviceID' values, I want to split it into 8 smaller dataframes so that I can run the analysis on them faster.
I've tried using numpy:
import numpy as np
np.array_split(df, 3)
but it produced dataframes where a specific DeviceID is found in multiple dataframes.
I imagine the solution would combine an if statement with groupby, but I'm not sure how to go about it.
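To make the goal concrete, here is a minimal sketch of one possible approach, not necessarily the best one: split the unique DeviceID values into groups instead of splitting the rows, then select the rows for each group. It assumes pandas and numpy are available. The names `n_chunks`, `id_groups` and `chunks` are made up for illustration, and the small dataframe is only a stand-in built from the sample rows above.

```python
import numpy as np
import pandas as pd

# Small stand-in for the real 200+ million row dataframe (sample rows from above).
df = pd.DataFrame({
    "DeviceID": [50135487, 50135487, 50135487, 50135484, 50135484,
                 50090879, 50090879, 50090860, 50090860],
    "Date_Time": pd.to_datetime([
        "2018-03-01 00:00:44", "2018-03-02 01:01:21", "2018-03-01 02:01:58",
        "2018-03-01 02:01:58", "2018-03-01 02:50:13", "2018-03-01 02:50:13",
        "2018-03-01 02:50:13", "2018-03-01 02:50:13", "2018-03-01 02:50:13",
    ]),
})

n_chunks = 2  # hypothetical value for the toy data; it would be 8 for the real data

# Split the unique IDs (not the rows) into n_chunks roughly equal groups.
id_groups = np.array_split(df["DeviceID"].unique(), n_chunks)

# Each chunk keeps every row belonging to the IDs in its group,
# so no DeviceID can end up in more than one chunk.
chunks = [df[df["DeviceID"].isin(ids)] for ids in id_groups]
```

This balances the number of devices per chunk, not the number of rows, so the chunks may differ in size if some devices have many more rows than others.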