I have a dataset that goes from 2016 to 2020 with a 'Year' column. I would like to use 2016-2017 as train data and 2018-2020 as test data. Is there any easy method to perform this data split?
There is no need to use sklearn; just filter with pandas. Note that `isin` takes a list-like argument: `df_train = df[df['Year'].isin([2016, 2017])]` and `df_test = df[~df['Year'].isin([2016, 2017])]` – FBruzzesi May 12 '20 at 08:13
1 Answer
You could use `groupby` on the year column and collect the 2016-2017 groups as training data and the 2018-2020 groups as test data. Alternatively, and more simply, you can filter on the year column directly. `isin` expects a list-like argument, and the column name must match the DataFrame exactly ('Year' in the question):
df_train = df[df['Year'].isin([2016, 2017])]
df_test = df[df['Year'].isin([2018, 2019, 2020])]
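As a minimal runnable sketch of the filtering approach above: the helper name `split_by_year`, the toy data, and the `value` column are made up for illustration, and the column is assumed to be named 'Year' and to hold integers, as the question suggests.

```python
import pandas as pd


def split_by_year(df, train_years, year_col="Year"):
    """Split df into (train, test) by membership of year_col in train_years.

    Hypothetical helper for illustration; rows whose year is not in
    train_years all go to the test set.
    """
    is_train = df[year_col].isin(train_years)  # boolean mask, one entry per row
    return df[is_train], df[~is_train]


# Toy data standing in for the real dataset (assumed structure).
df = pd.DataFrame({
    "Year": [2016, 2017, 2018, 2019, 2020, 2016],
    "value": [10, 20, 30, 40, 50, 60],
})

df_train, df_test = split_by_year(df, [2016, 2017])
```

Since the split here is chronological, a comparison such as `df[df['Year'] <= 2017]` should give the same training set. Negating the mask for the test set, rather than listing 2018-2020 explicitly, means no row is silently dropped if the data contains a year you did not expect.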

desertnaut

Jil Kothari