-1

I have a dataset that goes from 2016 to 2020 with a 'Year' column. I would like to use 2016-2017 as train data and 2018-2020 as test data. Is there any easy method to perform this data split?

desertnaut
Matthias Gallagher
  • There is no need to use sklearn, just filter using pandas: `df_train = df[df['Year'].isin([2016, 2017])]` and `df_test = df[~df['Year'].isin([2016, 2017])]` (note that `isin` takes a single list-like argument) – FBruzzesi May 12 '20 at 08:13

1 Answer

0

You can use `groupby` on the year column to collect the rows from 2016-2017 as training data and the rows from 2018-2020 as test data. Alternatively, you can filter the rows directly with `isin`, which takes a list of the years to keep:

df_train = df[df['Year'].isin([2016, 2017])]
df_test = df[df['Year'].isin([2018, 2019, 2020])]
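For a quick check of the approach, here is a minimal self-contained sketch. The toy data and the `value` column are made up for illustration; only the `Year` column name comes from the question. It assumes `Year` holds integers, and it passes the years to `isin` as one list, since `isin` accepts a single list-like argument.

```python
import pandas as pd

# Hypothetical toy data; only the 'Year' column name comes from the question.
df = pd.DataFrame({
    "Year": [2016, 2017, 2018, 2019, 2020, 2016],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Boolean mask that is True for the training years.
train_mask = df["Year"].isin([2016, 2017])

df_train = df[train_mask]   # rows from 2016-2017
df_test = df[~train_mask]   # every other row (2018-2020 here)

# Because the training years form a contiguous range,
# a plain comparison selects the same rows.
df_train_alt = df[df["Year"] <= 2017]
```

Negating the mask with `~` keeps the two sets disjoint and guarantees that every row lands in exactly one of them, which is easy to get wrong when the test years are listed by hand.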
desertnaut