I have a multiyear daily dataframe which has different entities(Col1) present in different years (Col2). I am trying to filter the data frame such that it retains only those rows which have a value in each of the specified year. (I have created Col2 from the datetime index thinking it would make filtering process easier)
In other words, I am trying to design a conditional filter where a column value satisfying a condition in another column is retained.
This is my dataframe:
Date Col1 Col2 Col3 Col4
1/1/16 M 2016 34 a
2/2/16 M 2016 35 f
3/3/16 M 2016 45 d
1/1/17 S 2017 66 ff
2/2/17 S 2017 66 mn
1/1/18 T 2018 78 jh
2/2/18 T 2018 789 kk
1/1/17 M 2017 100 f
2/2/17 S 2017 2020 jj
2/2/18 M 2018 3005 m
Entities M,S and T record values in 2016-2019 but only M has values in all the three years.
Thus, resulting dataframe should look like this:
Date Col1 Col2 Col3 Col4
1/1/16 M 2016 34 a
2/2/16 M 2016 35 f
3/3/16 M 2016 45 d
1/1/17 M 2017 100 f
2/2/18 M 2018 3005 m
I am trying to work out the logic through following lines of code, but none work:
(1)
years = [2016,2017,2018]
for station in df_in['Col1']:
years = [2016,2017,2018]
if df_in['Col2'].isin(years).any():
df = df_in
else:
df = df_in.drop(df_in[df_in.Col1].index)
OR
(2)
df= df_in['Col1'].apply(lambda x: x in df_in['year'].isin(years))
OR
(3)
df = df_in.loc[(df_in['Col1']) & (df_in['Col2'].isin(years))]
What will be a good way of achieving this?
Any help will be much appreciated.