I have a dataframe like below:

import pandas as pd

data = [
[123456, "2017", 150.235],
[123456, "2017", 160],
[123456, "2017", 135],
[123456, "2017", 135],
[123456, "2017", 135],
[123456, "2018", 202.5],
[123456, "2019", 168.526],
[123456, "2020", 175.559],
[123456, "2020", 176],
[123456, "2021", 206.667],
[789101, "2017", 228.9],
[789101, "2018", 208],
[789101, "2018", 208],
[789101, "2018", 208],
]
df = pd.DataFrame(
data,
columns=[
"ID",
"year",
"value",
],
)
df
In this dataframe I have an ID column and two or more years per ID. Each year can have one or more value rows.
I would like to filter this dataframe so that, for each ID, it keeps all of the earliest-year rows and all of the latest-year rows (even if there are duplicate values, I want them).
My desired output is:

       ID  year    value
0  123456  2017  150.235
1  123456  2017  160.000
2  123456  2017  135.000
3  123456  2017  135.000
4  123456  2017  135.000
5  123456  2021  206.667
6  789101  2017  228.900
7  789101  2018  208.000
8  789101  2018  208.000
9  789101  2018  208.000
I found another SO question that was similar:
g = df.groupby("ID")
(pd.concat([g.head(1), g.tail(1)])
.drop_duplicates()
.sort_values('ID')
.reset_index(drop=True))
but it only takes the first row of the earliest year (and the last row of the latest year) per ID, and I want all of the rows from those years.
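For reference, here is a sketch of one direction I've been trying (I'm not sure it's the idiomatic way): use `groupby(...).transform` to broadcast each ID's minimum and maximum year back to every row, then keep rows whose year matches either. Since the years are four-digit strings, comparing them lexicographically should be safe here.

```python
import pandas as pd

data = [
    [123456, "2017", 150.235],
    [123456, "2017", 160],
    [123456, "2017", 135],
    [123456, "2017", 135],
    [123456, "2017", 135],
    [123456, "2018", 202.5],
    [123456, "2019", 168.526],
    [123456, "2020", 175.559],
    [123456, "2020", 176],
    [123456, "2021", 206.667],
    [789101, "2017", 228.9],
    [789101, "2018", 208],
    [789101, "2018", 208],
    [789101, "2018", 208],
]
df = pd.DataFrame(data, columns=["ID", "year", "value"])

# Per-ID earliest and latest year, broadcast back onto every row.
years = df.groupby("ID")["year"]
mask = df["year"].eq(years.transform("min")) | df["year"].eq(years.transform("max"))

# Keep every row from the earliest and latest year of each ID,
# duplicates included.
result = df[mask].reset_index(drop=True)
print(result)
```

Unlike `g.head(1)` / `g.tail(1)`, this keeps duplicate rows within the boundary years, but I don't know if there is a cleaner way.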
Can anyone please advise? Thank you!