I have two dataframes that look like this:
import polars as pl

df1 = pl.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Year": [2001, 2003, 2003, 2004],
    }
)
df2 = pl.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "2001": [111, 112, 113, 114],
        "2002": [221, 222, 223, 224],
        "2003": [331, 332, 333, 334],
        "2004": [441, 442, 443, 444],
    }
)
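I'd like to sum each year column of the second dataframe (df2), taking into account only the names whose Year in df1 is that column's year or earlier. For example, B has Year 2003 in df1, so B's values count toward the 2003 and 2004 sums but not toward 2001 or 2002. Desired output: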
┌──────┬──────┐
│ Year ┆ Sum  │
╞══════╪══════╡
│ 2001 ┆ 111  │
│ 2002 ┆ 221  │
│ 2003 ┆ 996  │ (= 331 + 332 + 333)
│ 2004 ┆ 1770 │ (= 441 + 442 + 443 + 444)
└──────┴──────┘
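To make the rule explicit, here is a plain-Python sketch (no Polars) of the computation I have in mind: a name counts toward a year column only if its Year in df1 is that year or earlier. The names `start_year`, `values`, and `sums_by_year` are made up for illustration, and the data is copied by hand from the two dataframes above.

```python
# Plain-Python reference for the rule (not a Polars solution).
# Data copied by hand from df1 and df2.
names = ["A", "B", "C", "D"]
start_year = {"A": 2001, "B": 2003, "C": 2003, "D": 2004}  # df1: Name -> Year
values = {  # df2: year column -> values in the order of `names`
    2001: [111, 112, 113, 114],
    2002: [221, 222, 223, 224],
    2003: [331, 332, 333, 334],
    2004: [441, 442, 443, 444],
}


def sums_by_year(start_year, values, names):
    """Sum each year column over the names whose df1 Year is <= that year."""
    result = {}
    for year, column in values.items():
        result[year] = sum(
            v for name, v in zip(names, column) if start_year[name] <= year
        )
    return result


expected = sums_by_year(start_year, values, names)
print(expected)  # {2001: 111, 2002: 221, 2003: 996, 2004: 1770}
```

What I'm looking for is the idiomatic Polars equivalent of this loop, ideally without iterating over columns in Python.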
I'm new to Polars (coming from Pandas), and I'm not sure how to do this. Any help will be appreciated.