I have two dataframes that I'd like to join wherever the value in one column is contained in the string value of another column. The dataframes look like this:
import polars as pl

df1 = pl.DataFrame({"col1": [1, 2, 3], "col2": ["x1, x2, x3", "x2, x3", "x3"]})
df2 = pl.DataFrame({"col3": [4, 5, 6], "col4": ["x1", "x2", "x3"]})
I tried to do:
model_data = df1.join(df2, on="col2")
This does not produce the desired result. What I'd like to see is something like this:
col1  col2          col3  col4
1     "x1, x2, x3"  4     "x1"
1     "x1, x2, x3"  5     "x2"
1     "x1, x2, x3"  6     "x3"
2     "x2, x3"      5     "x2"
2     "x2, x3"      6     "x3"
3     "x3"          6     "x3"
The underlying question is how to join when the value in one column (col4) is contained in the string in another (col2). I could not find good examples of this in the docs.