I am trying to populate a new field containing a struct of all of the other fields from another lazyframe based on a predicate.
While the examples are in python, I am open to answers in python or rust.
companies = pl.DataFrame({
"id": [1],
"name": ["google"],
"industry": [1001]
}).lazy()
industries = pl.DataFrame({
"id": [1001],
"name": ["tech"],
"sectors": [[10011, 10012]]
}).lazy()
expected = pl.DataFrame({
"id": [1],
"name": ["polars"],
"industry": [{
"name": "tech",
"sectors": [[10011, 10012]]
}]
})
I can do this naively through join -> select -> to_struct -> lit, but that is not ideal as I have to perform a collect on the lazyframe to get my desired results.
right_columns = industries.schema.keys()
industry_matches = companies.join(
industries,
left_on="industry",
right_on="id",
how="inner"
).collect().select(right_columns).drop(["id"]).to_struct("industry")
df = companies.with_column(pl.lit(industry_matches))
df.collect()
Ideally, i'd like to do something like a join within an expression like so.
companies.select([
pl.col("*"),
pl.col("industry").join(industries, on="id").exclude(["id"]),
])
but I am open to any alternatives that do not require a collect