0
print(
    (
        df1.lazy()
        .with_context(df2.lazy())
        .select(
            pl.col("df1_date")
            .apply(lambda s: pl.col("df2_date").filter(pl.col("df2_date") >= s).first())
            .alias("release_date")
        )
    ).collect()
)

Instead of getting actual data, I get a df of query plans. Is there any other way to solve my problem, Thx!!

In pandas, I can get what I want by using:

df1["release_date"] = df1.index.map(
            lambda x: df2[df2.index < x].index[-1]
        )

Edit:

Pls try code below and you will see polars only return query plans for this. While pandas gives the right data I want.

import polars as pl

df1 = pl.DataFrame(
    {
        "df1_date": [20221011, 20221012, 20221013, 20221014, 20221016],
        "df1_col1": ["foo", "bar", "foo", "bar", "foo"],
    }
)

df2 = pl.DataFrame(
    {
        "df2_date": [20221012, 20221015, 20221018],
        "df2_col1": ["1", "2", "3"],
    }
)

print(
    (
        df1.lazy()
        .with_context(df2.lazy())
        .select(
            pl.col("df1_date")
            .apply(lambda s: pl.col("df2_date").filter(pl.col("df2_date") <= s).last())
            .alias("release_date")
        )
    ).collect()
)

df1 = df1.to_pandas().set_index("df1_date")
df2 = df2.to_pandas().set_index("df2_date")

df1["release_date"] = df1.index.map(
    lambda x: df2[df2.index <= x].index[-1] if len(df2[df2.index <= x]) > 0 else 0
)
print(df1)
Arzx
  • 3
  • 2
  • Can you add some input data and show what your desired result is? It is hard to understand what you are trying to achieve, or replicate your behavior. – ritchie46 Nov 17 '22 at 18:56
  • @ritchie46 I add a code example. Im not sure if its the right way to do it. Thank you! – Arzx Nov 17 '22 at 19:24
  • @ritchie46 What I want is actually using the release_date to left join the two dataframes. The reason why I add release_data is because these two dataframes may have different dates in their coressponding data column. So after that I can forward fill or backfill some data. Thx. – Arzx Nov 17 '22 at 19:30

1 Answers1

1

It looks like you're trying to do an asof join. In other words a join where you take the last value that matched rather than exact matches.

You can do

df1 = (df1.lazy().join_asof(df2.lazy(), left_on='df1_date', right_on='df2_date')) \
           .select(['df1_date', 'df1_col1',
                    pl.col('df2_date').fill_null(0).alias('release_date')]).collect()

The first difference is that in polars you don't assign new columns, you assign the whole df so it's always just the name of the df on the left side of the equals. The join_asof replaces your index/map/lambda thing. Then the last thing is to just replace the null value with 0 with fill_null and then rename the column. There was a bug in an old version of polars preventing the collect from working at the end. That is fixed in at least 0.15.1 (maybe an earlier version too but I'm just checking in with that version)

Dean MacGregor
  • 11,847
  • 9
  • 34
  • 72