
Getting a PanicException when collecting a join of two LazyFrames.

The exception I get:

thread '<unnamed>' panicked at 'not implemented', D:\a\polars\polars\crates\polars-core\src\series\series_trait.rs:60:13
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Traceback (most recent call last):
. .
  File "C:\Users\smruti\AppData\Roaming\Python\Python310\site-packages\polars\utils\deprecation.py", line 93, in wrapper
    return function(*args, **kwargs)
  File "C:\Users\smruti\AppData\Roaming\Python\Python310\site-packages\polars\lazyframe\frame.py", line 1561, in collect
    return wrap_df(ldf.collect())
pyo3_runtime.PanicException: not implemented

join_on= ["INT_COL","STRING_COL"]
col_list = ["INT_COL","STRING_COL", "TINYINT_COL", "DECIMAL_COL" ...]

other_columns = [col for col in col_list if col not in join_on]

src_0_lazy_df and src_1_lazy_df are two LazyFrames.

src_0_lazy_df.collect()

┌───────┬────────────┬────────────┬─────────┬───┬────────────┬────────────┬────────────┬───────────┐
│ rtype ┆ TINYINT_CO ┆ SMALLINT_C ┆ INT_COL ┆ … ┆ DECIMAL_CO ┆ STRING_COL ┆ DATE_COL   ┆ DATETIME_ │
│ ---   ┆ L          ┆ OL         ┆ ---     ┆   ┆ L          ┆ ---        ┆ ---        ┆ COL       │
│ str   ┆ ---        ┆ ---        ┆ i32     ┆   ┆ ---        ┆ str        ┆ datetime[m ┆ ---       │
│       ┆ bool       ┆ i16        ┆         ┆   ┆ f64        ┆            ┆ s]         ┆ datetime[ │
│       ┆            ┆            ┆         ┆   ┆            ┆            ┆            ┆ ms]       │
╞═══════╪════════════╪════════════╪═════════╪═══╪════════════╪════════════╪════════════╪═══════════╡
│ D     ┆ false      ┆ 19706      ┆ 123     ┆ … ┆ 164.98     ┆ Smruti     ┆ 1994-04-27 ┆ 1974-08-0 │
│       ┆            ┆            ┆         ┆   ┆            ┆            ┆ 00:00:00   ┆ 8         │
│       ┆            ┆            ┆         ┆   ┆            ┆            ┆            ┆ 16:28:51  │
│ D     ┆ true       ┆ 23757      ┆ 123     ┆ … ┆ 164.98     ┆ Chetan     ┆ 2019-01-25 ┆ 2001-03-2 │
│       ┆            ┆            ┆         ┆   ┆            ┆            ┆ 00:00:00   ┆ 4         │
│       ┆            ┆            ┆         ┆   ┆            ┆            ┆            ┆ 22:00:13  │
│ D     ┆ true       ┆ -29931     ┆ 345     ┆ … ┆ 173.88     ┆ Jagan      ┆ 1972-05-14 ┆ 2019-01-1 │
│       ┆            ┆            ┆         ┆   ┆            ┆            ┆ 00:00:00   ┆ 5         │
└───────┴────────────┴────────────┴─────────┴───┴────────────┴────────────┴────────────┴───────────┘

src_1_lazy_df.collect()

┌───────┬────────────┬────────────┬─────────┬───┬────────────┬────────────┬────────────┬───────────┐
│ rtype ┆ TINYINT_CO ┆ SMALLINT_C ┆ INT_COL ┆ … ┆ DECIMAL_CO ┆ STRING_COL ┆ DATE_COL   ┆ DATETIME_ │
│ ---   ┆ L          ┆ OL         ┆ ---     ┆   ┆ L          ┆ ---        ┆ ---        ┆ COL       │
│ str   ┆ ---        ┆ ---        ┆ i32     ┆   ┆ ---        ┆ str        ┆ datetime[m ┆ ---       │
│       ┆ bool       ┆ i16        ┆         ┆   ┆ f64        ┆            ┆ s]         ┆ datetime[ │
│       ┆            ┆            ┆         ┆   ┆            ┆            ┆            ┆ ms]       │
╞═══════╪════════════╪════════════╪═════════╪═══╪════════════╪════════════╪════════════╪═══════════╡
│ D     ┆ false      ┆ 19706      ┆ 123     ┆ … ┆ 164.98     ┆ Smruti     ┆ 1984-04-27 ┆ 1974-08-0 │
│       ┆            ┆            ┆         ┆   ┆            ┆            ┆ 00:00:00   ┆ 8         │
│       ┆            ┆            ┆         ┆   ┆            ┆            ┆            ┆ 16:28:51  │
│ D     ┆ true       ┆ 23757      ┆ 123     ┆ … ┆ 164.98     ┆ Chetan     ┆ 2019-01-25 ┆ 2001-03-2 │
│       ┆            ┆            ┆         ┆   ┆            ┆            ┆ 00:00:00   ┆ 4         │
│       ┆            ┆            ┆         ┆   ┆            ┆            ┆            ┆ 22:00:13  │
│ D     ┆ true       ┆ -29931     ┆ 345     ┆ … ┆ 173.88     ┆ Jagan      ┆ 1982-05-14 ┆ 2019-01-1 │
│       ┆            ┆            ┆         ┆   ┆            ┆            ┆ 00:00:00   ┆ 5         │
└───────┴────────────┴────────────┴─────────┴───┴────────────┴────────────┴────────────┴───────────┘

I split each DataFrame into two struct columns: one struct holding the join-on columns, and a second struct holding the remaining columns.


src0_struct_df = src_0_lazy_df.select(pl.struct(join_on).alias("pks"),
                                      pl.struct(other_columns).alias("Day0"))

src1_struct_df = src_1_lazy_df.select(pl.struct(join_on).alias("pks"),
                                      pl.struct(other_columns).alias("Day1"))

I use an outer join and print the plan, which works fine. But the moment I run collect, I hit the error.


etl_activity = src0_struct_df.join(src1_struct_df, on="pks", how="outer")

print(etl_activity) # Works fine

print(etl_activity.collect())

.
.
.
  File "C:\Users\smruti\AppData\Roaming\Python\Python310\site-packages\polars\utils\deprecation.py", line 93, in wrapper
    return function(*args, **kwargs)
  File "C:\Users\smruti\AppData\Roaming\Python\Python310\site-packages\polars\lazyframe\frame.py", line 1561, in collect
    return wrap_df(ldf.collect())
pyo3_runtime.PanicException: not implemented
  • `Not implemented` seems like a straightforward error message to me... it's impossible to do. You can just do `on = join_on`, look at the function signature of [`join`](https://pola-rs.github.io/polars/py-polars/html/reference/lazyframe/api/polars.LazyFrame.join.html#polars.LazyFrame.join). – Wayoshi Aug 24 '23 at 05:09
  • I have a dataset with hundreds of columns. I thought splitting the columns into two partitions, one for the joining keys and one for the remaining columns, would enable a faster lookup of which data got updated, deleted, etc. – Smruti Prakash Mohanty Aug 24 '23 at 06:15
  • @SmrutiPrakashMohanty Polars already has an optimization engine built in. You shouldn't need to worry about managing it. – BallpointBen Aug 25 '23 at 21:12
