I get a PanicException when calling collect() on a join of two lazy DataFrames.
The exception I get:
thread '<unnamed>' panicked at 'not implemented', D:\a\polars\polars\crates\polars-core\src\series\series_trait.rs:60:13
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Traceback (most recent call last):
.
.
  File "C:\Users\smruti\AppData\Roaming\Python\Python310\site-packages\polars\utils\deprecation.py", line 93, in wrapper
    return function(*args, **kwargs)
  File "C:\Users\smruti\AppData\Roaming\Python\Python310\site-packages\polars\lazyframe\frame.py", line 1561, in collect
    return wrap_df(ldf.collect())
pyo3_runtime.PanicException: not implemented
join_on= ["INT_COL","STRING_COL"]
col_list = ["INT_COL","STRING_COL", "TINYINT_COL", "DECIMAL_COL" ...]
other_columns = [col for col in col_list if col not in join_on]
src_0_lazy_df and src_1_lazy_df are two lazy DataFrames with the same schema.
src_0_lazy_df.collect()
┌───────┬────────────┬────────────┬─────────┬───┬────────────┬────────────┬────────────┬───────────┐
│ rtype ┆ TINYINT_CO ┆ SMALLINT_C ┆ INT_COL ┆ … ┆ DECIMAL_CO ┆ STRING_COL ┆ DATE_COL ┆ DATETIME_ │
│ --- ┆ L ┆ OL ┆ --- ┆ ┆ L ┆ --- ┆ --- ┆ COL │
│ str ┆ --- ┆ --- ┆ i32 ┆ ┆ --- ┆ str ┆ datetime[m ┆ --- │
│ ┆ bool ┆ i16 ┆ ┆ ┆ f64 ┆ ┆ s] ┆ datetime[ │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ms] │
╞═══════╪════════════╪════════════╪═════════╪═══╪════════════╪════════════╪════════════╪═══════════╡
│ D ┆ false ┆ 19706 ┆ 123 ┆ … ┆ 164.98 ┆ Smruti ┆ 1994-04-27 ┆ 1974-08-0 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 00:00:00 ┆ 8 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 16:28:51 │
│ D ┆ true ┆ 23757 ┆ 123 ┆ … ┆ 164.98 ┆ Chetan ┆ 2019-01-25 ┆ 2001-03-2 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 00:00:00 ┆ 4 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 22:00:13 │
│ D ┆ true ┆ -29931 ┆ 345 ┆ … ┆ 173.88 ┆ Jagan ┆ 1972-05-14 ┆ 2019-01-1 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 00:00:00 ┆ 5 │
└───────┴────────────┴────────────┴─────────┴───┴────────────┴────────────┴────────────┴───────────┘
src_1_lazy_df.collect()
┌───────┬────────────┬────────────┬─────────┬───┬────────────┬────────────┬────────────┬───────────┐
│ rtype ┆ TINYINT_CO ┆ SMALLINT_C ┆ INT_COL ┆ … ┆ DECIMAL_CO ┆ STRING_COL ┆ DATE_COL ┆ DATETIME_ │
│ --- ┆ L ┆ OL ┆ --- ┆ ┆ L ┆ --- ┆ --- ┆ COL │
│ str ┆ --- ┆ --- ┆ i32 ┆ ┆ --- ┆ str ┆ datetime[m ┆ --- │
│ ┆ bool ┆ i16 ┆ ┆ ┆ f64 ┆ ┆ s] ┆ datetime[ │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ms] │
╞═══════╪════════════╪════════════╪═════════╪═══╪════════════╪════════════╪════════════╪═══════════╡
│ D ┆ false ┆ 19706 ┆ 123 ┆ … ┆ 164.98 ┆ Smruti ┆ 1984-04-27 ┆ 1974-08-0 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 00:00:00 ┆ 8 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 16:28:51 │
│ D ┆ true ┆ 23757 ┆ 123 ┆ … ┆ 164.98 ┆ Chetan ┆ 2019-01-25 ┆ 2001-03-2 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 00:00:00 ┆ 4 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 22:00:13 │
│ D ┆ true ┆ -29931 ┆ 345 ┆ … ┆ 173.88 ┆ Jagan ┆ 1982-05-14 ┆ 2019-01-1 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 00:00:00 ┆ 5 │
└───────┴────────────┴────────────┴─────────┴───┴────────────┴────────────┴────────────┴───────────┘
I split each DataFrame into two struct columns: one struct of the join columns ("pks") and one struct of the remaining columns.
src0_struct_df = src_0_lazy_df.select(pl.struct(join_on).alias("pks"),
                                      pl.struct(other_columns).alias("Day0"))
src1_struct_df = src_1_lazy_df.select(pl.struct(join_on).alias("pks"),
                                      pl.struct(other_columns).alias("Day1"))
I then do an outer join on "pks" and print the query plan, which works fine. But as soon as I call collect(), I hit the error.
etl_activity = src0_struct_df.join(src1_struct_df, on="pks", how="outer")
print(etl_activity) # Works fine
print(etl_activity.collect())
.
.
.
  File "C:\Users\smruti\AppData\Roaming\Python\Python310\site-packages\polars\utils\deprecation.py", line 93, in wrapper
    return function(*args, **kwargs)
  File "C:\Users\smruti\AppData\Roaming\Python\Python310\site-packages\polars\lazyframe\frame.py", line 1561, in collect
    return wrap_df(ldf.collect())
pyo3_runtime.PanicException: not implemented