I've recently started using Polars (https://pola-rs.github.io/polars/py-polars/html/reference/index.html).

I have a column in my data frame that contains single-element arrays (the output of a Keras model.predict):

X
object
[0.49981183]
[0.49974033]
[0.4997973]
[0.49973667]
[0.49978396]

I want to convert this into a column of floats:

0.49981183
0.49974033
0.4997973
0.49973667
0.49978396

I've tried:

data = data.with_column((pl.col("X")[0]).alias("Y"))

but it gives me this error:

TypeError: 'Expr' object is not subscriptable

What's the right way to do this? There are around 67 million rows, so the faster the better.

Cheers

user555265

1 Answer

Unfortunately, columns of type Object are often a dead-end. From the Data Types section of the Polars User Guide:

Object: A limited supported data type that can be any value.

Since support is limited, operations on columns of type Object often throw exceptions.

However, there may be a way to retrieve the values in this particular situation. As an example, let's purposely create a column of type object.

import polars as pl
data_as_list = [[0.49981183], [0.49974033],
                [0.4997973], [0.49973667], [0.49978396]]

df = pl.DataFrame([
        pl.Series("X", values=data_as_list, dtype=pl.Object),
])
print(df)
shape: (5, 1)
┌──────────────┐
│ X            │
│ ---          │
│ object       │
╞══════════════╡
│ [0.49981183] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [0.49974033] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [0.4997973]  │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [0.49973667] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [0.49978396] │
└──────────────┘

This approach may work...

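def attempt_recover(series: pl.Series) -> pl.Series:
    # Iterate over the stored Python objects and keep the first element of each,
    # letting Polars infer a numeric dtype for the new Series.
    return pl.Series(values=[val[0] for val in series])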

df.with_column(pl.col("X").map(attempt_recover).alias("X_recovered"))
shape: (5, 2)
┌──────────────┬─────────────┐
│ X            ┆ X_recovered │
│ ---          ┆ ---         │
│ object       ┆ f64         │
╞══════════════╪═════════════╡
│ [0.49981183] ┆ 0.499812    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [0.49974033] ┆ 0.49974     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [0.4997973]  ┆ 0.499797    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [0.49973667] ┆ 0.499737    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [0.49978396] ┆ 0.499784    │
└──────────────┴─────────────┘

Try this first on a tiny subset of your data. This may not work. (And it will not be fast.)

What you'll want to do is alter the way that model prediction results from Keras are loaded into Polars to prevent getting a column of type Object. (Often this means indexing an array/list output to extract the number from the array/list before loading into Polars.)

  • One further thought: are you using the latest version of Polars? If not, you may want to update Polars. There have been numerous improvements in type conversions when data is loaded into Polars from other sources. –  May 05 '22 at 01:40
  • Hi @cbilot - thanks! that worked .. it took 4 minutes which is slow but i can live with it .. i am on the latest polars .. hopefully a suitable type conversion will come soon to cover this case as well – user555265 May 05 '22 at 07:18