
I am new to Polars (Python) and am trying to convert the following pandas code to Polars.

df.apply(lambda x: x["obj"].compute(data), axis=1, result_type="expand")

The column obj in the dataframe df holds objects that have a method named compute. data is an external variable in the code above.

When I try the above code using polars,

dl.apply(lambda x: (x[0].compute(data)))

dl is the Polars dataframe in which the objects are stored in the first column, i.e., column 0.

I received the following error message:

'Expr' object doesn't have compute property.

I am also not sure whether Polars has the expand feature.

Can you please help me convert the above pandas apply to a Polars apply?

Thank you.

tempx
  • In general, you want to avoid embedding Python objects into a Polars DataFrame. The performance is not good, and columns of type object have limited functionality. Perhaps you could explain what you are trying to achieve at a higher level. –  Aug 30 '22 at 19:54
  • As an example, in this SO question, we were able to find a solution that does not involve embedding objects into a Polars DataFrame: https://stackoverflow.com/questions/73398176/how-to-apply-frozenset-on-polars-dataframe –  Aug 30 '22 at 20:16
  • @cbilot I want to calculate the means of the results obtained from the list of objects. Assume that data_in is an array. I want to accomplish the following: import numpy as np; data_out = [x.compute(data_in) for x in objects]; np.mean(data_out, axis=0). That is why I put the objects into a pandas dataframe and then calculated the means. – tempx Aug 30 '22 at 23:02
  • To achieve the massive parallel performance and optimizations that Polars offers, you need to express your objectives "the Polars way": that is, using the Polars Expressions API. The Polars User Guide may help explain: https://pola-rs.github.io/polars-book/user-guide/. (By contrast, embedding objects into Polars DataFrames and running Python bytecode on them is inconsistent with this, and will not yield much benefit for you.) –  Aug 31 '22 at 20:18
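The computation described in the comments can be sketched without any dataframe at all. This is only an illustration: the Shift class is a hypothetical stand-in for the asker's objects (the real compute method is not shown in the question), and data_in is made-up sample data.

```python
import numpy as np


class Shift:
    """Hypothetical stand-in for the objects in the question."""

    def __init__(self, offset):
        self.offset = offset

    def compute(self, data):
        # Assumed behaviour: return an array with the same shape as the input.
        return data + self.offset


data_in = np.array([1.0, 2.0, 3.0])
objects = [Shift(0.0), Shift(2.0)]

# One result array per object, collected in a plain Python list.
data_out = [x.compute(data_in) for x in objects]

# Element-wise mean across the objects (axis 0 runs over the objects).
mean_out = np.mean(data_out, axis=0)
print(mean_out)
```

Keeping the objects in a plain list avoids the object-typed column entirely; only the numeric results need to go into a dataframe, if one is needed at all.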

1 Answer

Have you tried:

df.with_columns([pl.col('obj').apply(lambda d: d.compute(data))])

I don't think it's optimal, but I think it can work.
