
I have a Python function that takes a Polars dataframe, a column name, and a default value. It should return a Polars series, with the same length as the number of rows in the dataframe, based on the column name and default value.

  • When the column name is None, just return a series of default values.
  • When the column name is not None, return that column from the dataframe as a series.

I want to achieve this with a single one-line Polars expression.

Below is an example for better illustration.

The function I want has the following signature.

import polars as pl

def f(df, colname=None, value=0):
    pass

Below is the behavior I want.

>>> df = pl.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})

>>> f(df)
shape: (3,)
Series: '' [i64]
[
        0
        0
        0
]

>>> f(df, "a")
shape: (3,)
Series: '' [i64]
[
        1
        2
        3
]

This is what I tried, basically using pl.when.

def f(df, colname=None, value=0):
    return df.select(pl.when(colname is None).then(pl.lit(value)).otherwise(pl.col(colname))).to_series()

But the code errors out when colname is None, with the error message: TypeError: argument 'name': 'NoneType' object cannot be converted to 'PyString'.

Another problem is that the code below runs successfully but returns a dataframe with shape (1, 1):

>>> colname = None
>>> value = 0
>>> df.select(pl.when(colname is None).then(pl.lit(value)).otherwise(100))
shape: (1, 1)
┌─────────┐
│ literal │
│ ---     │
│ i32     │
╞═════════╡
│ 0       │
└─────────┘

The result I want is a dataframe with shape (3, 1), e.g.:

shape: (3, 1)
┌─────────┐
│ literal │
│ ---     │
│ i32     │
╞═════════╡
│ 0       │
│ 0       │
│ 0       │
└─────────┘

What am I supposed to do?

lebesgue

1 Answer


Is there a reason you can't implement the if/else logic in Python?

def f(df, colname=None, value=0):
    if colname is None:
        series = pl.Series().extend_constant(value, df.height)
    else:
        series = df.get_column(colname)
    return series
jqurious
  • I want to plug such a one-line expression into a select clause together with other polars expressions. Also, I think using pure polars expressions will be faster? – lebesgue Feb 09 '23 at 16:32
  • I see. I suppose you could do something like: `df.select(pl.lit(value).repeat_by(df.height).flatten() if colname is None else colname)` – jqurious Feb 09 '23 at 16:41
  • Thanks. Is the if here fine in terms of speed? I read from the doc that as long as we have python code, the process will be slowed down quite a lot. Also, does it mean there is no way to use polars.when? – lebesgue Feb 09 '23 at 16:45
  • The mentions of "python code"/slowdowns are referring to the use of `.apply()` instead of using expressions - not in a case like this. – jqurious Feb 09 '23 at 17:03
  • Got it. What if the function in apply is also all of polars expressions code? Will we still have the slowness? – lebesgue Feb 09 '23 at 17:20
  • I'm not sure how `.apply()` would contain polars expressions? You would [use expressions instead of `.apply()`](https://stackoverflow.com/a/75351869) – jqurious Feb 09 '23 at 17:30
  • I have a use case where, if certain conditions are met, there is no need to use other third-party functions, but if not, I delegate to other packages. That's why I am considering this: how can I make the most use of polars? – lebesgue Feb 09 '23 at 17:35
  • Hm, okay - perhaps if you post a full example of what you're actually doing people can give you a better answer. (Maybe it should be a new question?) – jqurious Feb 09 '23 at 17:40