
I have this code in pandas:

df[col] = (
            df[col]
            .fillna(method="ffill", limit=1)
            .apply(lambda x: my_function(x))
        )

I want to re-write this in Polars.

I have tried this:

df = df.with_columns(
            pl.col(col)
            .fill_null(strategy="forward", limit=1)
            .apply(lambda x: my_function(x))
        )

This does not work as expected. The forward fill is applied, but the remaining missing values are never passed to my function, so they stay null. What should I change in my code to get the same result as in pandas?

Here is a runnable example that reproduces the problem:

import numpy as np
import pandas as pd
import polars as pl

df_polars = pl.DataFrame(
    {"A": [1, 2, None, None, None, None, 4, None], "B": [5, None, None, None, None, 7, None, 9]}
)

df_pandas = pd.DataFrame(
    {"A": [1, 2, None, None, None, None, 4, None], "B": [5, None, None, None, None, 7, None, 9]}
)

last_valid_data: int


def my_function(x):
    global last_valid_data
    if x is None or np.isnan(x):
        result = last_valid_data * 10
    else:
        last_valid_data = x
        result = x
    return result


col = "A"

last_valid_data = df_pandas[col][0]
df_pandas[col] = df_pandas[col].fillna(method="ffill", limit=1).apply(lambda x: my_function(x))

last_valid_data = df_polars[col][0]
df_polars = df_polars.with_columns(
    pl.col(col).fill_null(strategy="forward", limit=1).apply(lambda x: my_function(x))
)

Desired output in pandas is:

      A    B
0   1.0  5.0
1   2.0  NaN
2   2.0  NaN
3  20.0  NaN
4  20.0  NaN
5  20.0  7.0
6   4.0  NaN
7   4.0  9.0

What I get in Polars is:

┌──────┬──────┐
│ A    ┆ B    │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ 1    ┆ 5    │
│ 2    ┆ null │
│ 2    ┆ null │
│ null ┆ null │
│ null ┆ null │
│ null ┆ 7    │
│ 4    ┆ null │
│ 4    ┆ 9    │
└──────┴──────┘
Honio
  • You'll have to give a runnable example to show it not working. – jqurious Jul 12 '23 at 07:08
  • Much better example - thank you. I think the problem is you need to pass `skip_nulls=False` to `.apply` as it defaults to `True` in Polars. (It does seem like this could be done natively in Polars using expressions.) – jqurious Jul 12 '23 at 07:39
  • It fixed my issue! Thank you ^^ – Honio Jul 12 '23 at 07:47
  • It looks equivalent to: `pl.col('A').forward_fill(limit=1).fill_null((pl.col('A') * 10).forward_fill())` – jqurious Jul 12 '23 at 07:48
  • My defined function is more complex than what I provided here. However, thank you for your further explanation. – Honio Jul 12 '23 at 07:55

1 Answer


The issue here is that in Polars `.apply` defaults to `skip_nulls=True`, so the function is never called on null values:

df_polars.with_columns(
   pl.col('A').apply(lambda me: print(f'{me=}'))
)
me=1
me=2
me=4

As your example specifically needs to target the nulls, you need to pass `skip_nulls=False`:

df_polars.with_columns(
   pl.col('A').apply(lambda me: print(f'{me=}'), skip_nulls=False)
)
me=1
me=2
me=None
me=None
me=None
me=None
me=4
me=None
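
To make the effect on the question's data concrete, here is a rough plain-Python model of the two behaviours. It is only a sketch: `apply_elementwise` and `make_filler` are hypothetical helpers written for this illustration, not Polars functions, and the closure stands in for the `global last_valid_data` variable from the question. The input list is column "A" after the forward fill with `limit=1`.

```python
def apply_elementwise(values, func, skip_nulls=True):
    """Toy stand-in for an element-wise apply (not the Polars implementation)."""
    out = []
    for v in values:
        if v is None and skip_nulls:
            # The null is passed through untouched; func never sees it.
            out.append(None)
        else:
            out.append(func(v))
    return out


def make_filler(initial):
    """Build a function like my_function, keeping the last valid value in a closure."""
    last_valid = initial

    def fill(x):
        nonlocal last_valid
        if x is None:
            return last_valid * 10
        last_valid = x
        return x

    return fill


# Column "A" after forward fill with limit=1
filled = [1, 2, 2, None, None, None, 4, 4]

skipped = apply_elementwise(filled, make_filler(filled[0]), skip_nulls=True)
passed = apply_elementwise(filled, make_filler(filled[0]), skip_nulls=False)

print(skipped)  # [1, 2, 2, None, None, None, 4, 4]
print(passed)   # [1, 2, 2, 20, 20, 20, 4, 4]
```

With nulls skipped, the result matches the Polars output shown in the question. With nulls passed through to the function, it matches the desired pandas output.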
jqurious