Pandas - Applying operation to dataframe but skipping over NaN values

Question

I have a Series that looks like this:

1         532
2         554
3         NaN
...       ... 
Name: score, Length: 941940, dtype: str

and I split it into 3 columns, one per character, using apply(lambda x: pd.Series(list(x))), but it throws an error at index 3 because the value is NaN. How do I use apply so that it supports NaN and splits the values like below?

    score_0  score_1  score_2
1         5        3        2
2         5        5        4
3       NaN      NaN      NaN
...     ...      ...      ...
[941940 rows x 3 columns]
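
The failure can be reproduced in isolation. A minimal sketch, assuming the missing score is stored as `np.nan` (a float), which `list()` cannot iterate over:

```python
import numpy as np

# Assumption: the missing entry at index 3 is np.nan, i.e. a float.
value = np.nan

try:
    # This is what the lambda effectively does for the NaN row.
    list(value)
except TypeError as exc:
    message = str(exc)

print(message)  # 'float' object is not iterable
```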

In the future, include a minimum working example of your code. For example, there is no context for your lambda function, and it doesn't appear to do anything — astroChance, Jun 14 '23 at 20:11
What is mysterious, @user19077881, is the fact that the OP's series has `dtype: str`. — PaulS, Jun 14 '23 at 20:25

PaulS · Answer 1 · 2023-06-14T20:27:57.880

2

Another possible solution (similar to @GodIsOne's), which uses regex to avoid splitting at the beginning and the end of each number:

s.str.split(r'(?<=\d)(?=\d)', expand=True)

Output:

     0    1    2
0    5    3    2
1    5    5    4
2  NaN  NaN  NaN
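
A self-contained sketch of the same approach, in case it helps to run it end to end. The sample series and the `score_` prefix are assumptions taken from the question, and passing `regex=True` explicitly assumes pandas 1.4 or newer:

```python
import numpy as np
import pandas as pd

# Sample data assumed from the question: digit strings plus a missing value.
s = pd.Series(["532", "554", np.nan], dtype=object, name="score")

# Split only at positions that have a digit on both sides, so no empty
# strings appear at the start or end. NaN rows are passed over.
out = s.str.split(r"(?<=\d)(?=\d)", expand=True, regex=True)

# Rename columns 0, 1, 2 to score_0, score_1, score_2 as in the question.
out = out.add_prefix("score_")
print(out)
```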

edited Jun 14 '23 at 20:27

answered Jun 14 '23 at 20:21

score 1 · Answer 2 · answered Jun 14 '23 at 20:03

You can use .str.split("", expand=True) to split each character into separate columns. And this passes over NaN values:

# split each character, delete first and last columns of empties.
df = ser.str.split("", expand=True).iloc[:, 1:-1]
# rename columns
df.columns = [f"score_{i}" for i in range(len(df.columns))]

An explanation for why there are additional columns created can be found in answers here.
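
As a rough illustration of where the extra columns come from: an empty pattern matches at every position, including before the first and after the last character. The sketch below uses Python's `re` module directly (Python 3.7+) and a small sample series, both of which are assumptions for demonstration rather than a description of pandas internals:

```python
import re

import numpy as np
import pandas as pd

# An empty regex also matches at the very start and the very end of the
# string, so the split result gains an empty string on both sides.
parts = re.split("", "532")
print(parts)  # ['', '5', '3', '2', '']

# The pandas result has the matching shape: two more columns than there
# are characters, which is why the first and last columns are dropped.
ser = pd.Series(["532", "554", np.nan], dtype=object)
wide = ser.str.split("", expand=True)
print(wide.shape)
```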

Talha Tayyab · Answer 3 · 2023-06-15T05:06:09.547

I will use a minimal example:

import numpy as np
import pandas as pd
s = pd.Series(['532', '554', np.nan])

print(s)

0    532
1    554
2    NaN
dtype: object

k=s.str.split("",expand=True).fillna(np.nan)

To omit first and last column:

k=k.iloc[:, 1:-1]
print(k)
     1    2    3
0    5    3    2
1    5    5    4
2  NaN  NaN  NaN

k.columns = ["score_{}".format(i) for i in range(len(k.columns))]

print(k)

  score_0 score_1 score_2
0       5       3       2
1       5       5       4
2     NaN     NaN     NaN
