I have a Series that looks like this:

1         532
2         554
3         NaN
...       ... 
Name: score, Length: 941940, dtype: str

I split it into 3 columns, one per character, using apply(lambda x: pd.Series(list(x))), but it raises an error at index 3 because the value there is NaN. How can I use apply so that it handles NaN and splits the values like below?

     score_0  score_1  score_2
1    5        3        2
2    5        5        4
3    NaN      NaN      NaN
...  ...      ...      ...
[941940 rows x 3 columns]
Gooby
  • In the future, include a minimum working example of your code. For example, there is no context for your lambda function, and it doesn't appear to do anything – astroChance Jun 14 '23 at 20:11
  • What is mysterious, @user19077881, is the fact that the OP's series has `dtype: str`. – PaulS Jun 14 '23 at 20:25

3 Answers

Another possible solution (similar to @GodIsOne's), which uses regex to avoid splitting at the beginning and the end of each number:

s.str.split(r'(?<=\d)(?=\d)', expand=True)

Output:

     0    1    2
0    5    3    2
1    5    5    4
2  NaN  NaN  NaN
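
For reference, here is a minimal end-to-end sketch of the lookaround split. The sample values, the object dtype, and the use of add_prefix to get the score_N column names are my assumptions for illustration, not something taken from the original data:

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; object dtype is an assumption here.
s = pd.Series(["532", "554", np.nan], index=[1, 2, 3], name="score", dtype=object)

# The pattern matches only the empty position between two digits, so nothing
# is split off before the first digit or after the last one.
out = s.str.split(r"(?<=\d)(?=\d)", expand=True)

# Integer column labels 0, 1, 2 become score_0, score_1, score_2.
out = out.add_prefix("score_")
print(out)
```

The missing row stays missing in every column, so no special handling of NaN is needed.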
PaulS

You can use .str.split("", expand=True) to split each character into a separate column. NaN values are left as NaN rather than raising an error:

# split each character, delete first and last columns of empties.
df = ser.str.split("", expand=True).iloc[:, 1:-1]
# rename columns
df.columns = [f"score_{i}" for i in range(len(df.columns))]

An explanation for why there are additional columns created can be found in answers here.
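
As a rough illustration of where the extra columns come from, the same effect shows up with plain re.split. This assumes pandas treats the empty pattern as a regular expression, which is how it appears to behave, and it needs Python 3.7 or later, where re.split accepts patterns that match the empty string:

```python
import re

# An empty pattern matches at every position, including before the first
# character and after the last one, so the result is padded with "".
padded = re.split("", "532")
print(padded)

# Dropping the first and last pieces leaves one entry per character,
# which is what .iloc[:, 1:-1] does column-wise in the answer above.
pieces = padded[1:-1]
print(pieces)
```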

Rawson

Take a minimal example:

import numpy as np
import pandas as pd
s = pd.Series(['532', '554', np.nan])

print(s)

0    532
1    554
2    NaN
dtype: object

k=s.str.split("",expand=True).fillna(np.nan)

To omit first and last column:

k=k.iloc[:, 1:-1]
print(k)
     1    2    3
0    5    3    2
1    5    5    4
2  NaN  NaN  NaN

k.columns = ["score_{}".format(i) for i in range(len(k.columns))]

print(k)

  score_0 score_1 score_2
0       5       3       2
1       5       5       4
2     NaN     NaN     NaN
Talha Tayyab