Convert pandas series of strings to a series of lists

Question

For instance, I have a dataframe as below:

import pandas as pd
df = pd.DataFrame({"col":['AM RLC, F C', 'AM/F C', 'DM','D C']})

    |col
----|--------------|
0   |"AM RLC, F C" |
1   |"AM/F C"      |
2   |"DM"          |
3   |"D C"         |

My expected output is as follows:

    |col
----|-----------------------|
 0  |["AM", "RLC", "F", "C"]|
 1  |["AM", "F", "C"]       |
 2  |["DM"]                 |
 3  |["D", "C"]             |

",", "/" and whitespace should all be treated as delimiters.

The answers in this question do not solve my problem.

mozway · Accepted Answer · 2023-01-05T15:58:42.343

4

I would use str.split or str.findall:

# raw strings keep Python from treating \s and \w as string escapes
df['col'] = df['col'].str.split(r'[\s,/]+')

# or
df['col'] = df['col'].str.findall(r'\w+')

Output:

               col
0  [AM, RLC, F, C]
1       [AM, F, C]
2             [DM]
3           [D, C]

Regex:

[\s,/]+  # at least one of space/comma/slash with optional repeats

\w+      # one or more word characters
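
The two patterns agree on the sample data but can differ on messier input. A minimal sketch using Python's re module directly, assuming the pandas string methods follow the same regex semantics; the sample string is hypothetical and not taken from the question:

```python
import re

# Hypothetical input: a leading space plus a hyphen, neither of which
# appears in the question's data.
sample = ' AM-RLC, F C'

# Splitting on delimiters leaves an empty string where the text starts
# with a delimiter, and keeps any character not listed in the class.
split_tokens = re.split(r'[\s,/]+', sample)

# Matching word characters treats every non-word character as a separator.
found_tokens = re.findall(r'\w+', sample)

print(split_tokens)  # ['', 'AM-RLC', 'F', 'C']
print(found_tokens)  # ['AM', 'RLC', 'F', 'C']
```

Which behaviour is preferable depends on whether characters such as "-" should survive inside a token.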

edited Jan 05 '23 at 15:58

answered Jan 05 '23 at 15:53

mozway


score 3 · Answer 2 · answered Jan 05 '23 at 15:52

3

Try this:

# turn both delimiters into spaces, then split on any run of whitespace
df["col"].apply(lambda x: x.replace(",", " ").replace("/", " ").split())


Mouad Slimane


ali bakhtiari · Answer 3 · 2023-01-05T16:04:44.653

2

A one-liner that finds any punctuation in your string and replaces it with a space. You can then split the string on whitespace and get a clean list:

import string

df['col'].str.replace(f'[{string.punctuation}]', ' ', regex=True).str.split().to_frame()

edited Jan 05 '23 at 16:04

answered Jan 05 '23 at 15:59

ali bakhtiari


This won't work as shown. There are several mistakes: the f-string is incorrect, you need to build an actual regex, and you should replace with a space. For instance: `df['col'].str.replace(f'[{re.escape(string.punctuation)}]+', ' ', regex=True).str.split()` – mozway Jan 05 '23 at 16:02

You are right, thanks for your vigilance. I edited my original answer and it works now. – ali bakhtiari Jan 05 '23 at 16:07
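
The escaping suggested in the comment can be sketched as a self-contained example on the question's data. The variable names `s`, `pattern` and `result` are illustrative only:

```python
import re
import string

import pandas as pd

s = pd.Series(['AM RLC, F C', 'AM/F C', 'DM', 'D C'])

# string.punctuation contains ']', '\', '^' and '-', which all have a
# special meaning inside a regex character class, so escape the whole
# string before placing it between the brackets.
pattern = f'[{re.escape(string.punctuation)}]+'

# Replace each run of punctuation with one space, then split on whitespace.
result = s.str.replace(pattern, ' ', regex=True).str.split()

print(result.tolist())
```

Because str.split() with no argument splits on any run of whitespace, the double space left behind by ", " does not produce empty tokens.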

score 2 · Answer 4 · answered Jan 05 '23 at 16:09

Apply a function to each row of the col column to extract its tokens. Here the function is written as a lambda.

import pandas as pd
import re

df = pd.DataFrame({"col":['AM RLC, F C', 'AM/F C', 'DM','D C']})

# store the list itself; wrapping it in str() would store text that only looks like a list
df['col'] = df['col'].apply(lambda x: re.findall(r"[\w']+", x))

print(df.head())

output:

               col
0  [AM, RLC, F, C]
1       [AM, F, C]
2             [DM]
3           [D, C]
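
Printed output alone does not show whether a cell holds a real list or a string that merely looks like one. A minimal check, assuming the same sample data; `as_list` and `as_text` are illustrative names:

```python
import re

import pandas as pd

df = pd.DataFrame({"col": ['AM RLC, F C', 'AM/F C', 'DM', 'D C']})

# re.findall returns a list, so each cell holds a list object.
as_list = df['col'].apply(lambda x: re.findall(r"[\w']+", x))

# Wrapping the result in str() stores the list's text representation instead.
as_text = df['col'].apply(lambda x: str(re.findall(r"[\w']+", x)))

print(type(as_list.iloc[0]).__name__)  # list
print(type(as_text.iloc[0]).__name__)  # str
```

Only the first form supports list operations such as indexing a token or calling Series.explode to get one token per row.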
