Any ideas on Iterating over dataframe and applying regex?

Question

This may be a rudimentary problem but I am new to pandas.

I have a DataFrame loaded from a CSV file, and I want to iterate over each row and use a regex to extract all the string information in a specific column. (I am using regex because I eventually want to build a separate DataFrame from that column.)

I tried iterating with a for loop but got a ton of errors. So far, it looks like the for loop reads each row as a list or Series rather than a string (correct me if I'm wrong). The main functions I am using are iteritems() and findall(), but with no good results so far. How can I approach this problem?

My dataframe looks like this:

df = pd.read_csv('foobar.csv')
df[['column1', 'column2', 'TEXT']]

My approach looks like this:

for Individual_row in df['TEXT'].iteritems():
   parsed = re.findall(r'(.*?)\:\s*?\[(.*?)\]', Individual_row)
   res = {g[0].strip() : g[1].strip() for g in parsed}

Many thanks in advance

oreopot · Accepted Answer · 2021-04-06T01:44:27.980

0

You can try the following instead of a loop:

df['new_TEXT'] = df['TEXT'].map(lambda x: [[g[0].strip(), g[1].strip()] for g in re.findall(r'(.*?)\:\s*?\[(.*?)\]', x)], na_action='ignore')

This will create a new column with your resultant data.
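Since the goal is a separate DataFrame, here is a minimal sketch of one way to get there. The sample rows and the helper name `parse_pairs` are made up for illustration (the real text comes from foobar.csv, which is not shown), and it assumes each cell holds `key: [value]` pairs separated by whitespace. It uses `Series.map` with `na_action='ignore'` so missing cells are skipped instead of being passed to `re.findall`.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the 'TEXT' column of foobar.csv.
df = pd.DataFrame({
    "TEXT": ["name: [Alice] city: [Paris]", None, "name: [Bob]"]
})

# Same idea as the pattern in the question: "key: [value]" pairs.
PATTERN = re.compile(r"(.*?):\s*?\[(.*?)\]")


def parse_pairs(text):
    """Return a {key: value} dict for every 'key: [value]' pair in text."""
    return {key.strip(): value.strip() for key, value in PATTERN.findall(text)}


# na_action='ignore' leaves missing cells as missing instead of calling parse_pairs.
parsed = df["TEXT"].map(parse_pairs, na_action="ignore")

# Build a separate DataFrame, one column per key, keeping the original row labels.
valid = parsed.dropna()
result = pd.DataFrame(valid.tolist(), index=valid.index)
print(result)
```

Rows whose text is missing are simply left out of `result`; keys that a row does not contain show up as NaN in that row.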

edited Apr 06 '21 at 01:44

answered Apr 06 '21 at 01:27

oreopot


I ran your code and got an error saying 'Series' object has no attribute 'applymap'. But thank you for the insight!! I'll look into applymap() – Won Chul Chung Apr 06 '21 at 01:40
I have updated the answer, can you try the updated solution? – oreopot Apr 06 '21 at 01:44
I tried the updated answer and got "expected string or bytes-like object." By the way, the dtype of df['TEXT'] appears as 'object'. I tried to change the dtype to string using .astype(str), but it doesn't do anything. – Won Chul Chung Apr 06 '21 at 01:56
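A possible explanation for "expected string or bytes-like object", offered as an assumption since the CSV is not shown: an 'object' column can hold any Python object, so some cells may be missing values or numbers rather than strings. Also, `.astype(str)` returns a new Series and leaves the original unchanged unless the result is assigned back. The sketch below uses a made-up column to show how the non-string cells could be located and masked out.

```python
import pandas as pd

# Hypothetical column: dtype 'object' only means "arbitrary Python objects".
text = pd.Series(["a: [1]", None, 42], dtype="object")

# Flag the entries that are actual str instances.
is_str = text.map(lambda value: isinstance(value, str))

# These are the cells that would make re.findall raise a TypeError.
non_strings = text[~is_str]
print(non_strings)

# Keep real strings and replace everything else with a missing value.
cleaned = text.where(is_str)
```

With the non-strings masked out, a regex step that skips missing values can run over `cleaned` without hitting the TypeError.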
