How do I return a specific substring within a Pandas dataframe

Question

I have a column of text in which I need to find a substring and return the whole word, but I can't figure out how to get the entire word.

Each cell has text with a code at the bottom labelled "ATT03", "ATT04", etc., and I want to extract that ATT code into a new column of labels.

So, for example, my column looks like this:

blahblahblah text [ATT03]: blahblahblah

blahblahblah text [ATT04]: blahblahbblahblah

blah text [ATT08]: blahblahblah

df_att = df2.loc[:, 'Report Text'].str.split("ATT", n=1).str[-1]

I used this to create a new column, but it returns everything after the first "ATT" (e.g. "08]: blahblahblah"), and I only want the ATT code between the "[]". I don't need any of the extraneous text.

Is there a regular expression (or other code) that would return just ATT03, without the rest of the string around it?
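To see why the split approach falls short, here is a minimal sketch that rebuilds a hypothetical df2 from the three sample rows above and runs the same split. Splitting once on "ATT" keeps everything after the first match, so the closing bracket, the colon and the trailing text all survive.

```python
import pandas as pd

# Hypothetical reconstruction of the asker's df2, using the sample rows above
df2 = pd.DataFrame({
    "Report Text": [
        "blahblahblah text [ATT03]: blahblahblah",
        "blahblahblah text [ATT04]: blahblahbblahblah",
        "blah text [ATT08]: blahblahblah",
    ]
})

# Split once on "ATT" and keep the last piece: everything after the first
# "ATT" is kept, including "]:" and the rest of the text
after_att = df2["Report Text"].str.split("ATT", n=1).str[-1]
print(after_att.tolist())
```

Note that n must be passed as a keyword in recent pandas versions; a positional split("ATT", 1) is rejected there.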

Thank you so much! I've been struggling through this for hours and am frustrated.

Thank you so much! This is very close. It gives me the characters before the ATT, though: " ATTENDING PHYSICIAN AGREEMENT [ATT03" is the output I got. – JLondon, Oct 09 '22 at 23:27
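The output quoted in this comment hints at a second pitfall: the real text contains words such as "ATTENDING", so any pattern that looks for a bare "ATT" can latch onto the wrong place. A minimal sketch, assuming a row shaped like the quoted output (the exact row text is an assumption), comparing a loose pattern with one anchored on the opening bracket:

```python
import pandas as pd

# Assumed shape of a real row, inferred from the output quoted in the comment
s = pd.Series([" ATTENDING PHYSICIAN AGREEMENT [ATT03]: blahblahblah"])

# A bare "ATT" pattern also matches the start of "ATTENDING"
loose = s.str.extract(r"(ATT\w*)", expand=False)

# Anchoring on "[" and capturing only the code inside the brackets avoids that
anchored = s.str.extract(r"\[(ATT\d+)\]", expand=False)

print(loose.iloc[0])     # ATTENDING
print(anchored.iloc[0])  # ATT03
```

The capture group is what drops the surrounding text: str.extract returns only what is inside the parentheses, not the whole match.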

score 0 · Answer 1 · answered Dec 19 '22 at 11:28


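You can use str.extract with the following regular expression:

df_att = df2.loc[:, 'Report Text'].str.extract(r"\[(ATT[^\]]*)", expand=False)

The pattern matches an opening bracket followed by "ATT" and captures "ATT" plus everything up to the closing bracket, so it returns just the code (e.g. ATT03) without the brackets or the surrounding text. With expand=False the result is a Series, which can be assigned directly as a new column.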
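For completeness, a minimal end-to-end sketch of the extract approach on the sample rows from the question, writing the result into a new column. The column name "ATT" and the fourth row without a code are hypothetical additions for illustration; the pattern is the answer's, with the closing bracket added to the match (the captured group is unchanged).

```python
import pandas as pd

df2 = pd.DataFrame({
    "Report Text": [
        "blahblahblah text [ATT03]: blahblahblah",
        "blahblahblah text [ATT04]: blahblahbblahblah",
        "blah text [ATT08]: blahblahblah",
        "a report with no code",  # hypothetical row without a label
    ]
})

# One capture group + expand=False -> a Series that can be assigned as a column.
# Rows with no "[ATT...]" code get NaN instead of raising an error.
df2["ATT"] = df2["Report Text"].str.extract(r"\[(ATT[^\]]*)\]", expand=False)
print(df2["ATT"].tolist())
```

If a single cell can contain several codes, str.extractall with the same pattern returns one row per match instead of only the first.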

sophros

@JLondon: please mark the question as answered / answer as accepted. – sophros Dec 19 '22 at 11:29
