I have a column of text in which I need to find a substring and return the whole word containing it, but I can't figure out how to get the entire word.

Each cell in the column contains text with a code near the end labelled "ATT03", "ATT04", etc., and I want to pull out that ATT label and put it in a new column.

So for example my column looks like this:

blahblahblah text [ATT03]: blahblahblah

blahblahblah text [ATT03]: blahblahblah

blahblahblah text [ATT04]: blahblahbblahblah

blah text [ATT08]: blahblahblah

df_att = df2.loc[:, 'Report Text'].str.split("ATT", n=1).str[-1]

I used this to create a new column, but it only splits the data into something like "ATT08: blahblahblahblah", and I really only want the ATT label between the "[]". I don't need the rest of the text.

Is there a regular expression or some code that would return just the ATT03, without the rest of the string around it?

Thank you so much! I've been struggling through this for hours and am frustrated.

JLondon

1 Answer

You can use the following regular expression:

df_att = df2.loc[:, 'Report Text'].str.extract(r"\[(ATT[^\]]*)")

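The capture group matches "ATT" followed by everything up to the closing bracket, so only the label inside the brackets (for example ATT03) is returned, without the surrounding text.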
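To see the approach end to end, here is a minimal sketch built on the sample rows from the question. The extra row without a label, the new column name "ATT", and the stricter pattern `ATT\d+` (which assumes every label is "ATT" followed by digits) are assumptions for illustration, not something the question confirms.

```python
import pandas as pd

# Sample data copied from the question; the last row is a hypothetical
# example of text that contains no [ATTxx] label at all.
df2 = pd.DataFrame({
    "Report Text": [
        "blahblahblah text [ATT03]: blahblahblah",
        "blahblahblah text [ATT03]: blahblahblah",
        "blahblahblah text [ATT04]: blahblahbblahblah",
        "blah text [ATT08]: blahblahblah",
        "no label here",
    ]
})

# The raw string keeps the backslashes intact for the regex engine.
# expand=False returns a Series (instead of a one-column DataFrame),
# which can be assigned directly to a new column.
# Assumption: labels look like "ATT" followed by one or more digits.
df2["ATT"] = df2["Report Text"].str.extract(r"\[(ATT\d+)\]", expand=False)

print(df2["ATT"].tolist())
```

Rows with no bracketed label end up as NaN in the new column, so they can be filtered or filled afterwards if needed.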

sophros