
I'm trying to extract names that are enclosed in square brackets and that appear only after a given substring. In the example sentence below, the substring is "[A]." (including the trailing dot).

"This is [A].[Alpha] and this is [A].[Beta] and this is [A].[Charlie] and so on"

I'm trying to generate a list of the bracketed names that follow "[A].", which for this example would be Alpha, Beta and Charlie.

gtomer

2 Answers


\[A\]\.\[([^\]]*)\]

https://regex101.com/r/NF526r/1

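That should do the trick for you. It takes advantage of a negated character class: [^\]]* matches any run of characters other than ], so each capture stops at the first closing bracket.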

Here is a demo in python:

import re

mystring = "This is [A].[Alpha] and this is [A].[Beta] and this is [A].[Charlie] and so on"

# Raw string avoids invalid-escape warnings; \. matches the literal dot in "[A]."
values = re.findall(r"\[A\]\.\[([^\]]*)\]", mystring)

print(values)

results:

['Alpha', 'Beta', 'Charlie']
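
If the substring is not always "[A].", one possible generalisation is sketched below. The helper name `names_after` is made up for illustration and is not part of the answer above. It uses `re.escape` so the prefix is treated literally, and it also shows what a greedy `.*` would capture instead, which is why the negated character class matters here.

```python
import re

def names_after(prefix, text):
    """Return every bracketed name that directly follows `prefix` in `text`."""
    # re.escape makes the prefix literal, so "[A]." is not read as a
    # character class followed by a wildcard.
    pattern = re.escape(prefix) + r"\[([^\]]*)\]"
    return re.findall(pattern, text)

mystring = "This is [A].[Alpha] and this is [A].[Beta] and this is [A].[Charlie] and so on"

# Negated class: each capture stops at the first "]".
negated = names_after("[A].", mystring)

# For comparison: a greedy .* runs on to the last "]" in the string,
# so everything between the first "[A].[" and the final "]" becomes one match.
greedy = re.findall(r"\[A\]\.\[(.*)\]", mystring)

print(negated)
print(greedy)
```

The first list holds the three names; the second holds a single long string, since the greedy version swallows the intermediate brackets.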
sniperd

Try this:

df['col'] = df['col'].str.findall(r"\[A\]\.\[([^\]]*)\]")
df.explode('col')

        col
0    Alpha
0     Beta
0  Charlie

Where 'col' is the column with your text.
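
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The one-row DataFrame is an assumption made up for the example (the question does not show the real data), and the `reset_index` step is optional.

```python
import pandas as pd

# Hypothetical one-row frame; 'col' stands in for the column holding the text.
df = pd.DataFrame({
    "col": ["This is [A].[Alpha] and this is [A].[Beta] and this is [A].[Charlie] and so on"]
})

# str.findall returns, for each row, a list of the captured names.
df["col"] = df["col"].str.findall(r"\[A\]\.\[([^\]]*)\]")

# explode gives each name its own row. It returns a new frame rather than
# modifying df in place, so keep the result.
exploded = df.explode("col")

# Optional: replace the repeated index (0, 0, 0) with 0, 1, 2.
exploded = exploded.reset_index(drop=True)

print(exploded)
```

Note that `explode` needs pandas 0.25 or later.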

gtomer