Split a pandas column on a percentage value and/or a parenthesized string of any length

Question

I have a DataFrame in which one of the columns looks like this:

**Share**
We are safe 25%
We are always safe 12.50% (India Aus, West)
We are ok (USA, EU)
We are not OK
What is this
Always wise 25.66%

I want to split this column so that the % values, where present, move into a new column, and any text in parentheses moves into another. The output would be:

Share                  Percent    LOCATION
We are safe            25%  
We are always safe     12.50%     India Aus, West
We are ok                         USA, EU
We are not OK
What is this
Always wise            25.66%

Looks like you are looking to create a regex, but do not know where to get started. Please check [Reference - What does this regex mean](https://stackoverflow.com/questions/22937618) resource, it has plenty of hints. Also, refer to [Learning Regular Expressions](https://stackoverflow.com/questions/4736) post for some basic regex info. Once you get some expression ready and still have issues with the solution, please edit the question with the latest details and we'll be glad to help you fix the problem. — Wiktor Stribiżew, Oct 14 '20 at 15:06

score 0 · Answer 1 · answered Oct 14 '20 at 14:51

Based only on your sample data:

print(df["Share"].str.extract(r'([A-Za-z\s]+)\s?(\d+[.0-9]+%)?\s?\(?(.*(?=\)))?'))

                     0       1                2
0         We are safe      25%              NaN
1  We are always safe   12.50%  India Aus, West
2           We are ok      NaN          USA, EU
3        We are not OK     NaN              NaN
4         What is this     NaN              NaN
5         Always wise   25.66%              NaN

Try it online here.
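
For reference, a self-contained sketch of the same idea. It is a variant of the pattern above, not the exact expression from this answer: it assumes an anchored pattern with a lazy first group (so trailing spaces are not captured) and named groups, which `str.extract` uses as column names. The names `Share`, `Percent`, and `LOCATION` are taken from the desired output in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Share": [
    "We are safe 25%",
    "We are always safe 12.50% (India Aus, West)",
    "We are ok (USA, EU)",
    "We are not OK",
    "What is this",
    "Always wise 25.66%",
]})

# Assumed variant of the pattern: anchored at both ends, lazy first group,
# and named groups that become the output column names.
pattern = (
    r"^(?P<Share>[A-Za-z\s]+?)\s*"          # leading text (letters and spaces)
    r"(?P<Percent>\d+(?:\.\d+)?%)?\s*"      # optional percentage such as 25% or 12.50%
    r"(?:\((?P<LOCATION>[^)]*)\))?\s*$"     # optional text inside parentheses
)

result = df["Share"].str.extract(pattern)
print(result)
```

Rows without a percentage or a parenthesized part get a missing value in the corresponding column.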

Henry Yik

If the first part contains & or -, it also gets split. Can you help me retain them in column 0? – asimo Oct 14 '20 at 15:14
'([A-Za-z&\-)\s]+)\s?(\d+[.0-9]+%)?\s?\(?(.*(?=\)))?' – asimo Oct 14 '20 at 15:30
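
One possible alternative to listing every allowed symbol is a negated character class that accepts anything except `%` and `(` in the first group. The sketch below checks that idea with the standard `re` module. The pattern, the helper `split_share`, and the sample strings are hypothetical illustrations, not taken from the answer.

```python
import re

# Hypothetical pattern: the first group accepts any character except '%' and '(',
# so symbols such as '&' and '-' stay in the leading text.
PATTERN = re.compile(
    r"^(?P<Share>[^%(]+?)\s*"               # leading text, matched lazily
    r"(?P<Percent>\d+(?:\.\d+)?%)?\s*"      # optional percentage
    r"(?:\((?P<LOCATION>[^)]*)\))?\s*$"     # optional text inside parentheses
)

def split_share(text):
    """Return (share, percent, location); parts that are absent are None."""
    match = PATTERN.match(text)
    if match is None:
        return None
    return match.group("Share"), match.group("Percent"), match.group("LOCATION")

# Made-up sample strings containing '&' and '-'.
samples = [
    "Safe & sound 25%",
    "Semi-safe 12.50% (India Aus, West)",
    "Risk-free (USA, EU)",
]
for sample in samples:
    print(split_share(sample))
```

The same pattern string should also work with `df["Share"].str.extract(...)`, since `str.extract` accepts named groups.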
