2

I'm trying to split a string at capital letters, but I don't want to split between two consecutive capital letters.

So for now I'm doing this:

import re

my_string = "TTestStringAA"
re.findall('[a-zA-Z][^A-Z]*', my_string)
>>> ['T', 'Test', 'String', 'A', 'A']

But the output that I'm looking for is:

>>> ['TTest', 'String', 'AA']

Is there a clean and simple solution to this problem?

Thx!

3 Answers

5

I believe [A-Z]+[a-z]* meets your requirements:

>>> re.findall(r'[A-Z]+[a-z]*', my_string)
['TTest', 'String', 'AA']
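
To make it easier to see what this pattern keeps and what it drops, here is a minimal sketch that wraps it in a function. The names `CAPS_RUN` and `split_on_caps` are only illustrative and not part of the answer; the extra inputs are assumptions about what you might feed it.

```python
import re

# Pattern from the answer: one or more capitals, then any lowercase letters.
CAPS_RUN = re.compile(r"[A-Z]+[a-z]*")

def split_on_caps(text):
    """Hypothetical helper: return each run of capitals plus its lowercase tail."""
    return CAPS_RUN.findall(text)

print(split_on_caps("TTestStringAA"))   # ['TTest', 'String', 'AA']
print(split_on_caps("aTTestStringAA"))  # ['TTest', 'String', 'AA'] (leading 'a' is dropped)
print(split_on_caps("Test1String"))     # ['Test', 'String'] (the digit is dropped)
```

Because `findall` only returns what the pattern matches, anything that is not a capital-led run (leading lowercase letters, digits) silently disappears from the result.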
thegamecracks
  • This worked for me, but I made a small change because it didn't preserve lowercase letters at the start of the string and didn't split numbers correctly. This is the pattern I ended up with: r'[A-Z0-9]*[a-z]*' – Roure Ossó Sep 25 '20 at 22:58
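
One thing to watch with the variant in this comment: both parts of the pattern are optional, so it can match the empty string, and `re.findall` then returns an empty item at the end of the input. A minimal sketch, assuming you simply want to drop those empty items (the names `VARIANT` and `pieces` are only illustrative):

```python
import re

# Pattern from the comment above; every part is optional, so it can match "".
VARIANT = re.compile(r"[A-Z0-9]*[a-z]*")

def pieces(text):
    """Hypothetical helper: keep only the non-empty matches."""
    return [m for m in VARIANT.findall(text) if m]

print(VARIANT.findall("TTestStringAA"))  # ['TTest', 'String', 'AA', '']
print(pieces("aTTestStringAA"))          # ['a', 'TTest', 'String', 'AA']
print(pieces("Test1String"))             # ['Test', '1String']
```

Note that a digit stays attached to the capital that follows it, so whether this counts as the right split for numbers depends on what you expect.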
0

The following regex returns the expected result and also keeps any lowercase letters at the start of the string:

[a-z]*[A-Z]+[a-z]*|[a-z]+$

Test Cases:

import re

tests = ['a', 'A', 'aa', 'Aa', 'AaAaAAAaAa', 'aTTestStringAA']
regex = re.compile(r'[a-z]*[A-Z]+[a-z]*|[a-z]+$')
for test in tests:
    # findall returns every non-overlapping match, left to right
    print('{} => {}'.format(test, regex.findall(test)))
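
For reference, here is a sketch that records what this pattern returns for those inputs, assuming the list is meant to hold six separate strings. The `expected` dictionary is my own addition, written as assertions so the script fails loudly if the behaviour differs.

```python
import re

regex = re.compile(r"[a-z]*[A-Z]+[a-z]*|[a-z]+$")

# Input string -> pieces the pattern is expected to produce.
expected = {
    "a": ["a"],                                # all lowercase: second alternative
    "A": ["A"],
    "aa": ["aa"],
    "Aa": ["Aa"],
    "AaAaAAAaAa": ["Aa", "Aa", "AAAa", "Aa"],
    "aTTestStringAA": ["aTTest", "String", "AA"],
}

for text, parts in expected.items():
    result = regex.findall(text)
    assert result == parts, (text, result)
    print("{} => {}".format(text, result))
```

Leading lowercase letters are attached to the first capital-led piece, and the `[a-z]+$` alternative only comes into play when the string contains no capitals at all.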
Ryszard Czech
Nitul
0

Use re.split with

(?<=[a-z])(?=[A-Z])


Explanation

--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    [a-z]                    any character of: 'a' to 'z'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [A-Z]                    any character of: 'A' to 'Z'
--------------------------------------------------------------------------------
  )                        end of look-ahead

Python code:

import re
pattern = r"(?<=[a-z])(?=[A-Z])"
test = "TTestStringAA"
print(re.split(pattern, test))

Results:

['TTest', 'String', 'AA']
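
A short sketch of how the same split behaves on a few other inputs. The extra strings and the name `BOUNDARY` are my own assumptions, not part of the answer. Since the pattern only matches empty positions, this relies on Python 3.7 or later, where `re.split` supports splitting on zero-width matches.

```python
import re

# Zero-width boundary: a lowercase letter on the left, a capital on the right.
BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")

samples = ["TTestStringAA", "aTTestStringAA", "Test1String", "lower"]
for text in samples:
    print(text, "=>", BOUNDARY.split(text))

# TTestStringAA => ['TTest', 'String', 'AA']
# aTTestStringAA => ['a', 'TTest', 'String', 'AA']
# Test1String => ['Test1String']
# lower => ['lower']
```

Unlike the `findall` approaches, splitting never discards characters: digits and leading lowercase letters stay in the output, but no split happens after a digit because the look-behind only accepts `[a-z]`.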
Ryszard Czech