
I have a string:

mystring = 'Aston VillaLiverpoolMan City'

My goal is to get the output below:

"Aston Villa", "Liverpool", "Man City"

i.e. to split on each capital letter that isn't preceded by a space.

I am getting close with re.findall, but it does not produce the output I want:

import re

myString = 'Aston VillaLiverpoolMan City'
# Raw string so that \s reaches the regex engine unchanged
result = re.findall(r"(?<!\s)[A-Z][a-z]*", myString)
print(result)

This produces

"Aston", "Liverpool", "Man" 

(The "Villa" after "Aston" and the "City" after "Man" are missing.)

Thanks!

  • Hi Chris, it doesn't help. That question covers splitting on every capital letter, not only on capitals that have no space before them (unless I have missed something). – Username2743807095872306754 Sep 26 '20 at 17:16
  • As the OP mentioned in the comment, it's not actually an exact duplicate. The OP is looking for a solution that does not split on a capital letter preceded by a space. – vanisk Sep 26 '20 at 18:00
  • Try this; stitching the pieces back together is easy: result = re.split(r"(\B[A-Z])", myString) – Mallik Kumar Sep 27 '20 at 16:35
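
The stitching step mentioned in the last comment could look roughly like the sketch below. The function name split_and_stitch is made up for illustration. Note that \B only approximates "not preceded by a space": it also refuses to match after punctuation such as a hyphen, so this is an assumption that happens to hold for the example string.

```python
import re

def split_and_stitch(text):
    # Hypothetical helper, not from the original comment.
    # The capturing group makes re.split keep each matched capital as its own item,
    # e.g. ['Aston Villa', 'L', 'iverpool', 'M', 'an City'].
    pieces = re.split(r"(\B[A-Z])", text)
    # pieces[0] is the text before the first match; after that the items
    # alternate: capital, remainder, capital, remainder, ...
    stitched = [pieces[0]] + [cap + rest for cap, rest in zip(pieces[1::2], pieces[2::2])]
    # Drop empty pieces (for example when the input is an empty string).
    return [p for p in stitched if p]

print(split_and_stitch('Aston VillaLiverpoolMan City'))
```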

2 Answers


Actually, the link posted by @Chris does answer your question; it just needs a small alteration.

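s = 'Aston VillaLiverpoolMan City'
# Indices of capitals not preceded by a space; the appended 'A' is a sentinel that marks the end of the last part
pos = [i for i, e in enumerate(s + 'A') if e.isupper() and s[i-1] != " "]
# Slice the string between consecutive indices
parts = [s[pos[j]:pos[j+1]] for j in range(len(pos) - 1)]
print(parts)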

This results in

['Aston Villa', 'Liverpool', 'Man City']

Credit goes to @pwdyson.
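
The same idea can be wrapped in a function without the sentinel character. This is only a sketch: the name split_on_unspaced_capitals is made up, and it assumes the first part should always start at index 0, even if the string does not begin with a capital.

```python
def split_on_unspaced_capitals(s):
    # Hypothetical helper illustrating the index-based approach.
    if not s:
        return []
    # Part starts: index 0, plus every capital whose previous character is not a space.
    starts = [0] + [i for i in range(1, len(s)) if s[i].isupper() and s[i - 1] != " "]
    # Each part ends where the next one starts; the last part runs to the end of the string.
    ends = starts[1:] + [len(s)]
    return [s[a:b] for a, b in zip(starts, ends)]

print(split_on_unspaced_capitals('Aston VillaLiverpoolMan City'))
```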

vanisk

The answer by @vanisk is perfect.

You can also try:

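mystring = 'Aston VillaLiverpoolMan City'
s = list(mystring)
for i in range(1, len(s)):
    # 65-90 are the ASCII codes for 'A'-'Z'
    if ord(s[i]) in range(65, 91):
        if s[i-1] != ' ':
            # Mark the split point with a newline before the capital
            s[i] = '\n' + s[i]

# Rejoin the characters and split on the inserted newlines
splitstring = ''.join(s).split('\n')
print(splitstring)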

This too gives

['Aston Villa', 'Liverpool', 'Man City']
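
For comparison, a regex-only variant is sketched below. It assumes Python 3.7 or later, where re.split accepts a pattern that can match an empty string, and it only treats ASCII A-Z as capitals. The function name is made up for illustration.

```python
import re

def split_before_unspaced_capitals(text):
    # Hypothetical helper; splits at zero-width positions.
    # (?<=[^ ])  the previous character exists and is not a space
    # (?=[A-Z])  the next character is an ASCII capital
    return re.split(r"(?<=[^ ])(?=[A-Z])", text)

print(split_before_unspaced_capitals('Aston VillaLiverpoolMan City'))
```

Because the lookbehind needs a character before the split point, the start of the string is never a split position, so no empty first item is produced.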
Subimal