I'm almost there, but I've found a couple of holes in my regex for turning CamelCase into Spaced Title Case. It handles most cases (pun intended), but it gets hung up on the first hyphenated word, and I can't figure out why.

import re

# ---------------------------------------------------------
def camelCaseToSpacedTitleCase(u):
  # add spaces
  regex = re.sub("(.)([A-Z][a-z-]+)", r"\1 \2", u)

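  # split a lowercase letter or digit from a following capital
  regex = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", regex)

  # capitalise the first character of every word
  # (r"\2".upper() is still r"\2", so a callable replacement is needed)
  regex = re.sub(r"(^|\s)(\S)", lambda m: m.group(1) + m.group(2).upper(), regex)

  # keep Mc and Mac attached to the rest of the name
  regex = re.sub(r"(Mc|Mac)(\s)", r"\1", regex)

  # separate a run of digits from the lowercase letters before it
  regex = re.sub(r"([a-z]+)([0-9]+\s)", r"\1 \2", regex)

  # keep a standalone I or A as its own word
  regex = re.sub(r"(\sA|\sI)([A-Z])([a-z]*)", r"\1 \2\3", regex)

  # collapse runs of whitespace into a single space
  regex = re.sub(r"\s{2,32}", r" ", regex)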

  return regex


test1 = "TheAmazingSpider-Man"
test2 = "WeAreSexBob-Omb"
test3 = "SR-128  SomethingSomething"
test4 = "Ex-Voto - Monitor"
test5 = "FergusMcNeilEyeContact"
test6 = "It'sABanana"
test7 = "HouseOf1000Zombies!"

print(camelCaseToSpacedTitleCase(test1))
print(camelCaseToSpacedTitleCase(test2))
print(camelCaseToSpacedTitleCase(test3))
print(camelCaseToSpacedTitleCase(test4))
print(camelCaseToSpacedTitleCase(test5))
print(camelCaseToSpacedTitleCase(test6))
print(camelCaseToSpacedTitleCase(test7))

I would expect to see:

"The Amazing Spider-Man"
"We Are Sex Bob-Omb"
"SR-128 Something Something"
"Ex-Voto - Monitor"
"Fergus McNeil Eye Contact"
"It's A Banana"
"House Of 1000 Zombies!"

I want to avoid using str.title(), since it would mangle cases like the ones above (McNeil, for example, would become Mcneil).

Ghoul Fool
  • what do you see instead? – R Nar Oct 15 '15 at 21:43
  • This is a disgustingly inefficient way to do it all in a single regex. Not sure it matches all criteria (but appears to match all the cases you shared) and I'd strongly recommend against using this, lol: https://regex101.com/r/mB1lN8/1 – lintmouse Oct 15 '15 at 22:03

1 Answer

This type of processing can be tricky. I think the problem you're seeing is in the first step, where you first add spaces. Instead of adding a space at every camel-case boundary, add one only when the character before the boundary is not a hyphen.

# Replace: regex = re.sub("(.)([A-Z][a-z-]+)", r"\1 \2", u)
regex = re.sub("([^-])([A-Z][a-z-]+)", r"\1 \2", u)

Gives the following results...

# The Amazing Spider-Man
# We Are Sex Bob-Omb
# SR-128 Something Something
# Ex-Voto - Monitor
# Fergus McNeil Eye Contact
# It's A Banana
# House Of 1000 Zombies!
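
To see why the hyphen guard matters, it can help to look at the first substitution in isolation. The following is a minimal sketch; the helper name `first_pass` and the constants `ORIGINAL` and `GUARDED` are made up for illustration and are not part of the original function.

```python
import re

ORIGINAL = r"(.)([A-Z][a-z-]+)"      # first pattern from the question
GUARDED = r"([^-])([A-Z][a-z-]+)"    # first pattern suggested in this answer


def first_pass(pattern, text):
    """Apply only the first 'add spaces' substitution (hypothetical helper)."""
    return re.sub(pattern, r"\1 \2", text)


for title in ("TheAmazingSpider-Man", "Ex-Voto - Monitor"):
    # show the intermediate string produced by each pattern
    print(repr(first_pass(ORIGINAL, title)), "vs", repr(first_pass(GUARDED, title)))
```

With the original pattern, the `.` is free to match the hyphen itself, so a space ends up between `-` and the capital that follows it. `[^-]` rules that match out, and none of the later steps removes a space after a hyphen, which is why the stray space survives to the final output in the question's version.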
leroyJr
  • Thanks for your help @leroyJr, the problem **is** a lot trickier than it looks. You may have to double-check, but I have a feeling that the third line comes out as S R-128 Something Something – Ghoul Fool Oct 18 '15 at 10:33
  • @GhoulFool, I reviewed the results again and they look fine. The first RE requires a match of at least 3 characters, the last of which has to be a lowercase letter. SR- doesn't fit that case. Having said that, if you don't want to split two adjacent uppercase letters, the RE could change to `regex = re.sub("([^-A-Z])([A-Z][a-z-]+)", r"\1 \2", u)`. – leroyJr Oct 19 '15 at 16:30
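
Putting the thread together, here is a hedged sketch of the whole pipeline using the `[^-A-Z]` variant from the last comment. Tracing the patterns suggests that `[^-]` on its own still splits `SR-`, because `-` is also inside the `[a-z-]` class and so `R-` satisfies `[A-Z][a-z-]+`. The function name `spaced_title_case` and the `CASES` table are illustrative and not from the original post, and the capitalisation step is written with a callable replacement, which is an assumption about what that step was meant to do.

```python
import re


def spaced_title_case(text):
    """Sketch of the question's pipeline with the [^-A-Z] first step."""
    # add a space before a capitalised word, unless the preceding
    # character is a hyphen or another capital
    text = re.sub(r"([^-A-Z])([A-Z][a-z-]+)", r"\1 \2", text)
    # split a lowercase letter or digit from a following capital
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    # capitalise the first character of every word
    text = re.sub(r"(^|\s)(\S)", lambda m: m.group(1) + m.group(2).upper(), text)
    # re-join Mc / Mac with the rest of the name
    text = re.sub(r"(Mc|Mac)(\s)", r"\1", text)
    # separate a run of digits from the lowercase letters before it
    text = re.sub(r"([a-z]+)([0-9]+\s)", r"\1 \2", text)
    # keep a standalone A or I as its own word
    text = re.sub(r"(\sA|\sI)([A-Z])([a-z]*)", r"\1 \2\3", text)
    # collapse runs of whitespace into a single space
    return re.sub(r"\s{2,32}", " ", text)


# inputs and expected outputs taken from the question
CASES = {
    "TheAmazingSpider-Man": "The Amazing Spider-Man",
    "WeAreSexBob-Omb": "We Are Sex Bob-Omb",
    "SR-128  SomethingSomething": "SR-128 Something Something",
    "Ex-Voto - Monitor": "Ex-Voto - Monitor",
    "FergusMcNeilEyeContact": "Fergus McNeil Eye Contact",
    "It'sABanana": "It's A Banana",
    "HouseOf1000Zombies!": "House Of 1000 Zombies!",
}

for source, expected in CASES.items():
    result = spaced_title_case(source)
    print(repr(result), "ok" if result == expected else "MISMATCH")

# the [^-] variant alone, applied to the start of the third case
print(re.sub(r"([^-])([A-Z][a-z-]+)", r"\1 \2", "SR-128"))  # -> S R-128
```

Note that with `[^-A-Z]` the first step no longer separates `A` from `Banana` in `It'sABanana`; the later "standalone A or I" step is what restores that split, so the steps depend on each other more than their order suggests.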