Removing all spaces, punctuation, and capitalization

Question

We are working on the Vigenere cipher in my computer science class and one of the first steps our teacher wants us to take is to delete all whitespace, punctuation, and capitalization from a string.

#pre-process - removing spaces, punctuation, and capitalization
def pre_process(s):
    str = s.lower()
    s = (str.replace(" ", "") + str.replace("!?'.", ""))
    return s
print(pre_process("We're having a surprise birthday party for Eve!"))

What I want the output to be is "werehavingasurprisebirthdaypartyforeve", but what I'm actually getting is "we'rehavingasurprisebirthdaypartyforeve!we're having a surprise birthday party for eve!"

`str.replace("!?'.", "")` looks to replace the exact pattern `!?'.`. See [this post](https://stackoverflow.com/questions/3411771/multiple-character-replace-with-python) for how to replace multiple characters. — pault, Apr 03 '19 at 13:59

score 1 · Answer 1 · answered Apr 03 '19 at 14:03

You should use regex instead of string replace. Try this code.

import re
mystr = "We're having a surprise birthday party for Eve!"
# add any other characters you want removed inside the brackets
result = re.sub(r"[!#$%&'()*+,\-./:;<=>?@\[\]^ `{|}~]", "", mystr)
print(result.lower())

han solo · Answer 2 · 2019-04-03T14:18:42.447

You could use re:

>>> import re
>>> x
"We're having a surprise birthday party for Eve!"
>>> re.sub(r'[^a-zA-Z0-9]', '', x).lower() # negate the search. Fastest among the two :)
'werehavingasurprisebirthdaypartyforeve'

Or a generator expression with str.join:

>>> import string
>>> ''.join(y for y in x if y in string.ascii_letters).lower()
'werehavingasurprisebirthdaypartyforeve'

A quick benchmark (timings depend on the machine and Python version):

>>> import timeit
>>> timeit.timeit("''.join(y for y in x if y in string.ascii_letters).lower()", setup='import string;x = "We\'re having a surprise birthday party for Eve!"')
7.747261047363281
>>> timeit.timeit("re.sub(r'[^a-zA-Z0-9]', '', x).lower()", setup='import re;x = "We\'re having a surprise birthday party for Eve!"')
2.912994146347046
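
For anyone who wants to repeat the comparison, here is a minimal sketch of a benchmark script. The function names, the `TEXT` constant, and the call count are assumptions made for illustration, and the regex keeps letters only (`[^a-zA-Z]`) so that all three candidates produce the same result. Absolute numbers will differ from the ones above.

```python
import re
import string
import timeit

TEXT = "We're having a surprise birthday party for Eve!"

# Precompiled pattern and translation table, so setup cost is not timed.
PATTERN = re.compile(r"[^a-zA-Z]")
TABLE = str.maketrans(
    string.ascii_uppercase,
    string.ascii_lowercase,
    string.punctuation + string.whitespace,
)


def with_regex(s):
    # Delete everything that is not an ASCII letter, then lowercase.
    return PATTERN.sub("", s).lower()


def with_join(s):
    # Keep only ASCII letters, then lowercase.
    return "".join(c for c in s if c in string.ascii_letters).lower()


def with_translate(s):
    # Lowercase A-Z and delete punctuation and whitespace in one pass.
    return s.translate(TABLE)


candidates = {
    "regex": with_regex,
    "join": with_join,
    "translate": with_translate,
}

for name, func in candidates.items():
    seconds = timeit.timeit(lambda: func(TEXT), number=20_000)
    print(f"{name}: {seconds:.4f} s for 20,000 calls")
```

Checking that every candidate returns the same string before comparing timings is worthwhile, since a faster function that gives a different answer is not a fair comparison.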

Don't forget to make everything lowercase! — freethebees, Apr 03 '19 at 14:03

score 0 · Answer 3 · answered Apr 03 '19 at 14:00

str.replace("!?'.", "") replaces only the exact string !?'., not each of the four characters on its own.

You need to use a separate replace call for each character, or otherwise use regular expressions.
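
A minimal sketch of the separate-replace approach, assuming that "punctuation" means the ASCII characters in `string.punctuation`. The loop is an illustrative choice, not code from the question.

```python
import string


def pre_process(s):
    """Lowercase s, then drop spaces and ASCII punctuation one character at a time."""
    result = s.lower()
    # string.punctuation holds the ASCII punctuation characters; the space is added by hand.
    for ch in string.punctuation + " ":
        result = result.replace(ch, "")
    return result


print(pre_process("We're having a surprise birthday party for Eve!"))
# werehavingasurprisebirthdaypartyforeve
```

Each `replace` call scans the whole string, so this is simple rather than fast, but for short classroom inputs the cost is negligible.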

Christoph Burschka

score 0 · Answer 4 · answered Apr 03 '19 at 14:00

Your solution does not work because it attempts to remove the literal string "!?'.", not each character individually.

One way to accomplish this would be the following:

import re

regex = re.compile('[^a-zA-Z]')

s = "We're having a surprise birthday party for Eve!"
s = regex.sub('', s).lower()

score 0 · Answer 5 · answered Apr 03 '19 at 14:02

import re

def preprocess(s):
    return re.sub(r'[\W_]', '', s).lower()

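re.sub removes every character that is not a letter or digit: \W matches non-word characters, and _ is listed separately because the underscore counts as a word character. In Python 3, \W is Unicode-aware by default, so accented letters and non-ASCII digits are kept unless you pass flags=re.ASCII.

lower() converts the remaining letters to lowercase.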
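
A short sketch of how the `[\W_]` pattern treats digits, underscores, and non-ASCII letters. The `ascii_only` parameter is hypothetical, added here only to show the effect of the standard `re.ASCII` flag.

```python
import re


def preprocess(s, ascii_only=False):
    # ascii_only is a hypothetical parameter for this illustration:
    # with re.ASCII, \W also matches non-ASCII letters, so they are removed too.
    flags = re.ASCII if ascii_only else 0
    return re.sub(r"[\W_]", "", s, flags=flags).lower()


print(preprocess("We're having a surprise_party for Eve, room 42!"))
# werehavingasurprisepartyforeveroom42

accented = "Caf\u00e9 d\u00e9j\u00e0 vu!"  # "Café déjà vu!" written with escapes
print(preprocess(accented))
# cafédéjàvu
print(preprocess(accented, ascii_only=True))
# cafdjvu
```

Digits survive in both modes, so a pattern such as `[^a-zA-Z]` is the better fit if the cipher should see letters only.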

vurmux

hiro protagonist · Answer 6 · 2019-04-03T14:09:45.620

str.translate is also an option. You can create a translation table using str.maketrans, where each character of the first argument (ascii_uppercase) is translated to the character at the same position in the second argument (ascii_lowercase). The third argument (punctuation + whitespace) is a string of the characters you want deleted:

from string import ascii_lowercase, ascii_uppercase, punctuation, whitespace

table = str.maketrans(ascii_uppercase, ascii_lowercase, punctuation + whitespace)
s = "We're having a surprise birthday party for Eve!"
print(s.translate(table))
# werehavingasurprisebirthdaypartyforeve

Once the table is initialized, every subsequent string can be converted by applying:

s.translate(table)
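
As a sketch of that reuse, the table can be built once at module level and applied to any number of strings. The names `TABLE`, `pre_process`, and `messages` are illustrative, not part of the answer above.

```python
from string import ascii_lowercase, ascii_uppercase, punctuation, whitespace

# Build the table once: map A-Z to a-z and delete punctuation and whitespace.
TABLE = str.maketrans(ascii_uppercase, ascii_lowercase, punctuation + whitespace)


def pre_process(s):
    # A single pass over the string handles both lowercasing and deletion.
    return s.translate(TABLE)


messages = [
    "Attack at dawn!",
    "Meet me\tat the usual place.",
    "We're having a party",
]
for message in messages:
    print(pre_process(message))
# attackatdawn
# meetmeattheusualplace
# werehavingaparty
```

Digits are not in the deletion string, so they pass through unchanged. Adding `string.digits` to the third argument would remove them as well.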

score 0 · Answer 7 · answered Apr 03 '19 at 14:04

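An approach without using RegEx (shown for Python 3; on Python 2 the equivalent call is s.lower().translate(None, string.punctuation)).

>>> import string
>>> s
"We're having a surprise birthday party for Eve!"
>>> s.lower().translate(str.maketrans("", "", string.punctuation)).replace(" ", "")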
'werehavingasurprisebirthdaypartyforeve'

freethebees

score 0 · Answer 8 · answered Apr 03 '19 at 14:05

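Change your code as follows, using one replace call for each character you want removed:

def pre_process(s):
    # avoid naming a variable `str`, which shadows the built-in type
    lowered = s.lower()
    s = lowered.replace(" ", "")
    s = s.replace("!", "")
    s = s.replace("'", "")
    return s
print(pre_process("We're having a surprise birthday party for Eve!"))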

MeSterious
