-1

We are working on the Vigenere cipher in my computer science class and one of the first steps our teacher wants us to take is to delete all whitespace, punctuation, and capitalization from a string.

#pre-process - removing spaces, punctuation, and capitalization
def pre_process(s):
    str = s.lower()
    s = (str.replace(" ", "") + str.replace("!?'.", ""))
    return s
print(pre_process("We're having a surprise birthday party for Eve!"))

What I want the output to be is "werehavingasurprisebirthdaypartyforeve", but what I'm actually getting is "we'rehavingasurprisebirthdaypartyforeve!we're having a surprise birthday party for eve!"

pault
`str.replace("!?'.", "")` looks to replace the exact pattern `!?'.`. See [this post](https://stackoverflow.com/questions/3411771/multiple-character-replace-with-python) for how to replace multiple characters. – pault Apr 03 '19 at 13:59

8 Answers

1

Use a regular expression instead of str.replace. Try this code:

import re
mystr = "We're having a surprise birthday party for Eve!"
# Add any other punctuation you want removed inside the character class
result = re.sub(r"[.'!#$%&()*+,\-/:;<=>?@\[\\\]^ `{|}~]", "", mystr)
print(result.lower())
Rajat
0

You could use re:

>>> import re
>>> x
"We're having a surprise birthday party for Eve!"
>>> re.sub(r'[^a-zA-Z0-9]', '', x).lower() # negated character class; the faster of the two
'werehavingasurprisebirthdaypartyforeve'

or a generator expression (which, unlike the regex above, also drops digits):

>>> import string
>>> ''.join(y for y in x if y in string.ascii_letters).lower()
'werehavingasurprisebirthdaypartyforeve'

A quick benchmark:

>>> import timeit
>>> timeit.timeit("''.join(y for y in x if y in string.ascii_letters).lower()", setup='import string;x = "We\'re having a surprise birthday party for Eve!"')
7.747261047363281
>>> timeit.timeit("re.sub(r'[^a-zA-Z0-9]', '', x).lower()", setup='import re;x = "We\'re having a surprise birthday party for Eve!"')
2.912994146347046
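
The same comparison can be rerun as a standalone script. This is only a sketch: the helper names `with_join` and `with_regex` are made up for illustration, and the timings will differ from machine to machine.

```python
import re
import string
import timeit

x = "We're having a surprise birthday party for Eve!"


def with_join(s):
    # Keep only ASCII letters, then lowercase.
    return ''.join(y for y in s if y in string.ascii_letters).lower()


def with_regex(s):
    # Remove everything that is not an ASCII letter or digit, then lowercase.
    return re.sub(r'[^a-zA-Z0-9]', '', s).lower()


# Both approaches should agree on this input before timing them.
assert with_join(x) == with_regex(x)

for fn in (with_join, with_regex):
    seconds = timeit.timeit(lambda: fn(x), number=100_000)
    print(f"{fn.__name__}: {seconds:.3f} s for 100,000 calls")
```

The two functions only agree because the sample contains no digits; on input with digits, the regex version keeps them and the join version does not.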
han solo
0

str.replace("!?'.", "") replaces only the exact string !?'., not any of the four characters on their own.

You need to use a separate replace call for each character, or otherwise use regular expressions.
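
As a rough sketch of the first option, one replace call per character can be written as a loop. The function name `strip_chars` and its `unwanted` parameter are illustrative assumptions, not part of the original code:

```python
def strip_chars(s, unwanted=" !?'."):
    """Lowercase s and remove every character listed in `unwanted` (hypothetical helper)."""
    s = s.lower()
    for ch in unwanted:
        # Each call removes one specific character, not the whole pattern.
        s = s.replace(ch, "")
    return s


# Passing the whole pattern to replace() only matches that exact substring,
# so this string comes back unchanged.
print("We're here!".replace("!?'.", ""))

print(strip_chars("We're having a surprise birthday party for Eve!"))
```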

Christoph Burschka
0

Your solution does not work because it attempts to remove the literal string "!?'." rather than each character individually.

One way to accomplish this would be the following:

import re

regex = re.compile('[^a-zA-Z]')

s = "We're having a surprise birthday party for Eve!"
s = regex.sub('', s).lower()
Greg
0
import re

def preprocess(s):
    return re.sub(r'[\W_]', '', s).lower()

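re.sub removes all non-alphanumeric characters, including the underscore (everything except letters and digits). Note that in Python 3 \W is Unicode-aware by default, so accented letters are kept as well.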

lower() removes capitalization.
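
A quick check of how this pattern differs from a letters-only pattern such as `[^a-zA-Z]`, assuming Python 3. The `letters_only` function and the sample string are made up for illustration:

```python
import re


def preprocess(s):
    # [\W_] matches anything that is not a letter or digit, plus the underscore.
    return re.sub(r'[\W_]', '', s).lower()


def letters_only(s):
    # Hypothetical stricter variant: keeps ASCII letters only.
    return re.sub(r'[^a-zA-Z]', '', s).lower()


sample = "Café_No. 42!"
print(preprocess(sample))    # digits and the accented letter are kept
print(letters_only(sample))  # only ASCII letters remain
```

For a Vigenère cipher, which works on the 26 letters of the alphabet, the stricter variant is probably the safer choice.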

vurmux
0

str.translate is also an option. You can create a translation table with str.maketrans: each character in the first argument (ascii_uppercase) is translated to the character at the same position in the second (ascii_lowercase), and the third argument (punctuation + whitespace) is a string of the characters you want deleted:

from string import ascii_lowercase, ascii_uppercase, punctuation, whitespace

table = str.maketrans(ascii_uppercase, ascii_lowercase, punctuation + whitespace)
s = "We're having a surprise birthday party for Eve!"
print(s.translate(table))
# werehavingasurprisebirthdaypartyforeve

Once the table is initialized, every subsequent string can be converted by applying:

s.translate(table)
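
One way to package this, as a minimal sketch: build the table once at module level and reuse it inside a function. The `TABLE` constant and the sample messages are assumptions added for illustration.

```python
from string import ascii_lowercase, ascii_uppercase, punctuation, whitespace

# Build the table once; it maps A-Z to a-z and deletes
# ASCII punctuation and whitespace.
TABLE = str.maketrans(ascii_uppercase, ascii_lowercase, punctuation + whitespace)


def pre_process(s):
    # Reuses the prebuilt table, so no per-call setup is needed.
    return s.translate(TABLE)


messages = ["Attack at dawn!", "Meet me at 9 p.m."]
print([pre_process(m) for m in messages])
```

Digits are left in place by this table; add `string.digits` to the third argument if they should be removed too.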
hiro protagonist
0

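An approach without regex. (In Python 2 this was written s.lower().translate(None, string.punctuation); Python 3 needs a table built with str.maketrans.)

>>> import string
>>> s
"We're having a surprise birthday party for Eve!"
>>> s.lower().translate(str.maketrans('', '', string.punctuation)).replace(" ", "")
'werehavingasurprisebirthdaypartyforeve'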
freethebees
0

Change your code as below:

def pre_process(s):
    s = s.lower()
    s = s.replace(" ", "")
    # One replace call per character; add more calls for other punctuation
    s = s.replace("!", "")
    s = s.replace("'", "")
    return s
print(pre_process("We're having a surprise birthday party for Eve!"))
MeSterious