1

I need to remove words from a string that begin with a number or special character. I cannot simply put specific values since the string is based on user input, so it will be unknown. All I've been able to come up with, without having to import anything, is using

.startswith(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, '!', '"', '#', '$', '%', '&', '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\', ']', '^', '_', '`', '{', '|', '}', '~', ')', ':')

There must be an easier, less lengthy way, right?

martineau

2 Answers

2

I need to remove words from a string that begin with a number or special character. [...]

I'd suggest taking a look at the string module. It is part of the standard library (not a builtin) and defines constants for common character sets such as punctuation, digits, and ASCII letters.

Since you'd rather not import anything, you can simply copy the constants you need from the string module into your own code:

digits = '0123456789'
punctuation = r"""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""

invalid_start_chars = digits + punctuation

Then test with a sample input:

string = "Hello 123World H0w's @It [Going? Testing123"

print(' '.join(
    [word for word in string.split()
     if word[0] not in invalid_start_chars]
))

Output:

Hello H0w's Testing123
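
If the real goal is "keep only words that start with a letter", a shorter variant that needs neither an import nor a copied character list is to test the first character with str.isalpha(). This is only a sketch under that assumption, and the function name drop_non_letter_words is made up for illustration. It is not quite equivalent to the list above: str.isalpha() is Unicode-aware, so a word starting with a non-ASCII symbol (for example "¡") would also be dropped, whereas the explicit list would keep it.

```python
def drop_non_letter_words(text: str) -> str:
    """Keep only the words whose first character is a letter."""
    # str.split() with no arguments never yields empty strings,
    # so indexing word[0] is always safe here.
    return " ".join(word for word in text.split() if word[0].isalpha())


print(drop_non_letter_words("Hello 123World H0w's @It [Going? Testing123"))
# Hello H0w's Testing123
```

For the sample input this gives the same result as the explicit character list.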
rv.kvetch
  • 1
    The question says *without having to import anything*. Your suggestion involves doing `import string`, no? – BoarGules Nov 21 '21 at 00:17
  • 1
    I can clarify, but my suggestion was to copy those variables defined in `string` into the own module namespace – rv.kvetch Nov 21 '21 at 00:18
1

I'd recommend using the standard string module.

from string import punctuation


def check_word(word: str) -> bool:
    return not word.startswith(tuple(punctuation+'0123456789'))


def fix_string(s: str) -> str:
    return " ".join(word for word in s.split() if check_word(word))

You can then use the fix_string function like this:

s = "Hello !world! This is a good .test"
print('Result:', fix_string(s))
# Result: Hello This is a good
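
One possible refinement, shown here only as a sketch: str.startswith() accepts a tuple of prefixes, so that tuple can be built once at module level instead of on every call, and string.digits can replace the hand-typed '0123456789'. The names INVALID_PREFIXES and fix_string_precomputed are made up for this example.

```python
from string import digits, punctuation

# Built once at import time rather than on every call.
INVALID_PREFIXES = tuple(digits + punctuation)


def fix_string_precomputed(s: str) -> str:
    """Drop every word that starts with a digit or an ASCII punctuation mark."""
    return " ".join(
        word for word in s.split() if not word.startswith(INVALID_PREFIXES)
    )


print(fix_string_precomputed("Hello !world! This is a good .test"))
# Hello This is a good
```

The behaviour should match fix_string above; the only difference is where the tuple is constructed.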