0

How can I replace any character outside of the English alphabet?

For example, replacing each such character in 'abcdükl*m' with ' ' should give 'abcd kl m'.

Michael Petrotta
user1772859
  • What did you try? Which resource did you consult? Do you know about "negated character classes"? –  Oct 25 '12 at 01:16

4 Answers

7

Use the regex [^a-zA-Z]:

re.sub(r'[^a-zA-Z]', ' ', mystring)

Some info: a-z and A-Z are character ranges that match all the lowercase and uppercase letters, respectively, and the caret ^ at the beginning of the character class indicates negation, i.e. "anything except these".
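As a minimal sketch of the negated character class in use, assuming Python 3: the helper name replace_non_letters and the constant NON_LETTER are made up for illustration and are not part of the re module. The sample string is the one from the question.

```python
import re

# Compile once; the pattern matches any single character that is not
# an ASCII letter (a-z or A-Z).
NON_LETTER = re.compile(r'[^a-zA-Z]')

def replace_non_letters(text, replacement=' '):
    """Replace every character outside a-z and A-Z with `replacement`."""
    return NON_LETTER.sub(replacement, text)

sample = 'abcd\u00fckl*m'  # 'abcdükl*m', with ü written as an escape
print(replace_non_letters(sample))      # abcd kl m
print(replace_non_letters(sample, ''))  # abcdklm
```

Passing ' ' as the replacement gives the output asked for in the question; passing '' deletes the characters instead.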

voithos
3

Assuming you're trying to normalize text, see my link under "Comprehensive character replacement module in python for non-unicode and non-ascii for HTML".

unicodedata has a normalize method that can gracefully degrade text for you:

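import unicodedata

def gracefully_degrade_to_ascii(text):
    # NFKD splits accented characters into a base letter plus combining marks;
    # encoding to ASCII with 'ignore' then drops whatever is not ASCII.
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore')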

Full Docs - http://docs.python.org/library/unicodedata.html
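A hedged sketch of the same normalization idea for Python 3, where encode() returns bytes and a final decode() is needed to get a str back. The function name to_ascii is an assumption for illustration, not something from the unicodedata module.

```python
import unicodedata

def to_ascii(text):
    """Decompose accented characters, then drop anything that is not ASCII."""
    # NFKD turns 'ü' into 'u' followed by a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize('NFKD', text)
    # 'ignore' silently discards the combining mark and any other non-ASCII
    # character; decode() converts the resulting bytes back to str.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii('abcd\u00fckl*m'))  # abcdukl*m
```

Note that this differs from what the question asks for: ü becomes u rather than a space, and ASCII punctuation such as * is left untouched.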

If you're trying to just strip out non-ASCII chars, the negated character set regex that others mentioned is the way to do it.

Community
Jonathan Vanasco
1

Search for [^a-zA-Z] and replace with ' '

pogo
1
>>> import string
>>> print(''.join(x if x in string.ascii_letters else ' ' for x in u'abcdükl*m'))
abcd kl m
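The one-liner above can also be wrapped in a reusable function. This is only a sketch, assuming Python 3; keep_ascii_letters and ASCII_LETTERS are hypothetical names chosen for the example.

```python
import string

# A frozenset gives constant-time membership checks for each character.
ASCII_LETTERS = frozenset(string.ascii_letters)

def keep_ascii_letters(text, replacement=' '):
    """Keep a-z and A-Z; swap every other character for `replacement`."""
    return ''.join(ch if ch in ASCII_LETTERS else replacement for ch in text)

print(keep_ascii_letters('abcd\u00fckl*m'))  # abcd kl m
```

This avoids regular expressions entirely, which some readers may find easier to follow for a simple per-character filter.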
John La Rooy