I need to write a regular expression in Python that will capture some text which could possibly include any special character (like !@#$%^). Is there a character class similar to [\w] or [\d] that will capture any special character?

I could write down all the special characters in my regex but it would end up looking unreadable. Any help appreciated.

kronosjt

2 Answers

Special letter characters

Python 3

If you're using Python 3, you might not have to do anything. For str patterns, \w is Unicode-aware by default, so it already matches many "special characters":

>>> import re
>>> re.findall(r'\w', 'üäößéÅßêèiìí')
['ü', 'ä', 'ö', 'ß', 'é', 'Å', 'ß', 'ê', 'è', 'i', 'ì', 'í']
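
To make the default behaviour above easier to see, here is a minimal sketch (Python 3 only) that compares the Unicode-aware default with the re.ASCII flag, which restricts \w to [a-zA-Z0-9_]. The variable names are only illustrative:

```python
import re

text = "üäößéÅßêèiìí"

# Default for str patterns in Python 3: \w is Unicode-aware.
unicode_matches = re.findall(r"\w", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], similar to the Python 2 default.
ascii_matches = re.findall(r"\w", text, re.ASCII)

print(unicode_matches)
print(ascii_matches)  # ['i']
```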

Python 2.7

In Python 2.7, \w matches only ASCII word characters by default, so only i is matched:

>>> import re
>>> re.findall('\w', 'üäößéÅßêèiìí')
['i']

You could use the re.UNICODE flag together with a unicode string:

# encoding: utf-8
import re
any_char = re.compile('\w', re.UNICODE)
re.findall(any_char, u'üäößéÅßêèiìí')
# [u'\xfc', u'\xe4', u'\xf6', u'\xdf', u'\xe9', u'\xc5', u'\xdf', u'\xea', u'\xe8', u'i', u'\xec', u'\xed']
for x in re.findall(any_char, u'üäößéÅßêèiìí'):
    print x
#   ü
#   ä
#   ö
#   ß
#   é
#   Å
#   ß
#   ê
#   è
#   i
#   ì
#   í

Any special character

Specifying Unicode ranges might simplify your regex. As an example, this regex matches any character in the Unicode Arrows block (U+2190 to U+21FF):

>>> import re
>>> arrows = re.compile(r'[\u2190-\u21FF]')
>>> re.findall(arrows, "a⇸b⇙c↺d↣e↝f")
['⇸', '⇙', '↺', '↣', '↝']

For Python 2, both the string and the regex need to be unicode literals:

>>> import re
>>> arrows = re.compile(ur'[\u2190-\u21FF]')
>>> re.findall(arrows, u"a⇸b⇙c↺d↣e↝f")
[u'\u21f8', u'\u21d9', u'\u21ba', u'\u21a3', u'\u219d']
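
If "special character" means punctuation and symbols in general, one alternative to listing ranges is to filter on Unicode general categories. This is only a sketch for Python 3 and not part of the approach above: the helper name special_chars is hypothetical, and it assumes that every category starting with P (punctuation) or S (symbol) counts as "special":

```python
import unicodedata


def special_chars(text):
    """Return the characters of text whose Unicode general category
    is a punctuation (P*) or symbol (S*) category."""
    return [ch for ch in text if unicodedata.category(ch)[0] in ("P", "S")]


print(special_chars("a!b@c#d$e%f^"))  # ['!', '@', '#', '$', '%', '^']
print(special_chars("»«›‹"))          # ['»', '«', '›', '‹']
```

Unlike \w, this also picks up quotation marks such as » and «, at the cost of not being a single regex.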
Eric Duminil
  • This answer assumes that "special character" means "special letter character", right? For example, the special characters ``»«›‹`` are not matched by this: ``re.findall("\w", "»«›‹")`` returns ``[]``. – Schmuddi Mar 22 '17 at 13:22
  • @Schmuddi: You're right. I updated the answer with unicode ranges, which might shorten the regex a lot. – Eric Duminil Mar 22 '17 at 13:33
  • Sorry I should have made it clear that I was looking for characters like !@#$% and not special letter characters. Updated OP. – kronosjt Mar 23 '17 at 05:45

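You can try the negated classes \W and \D, which match any non-word and any non-digit character, respectively. Note that \W also matches whitespace, and \D matches everything that is not a digit, including letters.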
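
As a rough sketch of how a negated class could be applied to characters like !@#$%^ (Python 3; the sample text and variable names are made up for illustration): [^\w\s] skips both word characters and whitespace, while a class built from string.punctuation covers only ASCII punctuation, including the underscore that \w treats as a word character.

```python
import re
import string

text = "price: $5 (50% off!) @home #deal ^_^"

# Anything that is neither a word character nor whitespace.
# The underscore counts as a word character, so it is not matched.
non_word = re.findall(r"[^\w\s]", text)

# Explicit ASCII punctuation class; re.escape keeps ], \, ^ and - literal.
ascii_punct = re.compile("[" + re.escape(string.punctuation) + "]")
punct = ascii_punct.findall(text)

print(non_word)  # [':', '$', '(', '%', '!', ')', '@', '#', '^', '^']
print(punct)     # [':', '$', '(', '%', '!', ')', '@', '#', '^', '_', '^']
```

Which of the two fits better depends on whether non-ASCII symbols and the underscore should count as "special".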

Anonymous