Regular Expression - character class for special characters

Question

I need to write a regular expression in Python that will capture some text which could possibly include any special character (like !@#$%^). Is there a character class similar to [\w] or [\d] that will capture any special character?

I could write down all the special characters in my regex but it would end up looking unreadable. Any help appreciated.

Have you checked [this](http://stackoverflow.com/questions/18057962/regex-pattern-including-all-special-characters)? — Sangbok Lee, Mar 22 '17 at 13:08

score 0 · Answer 1 · edited May 23 '17 at 12:10

Special letter characters

Python 3

If you're using Python 3, you might not have to do anything. For str patterns, \w already matches Unicode word characters, so many "special" letters are included:

>>> import re
>>> re.findall(r'\w', 'üäößéÅßêèiìí')
['ü', 'ä', 'ö', 'ß', 'é', 'Å', 'ß', 'ê', 'è', 'i', 'ì', 'í']

Python 2.7

In Python 2.7, \w matches only ASCII word characters by default, so only i would be matched:

>>> import re
>>> re.findall(r'\w', 'üäößéÅßêèiìí')
['i']

You could use re.UNICODE together with a unicode string:

# encoding: utf-8
import re
any_char = re.compile(r'\w', re.UNICODE)
re.findall(any_char, u'üäößéÅßêèiìí')
# [u'\xfc', u'\xe4', u'\xf6', u'\xdf', u'\xe9', u'\xc5', u'\xdf', u'\xea', u'\xe8', u'i', u'\xec', u'\xed']
for x in re.findall(any_char, u'üäößéÅßêèiìí'):
    print x
#   ü
#   ä
#   ö
#   ß
#   é
#   Å
#   ß
#   ê
#   è
#   i
#   ì
#   í

Any special character

Specifying Unicode ranges might simplify your regex. As an example, this regex matches any character in the Unicode "Arrows" block (U+2190 to U+21FF):

>>> import re
>>> arrows = re.compile(r'[\u2190-\u21FF]')
>>> re.findall(arrows, "a⇸b⇙c↺d↣e↝f")
['⇸', '⇙', '↺', '↣', '↝']

For Python 2, both the string and the pattern need to be unicode literals:

>>> import re
>>> arrows = re.compile(ur'[\u2190-\u21FF]')
>>> re.findall(arrows, u"a⇸b⇙c↺d↣e↝f")
[u'\u21f8', u'\u21d9', u'\u21ba', u'\u21a3', u'\u219d']
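
The same range idea can be applied to characters like !@#$%^. The following is only a sketch, assuming Python 3 and assuming that "special character" means ASCII punctuation; the name ascii_punct and the sample string are made up for illustration:

```python
import re
import string

# Four ASCII ranges that together cover the 32 ASCII punctuation characters:
#   !-/  (0x21-0x2F)    :-@  (0x3A-0x40)
#   [-`  (0x5B-0x60)    {-~  (0x7B-0x7E)
ascii_punct = re.compile(r'[!-/:-@\[-`{-~]')

sample = 'user@example.com: 100% done! (#1)'
found = ascii_punct.findall(sample)
print(found)

# The class matches exactly the characters listed in string.punctuation.
covers_all = set(ascii_punct.findall(string.punctuation)) == set(string.punctuation)
print(covers_all)
```

Non-ASCII symbols such as » or ⇸ fall outside these ranges, so they would need extra ranges of their own.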

This answer assumes that "special character" means "special letter character", right? For example, the special characters ``»«›‹`` are not matched by this: ``re.findall("\w", "»«›‹")`` returns ``[]``. — Schmuddi, Mar 22 '17 at 13:22
@Schmuddi: You're right. I updated the answer with unicode ranges, which might shorten the regex a lot. — Eric Duminil, Mar 22 '17 at 13:33
Sorry I should have made it clear that I was looking for characters like !@#$% and not special letter characters. Updated OP. — kronosjt, Mar 23 '17 at 05:45

score 0 · Answer 2 · answered Mar 22 '17 at 13:14

You can try using the negated versions (\W, \D) which match any non-word or non-digit characters.
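
As a rough illustration of the negated classes (assuming Python 3; the sample text is made up), note that \W also matches whitespace, so a negated set such as [^\w\s] may be closer to what the question asks for:

```python
import re

text = 'Hi there! Cost: $5 & up (50% off) »now«'

# \W matches every non-word character, and that includes spaces.
non_word = re.findall(r'\W', text)

# [^\w\s] excludes both word characters and whitespace, leaving
# punctuation and symbols. Note that "_" counts as a word character,
# so it is not matched by either pattern.
symbols = re.findall(r'[^\w\s]', text)
print(symbols)
```

If underscores should count as special too, [^\w\s]|_ is one possible variant.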
