I have strings like this:
ꐊ,ꀵ,\u0f6e,ⴗ,ꦚ,\u2d75,ꢯ,⾌,\ua97d,⩱,ㇴ,\u2d6e,鼺,\x00Ꞁ
and I want to filter out all of these invalid characters that begin with a backslash, which I am trying to do with a regex in Python.
It does work like this:
re.sub(r",\u0f6e,", r",deleted,", s)
But not like this:
re.sub(r",\.{5},", r",deleted,", s)
It should work according to http://pythex.org, so I guess it's because they are invalid characters? How can I match them?
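For comparison, here is a minimal check of what the two escapes mean inside a regex pattern, using only the standard re module. In a pattern, \. is an escaped dot that matches a literal "." character, while \\ matches one literal backslash. The short test strings are made up for illustration, and the second pattern can only match if the text really contains backslash characters (for example, a raw string).

```python
import re

# In a regex, \. matches a literal "." (not "any character"),
# so ",\.{5}," only matches a comma, five periods, and a comma.
dots_pattern = r",\.{5},"
print(bool(re.search(dots_pattern, "a,.....,b")))         # True
print(bool(re.search(dots_pattern, r"a,\u0f6e,b")))       # False

# \\ matches one literal backslash; .{5} then matches any five characters.
backslash_pattern = r",\\.{5},"
print(bool(re.search(backslash_pattern, r"a,\u0f6e,b")))  # True
```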
Edit: @metatoaster pointed out that my question is ambiguous:
The problem seems to arise because the input string s is not a raw string.
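The raw versus non-raw difference can be checked directly. In a normal (non-raw) Python 3 string literal, \u0f6e is an escape sequence that produces a single code point, U+0F6E; a raw literal keeps the backslash and the five characters after it. The variable names below are only for illustration.

```python
# Non-raw literal: the escape \u0f6e becomes ONE code point, U+0F6E.
cooked = '\u0f6e'
# Raw literal: the backslash is kept, giving six characters: \ u 0 f 6 e.
raw = r'\u0f6e'

print(len(cooked))        # 1
print(hex(ord(cooked)))   # 0xf6e
print(len(raw))           # 6
print(raw == '\\u0f6e')   # True
```

So a non-raw s contains no backslash to match at all; the \u0f6e that appears when Python echoes the string is how repr() displays a code point it considers non-printable.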
>>> s = ' ꐊ,ꀵ,\u0f6e,ⴗ,ꦚ,\u2d75,ꢯ,⾌,\ua97d,⩱,ㇴ,\u2d6e,鼺,\x00Ꞁ'
>>> re.sub(r",\u0f6e,", r",deleted,", s)
' ꐊ,ꀵ,deleted,ⴗ,ꦚ,\u2d75,ꢯ,⾌,\ua97d,⩱,ㇴ,\u2d6e,鼺,\x00Ꞁ'
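Because s holds the actual code points, one possible approach swaps the regex for str.isprintable(), which returns False for control characters such as \x00 and for code points that Python's Unicode database treats as non-printable (including unassigned ones). This is a sketch under two assumptions: that "invalid" means "not printable", and that fields are separated by single commas. The helper name replace_unprintable_fields is hypothetical.

```python
def replace_unprintable_fields(text: str, replacement: str = "deleted") -> str:
    """Hypothetical helper: replace every comma-separated field that
    contains at least one non-printable character (control characters,
    unassigned code points, ...) with `replacement`."""
    fields = text.split(",")
    # str.isprintable() is False if ANY character in the field is non-printable.
    return ",".join(
        field if field.isprintable() else replacement
        for field in fields
    )


s = ' ꐊ,ꀵ,\u0f6e,ⴗ,ꦚ,\u2d75,ꢯ,⾌,\ua97d,⩱,ㇴ,\u2d6e,鼺,\x00Ꞁ'
print(replace_unprintable_fields(s))
```

Which fields get replaced depends on the Unicode version bundled with the Python interpreter, since that decides which code points count as unassigned.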