0
import re

import string

a= """  Message-ID: <13505866.1075863688222.JavaMail.evans@thyme>
Date: Mon, 23 Oct 2000 06:13:00 -0700 (PDT)
From: phillip.allen@enron.com
To: randall.gay@enron.com
Subject: 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: Phillip K Allen
X-To: Randall L Gay
X-cc: 
X-bcc: 
X-Folder: \Phillip_Allen_Dec2000\Notes Folders\'sent mail
X-Origin: Allen-P
X-FileName: pallen.nsf

Randy,

 Can you send me a schedule of the salary and level of everyone in the 
scheduling group.  Plus your thoughts on any changes that need to be made.  
(Patti S for example)

Phillip

"""
s=re.sub('[\\\]+', ' yy', a)
print(s)

Error message: (unicode error) 'unicodeescape' codec can't decode bytes in position 354-355: malformed \N character escape

I've already tried different combinations of backslashes, but it's still showing the same error.

Atul

2 Answers

1

To encode a literal backslash in a regex, you need four backslashes in a normal string (or two backslashes in a raw string), not three:

s = re.sub('\\\\+', ' yy', a)

or

s = re.sub(r'\\+', ' yy', a)

You don't need a character class for a single character (although it doesn't hurt much, either).
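
As a quick check, here is a minimal sketch showing that the two spellings above produce the same pattern and the same result. The `sample` string is a made-up stand-in for the email text, not the original data:

```python
import re

# Hypothetical sample text containing single and doubled backslashes.
sample = r"X-Folder: \Phillip_Allen_Dec2000\Notes Folders\\sent"

# Both spellings describe the same regex: one or more literal backslashes.
assert '\\\\+' == r'\\+'

normal = re.sub('\\\\+', ' yy', sample)  # four backslashes, normal string
raw = re.sub(r'\\+', ' yy', sample)      # two backslashes, raw string

# Each run of backslashes is replaced once, however long the run is.
print(normal == raw)
print(raw)
```

The raw-string form is usually easier to read, since what you type is what the regex engine sees.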

Tim Pietzcker
  • It bothers me to no end, but lots of people use character classes to escape meta-characters. Of course this won't work with backslashes, which are meta-characters both inside and outside character classes. – Aaron Jun 15 '18 at 10:04
  • I've already used this before but still couldn't get the desired result. It's still giving the same error. – Atul Jun 15 '18 at 10:54
0

The problem that leads to your error message (which occurs at compile time, long before the regex, which is also faulty, is constructed; see my other answer) is in this line:

X-Folder: \Phillip_Allen_Dec2000\Notes Folders\'sent mail

Here you have a \N that Python tries to interpret as the start of a named-character escape sequence like "\N{GREEK CAPITAL LETTER DELTA}", and of course it fails to do so.

You need to double each backslash to correct that problem:

X-Folder: \\Phillip_Allen_Dec2000\\Notes Folders\\'sent mail
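
If you do want to keep the text in the source file, another option is a raw triple-quoted string, which leaves backslashes untouched so nothing has to be doubled. This is only a sketch on the single header line, assuming the rest of the email is pasted the same way:

```python
import re

# Raw triple-quoted string: backslashes are kept literally, so \N is not
# treated as the start of a named-character escape.
header = r"""X-Folder: \Phillip_Allen_Dec2000\Notes Folders\'sent mail"""

# For comparison, a well-formed \N{...} escape in a normal string.
delta = "\N{GREEK CAPITAL LETTER DELTA}"

# The regex from my other answer now sees three real backslashes.
cleaned = re.sub(r'\\+', ' yy', header)
print(cleaned)
```

One caveat: a raw string literal cannot end with an odd number of backslashes, so this only works if the pasted text does not end in one.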
Tim Pietzcker
  • I understand the conflict the compiler faces at that position, and it can be handled manually by substituting '\\' for '\'. But I'm working on a project where I need to analyze thousands of emails in that format, so I can't manually edit every email. Could you suggest a better way? Thanks for considering my question and replying. – Atul Jun 16 '18 at 14:08
  • The problem only occurs if you paste the email into the source code of your script. If you read it from a file, the backslashes will be recognized as such. – Tim Pietzcker Jun 16 '18 at 16:25
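
To illustrate the last comment, here is a rough sketch of the read-from-file approach. The file name `email_1.txt` and its contents are hypothetical; the script writes a temporary file itself only so that the example is self-contained:

```python
import os
import re
import tempfile

# Stand-in for one email on disk; the file name and contents are hypothetical.
email_text = "X-Folder: \\Phillip_Allen_Dec2000\\Notes Folders\\'sent mail\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "email_1.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(email_text)

    # Text read from a file is data, not source code, so Python never
    # parses it for escape sequences such as \N.
    with open(path, encoding="utf-8") as f:
        loaded = f.read()

cleaned = re.sub(r'\\+', ' yy', loaded)
print(cleaned)
```

For thousands of emails, the same `open` / `read` / `re.sub` steps could be run in a loop over the files in a directory, with no manual editing of the backslashes.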