I'm running into issues when trying to process mail with procmail and Python. I am using syntax something like this:
:0
...[Filter] | (python3 script.py) >> file.txt
as procmail syntax. My Python script reads the mail from stdin, decodes the MIME content to Unicode, and writes it to a file as follows:
def main():
    dataset = Data()
    indata = (Parser().parse(sys.stdin)).as_string()
    indata = (quopri.decodestring(indata)).decode('utf-8')
    arrayofstrings = indata.split("\n")
    for line in arrayofstrings:
        [write some data to <dataset>]
    filename = "outfile.txt"
    file = open(filename, "w")
    file.write(dataset.toString())
Data() is a structure that stores a series of Unicode strings, and toString() concatenates them.
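For comparison, here is a minimal sketch of the decoding step using the standard library's email package with the modern policy, which decodes the transfer encoding and charset of the text part in one go. The helper name extract_text is hypothetical and not part of the script above, and the sketch assumes the mail has a text/plain part:

```python
import email
from email import policy


def extract_text(raw_bytes):
    """Return the decoded text/plain body of a raw mail as a str.

    Hypothetical helper for illustration; assumes a text/plain part exists.
    """
    # policy.default yields an EmailMessage, which provides get_body().
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    if body is None:
        return ""
    # get_content() undoes quoted-printable/base64 and applies the charset.
    return body.get_content()
```

In a script fed by a pipe, the raw bytes could be obtained with sys.stdin.buffer.read(), so that no decoding happens before the email package sees the message.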
If I run this script in bash with a stored mail like this:
cat test.txt | python3 script.py
it correctly writes the data as unicode to the file.
However, when an incoming mail is processed through procmail, the following error is written to the procmail log:
UnicodeEncodeError: 'ascii' codec can't encode character '\xdf' in position 83: ordinal not in range(128)
If I change the last line of the python script to:
file.write(dataset.toString().encode('utf-8'))
I get the correctly encoded string in the file. However, I want to write it as a Unicode string rather than as encoded bytes.
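One possible explanation, stated here as an assumption about the setup, is that procmail starts the script with a minimal environment (no LANG or LC_ALL), so that open() in text mode falls back to an ASCII default encoding, whereas the interactive bash session has a UTF-8 locale. A minimal sketch that sidesteps the locale by naming the encoding explicitly; write_text is a hypothetical helper name:

```python
import locale


def write_text(path, text):
    """Write a str to path as UTF-8, independent of the process locale."""
    # Without encoding=..., open() uses the locale's preferred encoding,
    # which can be ASCII when LANG/LC_ALL are unset.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


if __name__ == "__main__":
    # Prints the encoding a bare open() would use in this environment;
    # comparing the value under bash and under procmail can confirm the cause.
    print("default encoding:", locale.getpreferredencoding(False))
```

With this, the string stays a str inside the program and is only encoded at the point where it is written to disk.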