Short answer
Set the environment variable PYTHONIOENCODING, and set the encoding in Popen:
# tst1.py
import os
import subprocess
import sys

# print(sys.stdout.encoding)  # output: utf-8 (the default for the interactive console)
# Child processes inherit os.environ, so tst2.py will encode its stdout as UTF-8.
os.environ['PYTHONIOENCODING'] = 'utf-8'
# encoding='utf-8' tells Popen how to decode the bytes it reads from the pipes.
p = subprocess.Popen(['python', 'tst2.py'], encoding='utf-8', stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# print(p.stdout)  # output: <_io.TextIOWrapper name=3 encoding='utf-8'>
# print(p.stdout.encoding, ' ', p.stderr.encoding)  # output: utf-8 utf-8
outs, errors = p.communicate()
print(outs, errors)
where tst1.py runs another Python script, tst2.py, like:
#tst2.py
import sys
print(sys.stdout.encoding) #output: utf-8
print('\u2e85')  # U+2E85, a CJK radical character
Long Answer
Using PIPE indicates that a pipe to the standard stream should be opened. A pipe is a unidirectional data channel that can be used for interprocess communication. Pipes carry raw bytes and are agnostic to encoding, so when the data is text, the applications on each side of the pipe must agree on the text encoding.
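To make the "pipes carry bytes" point concrete, here is a minimal sketch (not part of the original scripts). It assumes a hypothetical one-line child, passed with -c, that writes UTF-8 bytes directly to its stdout buffer. Because no encoding is given to the parent, it receives bytes and has to decode them itself:

```python
import subprocess
import sys

# Hypothetical child: writes the UTF-8 bytes of U+2E85 straight to the
# binary layer of stdout, bypassing the text-layer encoding entirely.
child_code = "import sys; sys.stdout.buffer.write('\\u2e85'.encode('utf-8'))"

# No encoding= argument, so the parent gets bytes from the pipe, not str.
result = subprocess.run([sys.executable, "-c", child_code], stdout=subprocess.PIPE)
raw = result.stdout

# The parent must know (by agreement with the child) which codec to use.
text = raw.decode("utf-8")

print(type(raw).__name__, raw)
print(text == "\u2e85")
```

The pipe itself never knows the data is UTF-8; that knowledge lives only in the two programs at its ends.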
So firstly, the stdout of tst2.py should use the utf-8 encoding; otherwise it raises an error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2e85' in position 0: character maps to <undefined>
The streams sys.stdout and sys.stderr are regular text files like those returned by the open() function. On Windows, non-character devices such as pipes and disk files use the system locale encoding (i.e. an ANSI codepage like CP1252). Under all platforms, you can override the character encoding by setting the PYTHONIOENCODING environment variable before running the interpreter.
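As an alternative to modifying os.environ in the parent, the variable can be set for the child only through the env argument of subprocess. The following is a sketch under that assumption; the inline -c child is hypothetical and simply reports the encoding it ended up with:

```python
import os
import subprocess
import sys

# Copy the parent's environment and override the I/O encoding for the child only.
child_env = dict(os.environ, PYTHONIOENCODING="utf-8")

result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
    stdout=subprocess.PIPE,
    env=child_env,
    encoding="utf-8",  # how the parent decodes the pipe
)

# The child reports the encoding of its own (piped) stdout.
child_encoding = result.stdout.strip()
print(child_encoding)
```

Passing a copied environment keeps the parent's own os.environ untouched, which may be preferable when the parent launches several children with different settings.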
Secondly, tst1.py must know how to decode what it reads from the pipe, hence the encoding='utf-8' argument to Popen.
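A small sketch of why the reading side matters: the same child output is read twice, once with the matching codec and once with a mismatched one. The read_child helper and the inline -c child are hypothetical names introduced for illustration, and cp1252 is used only as an example of a wrong choice:

```python
import os
import subprocess
import sys

# Make the child encode its stdout as UTF-8, regardless of platform defaults.
child_env = dict(os.environ, PYTHONIOENCODING="utf-8")
child_code = "print('\\u2e85')"


def read_child(encoding):
    """Run the child and decode its piped stdout with the given codec."""
    p = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=child_env,
        encoding=encoding,
    )
    outs, _ = p.communicate()
    return outs.strip()


good = read_child("utf-8")   # matches what the child wrote
bad = read_child("cp1252")   # same bytes, wrong codec: mojibake

print(good == "\u2e85")
print(ascii(bad))
```

With the mismatched codec the three UTF-8 bytes are decoded as three unrelated characters, so both sides really do need to agree.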
More Details
With Python 3.6+, following PEP 528, the default encoding of the interactive console on Windows is utf-8 (it can be changed by setting both PYTHONIOENCODING and PYTHONLEGACYWINDOWSSTDIO). But this does not apply to pipes and redirection.