I'm trying to chain a couple of scripts together, some in PowerShell (5.1) and some in Python (3.7).
The script I'm having trouble with is written in Python and writes to stdout via sys.stdout.write(). It reads in a file, does some processing, and then outputs the result.
When I run this script on its own, without redirecting or piping its output, it executes correctly and writes to the normal PowerShell console. However, as soon as I redirect or pipe the output in any way, I start getting errors.
In particular, two of the input files contain the character \u200b (a zero-width space). Printing these characters to the console works fine, but the following ways of redirecting the output to a file:
py ./script.py input.txt > output.txt
py ./script.py input.txt | Set-Content -Encoding utf8 output.txt
Start-Process powershell -RedirectStandardOutput "output.txt" -Argumentlist "py", "./script.py", "input.txt"
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8' (set before redirecting with >)
all fail with:
File "\Python\Python37\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in position 61: character maps to <undefined>
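The exception itself comes from Python's cp1252 codec, the one named in the traceback, so it can be reproduced without any redirection by encoding the character directly. A small check (the helper name can_encode is hypothetical, used only for illustration):

```python
def can_encode(text, encoding):
    """Return True if every character of `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# U+200B (ZERO WIDTH SPACE) is representable in UTF-8 but has no slot in cp1252.
print(can_encode("a\u200bb", "cp1252"))  # False
print(can_encode("a\u200bb", "utf-8"))   # True
```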
On the Python side, modifying the script to strip all non-UTF-8 characters also fails, so I'm a bit stuck. My current guess is that piping the output makes Python run with a different environment (the traceback shows it encoding with cp1252), but I'm not sure how to change that from within the Python code.
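One way to test that hypothesis from inside the script: Python 3.7 added reconfigure() to io.TextIOWrapper, the class behind sys.stdout, so the script can check which encoding stdout ended up with and switch it to UTF-8 before writing anything. This is a minimal sketch, assuming UTF-8 is the encoding wanted in output.txt; the helper name ensure_utf8 is hypothetical.

```python
import codecs
import io
import sys


def ensure_utf8(stream=None):
    """Switch a text stream (sys.stdout by default) to UTF-8 if it uses another codec.

    Returns the encoding the stream reported before the call, so the caller can
    log it. reconfigure() is an io.TextIOWrapper method added in Python 3.7;
    streams of any other type are left untouched.
    """
    if stream is None:
        stream = sys.stdout
    previous = getattr(stream, "encoding", None)
    if not isinstance(stream, io.TextIOWrapper) or previous is None:
        return previous
    # codecs.lookup() normalises aliases, so "UTF-8", "utf8" and "utf-8" compare equal.
    if codecs.lookup(previous).name != "utf-8":
        stream.reconfigure(encoding="utf-8")
    return previous
```

Calling ensure_utf8() once at the top of script.py, before write_lines runs, and writing its return value to sys.stderr would show whether the redirected run really gets cp1252. Setting the environment variable PYTHONIOENCODING=utf-8 before launching the script should have a similar effect without changing the code.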
For completeness' sake, here is the function that writes the output (note: file_lines is a list of strings):
import sys

def write_lines(file_lines):
    for line in file_lines:
        # Wrap every element of the line in double quotes
        line = ['"' + x + '"' for x in line]
        # Join the quoted elements, appending a comma after each one
        line = "".join(entry + "," for entry in line)
        if line is not None:
            sys.stdout.write(line + "\n")
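An alternative that leaves sys.stdout's codec alone is to encode the output yourself and write the bytes to sys.stdout.buffer, the binary stream underneath sys.stdout. Below is a sketch of a variant of write_lines using that approach; encode_lines is a hypothetical helper, and it keeps the original quoting and trailing comma.

```python
import sys


def encode_lines(file_lines, newline="\n"):
    """Format rows the same way as write_lines and return them as UTF-8 bytes.

    Each element of a row is wrapped in double quotes and followed by a comma,
    matching the original output (including the trailing comma).
    """
    text = "".join(
        "".join('"' + entry + '",' for entry in line) + newline
        for line in file_lines
    )
    return text.encode("utf-8")


def write_lines(file_lines):
    """Variant of write_lines that bypasses sys.stdout's text codec."""
    sys.stdout.flush()  # push out any text already written through sys.stdout
    # sys.stdout.buffer is the binary stream underneath sys.stdout; bytes written
    # here are not re-encoded, so cp1252 is never involved.
    sys.stdout.buffer.write(encode_lines(file_lines))
    sys.stdout.buffer.flush()
```

On Windows, text-mode sys.stdout normally translates "\n" into "\r\n" as it writes; bytes sent to sys.stdout.buffer skip that step, which is why the sketch exposes a newline parameter (pass "\r\n" to keep Windows line endings).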