
I want to iterate through some files within a folder and send the path of the files to a list. Then I want to pass this list to a subprocess to execute a bash command:

procfiles = []
os.chdir("/path/to/directory")

for root, dirs, files in os.walk('.'):
    for file in files:
        if '.mp3' in file:
            filename = os.path.join(root, file)
            print(filename)
            procfiles.append(filename)
print(procfiles)

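args = ["command", "with", "arguments"]  # placeholder for the real command line
args.extend(procfiles)  # extend() modifies the list in place and returns None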
process = subprocess.Popen(args, shell=False)
output, error = process.communicate()

But I get the following output when a file name contains a German umlaut, for example ä, ö or ü:

./titleWith ä or ü - artist with ü.mp3                                         #print(filename)
['./titleWith \udcc3\udca4 or \udcc3\udcbc - artist with \udcc3\udcbc.mp3']    #print(procfiles)

This means that there is something wrong with the encoding during the procfiles.append(filename) process, right?

After that the subprocess fails with the error:

UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 43: surrogates not allowed

Info:

  • Python 3.5.3
  • OS: Debian Jessie
  • Kernel: 4.9.58+
  • architecture: armhf

UPDATE:

I just noticed that it works when I execute it manually as the user root or www-data, but it doesn't work when I execute it via my custom PHP script (it's only a shell_exec('/usr/bin/python3 /path/to/script.py >> /path/to/log.log 2>&1')).

Shouldn't that be the same as executing it manually as the user www-data? Or are different environment variables set when the Python script is executed from a PHP script?
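One way to compare the two situations is to print the settings that decide how Python decodes file names, once from the shell and once via shell_exec. The sketch below is only a diagnostic aid; the function name encoding_report is made up for illustration, and the idea that the web server starts the script with a different locale is an assumption that this output would confirm or rule out.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the settings that influence how file names are decoded.

    Hypothetical helper for diagnosis only, not part of the original script.
    """
    report = {
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }
    # Locale-related environment variables; None means the variable is unset.
    for name in ("LANG", "LC_ALL", "LC_CTYPE"):
        report[name] = os.environ.get(name)
    return report


if __name__ == "__main__":
    for key, value in sorted(encoding_report().items()):
        print("{}: {!r}".format(key, value))
```

If the two runs report different values, the environment of the PHP process is the place to look.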

jones1008

3 Answers


This is expected behavior. In your case the file system encoding does not match the encoding of the file names, so Python decodes the bytes it cannot interpret as surrogate escapes, which lets it re-encode the string to the original bytes later. The backslash escapes in the list output are just the exact representation (repr) of the string, because printing a list shows the repr of each element. If you want to see the characters themselves (this depends on the encoding of your sys.stdout and of your terminal), call print() on each string individually. It seems that subprocess doesn't pass errors='surrogateescape' to str.encode().
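The round trip can be reproduced in isolation. The following is a minimal sketch that assumes the file name bytes are UTF-8 and that they were decoded with an ASCII codec, as a C/POSIX locale would imply; the variable names are illustrative only.

```python
# The UTF-8 bytes of "ä" (U+00E4) as they would be stored in a file name.
raw = "\u00e4".encode("utf-8")  # two bytes: 0xC3 0xA4

# An ASCII codec cannot interpret these bytes, so surrogateescape maps
# each undecodable byte to a lone surrogate code point.
escaped = raw.decode("ascii", errors="surrogateescape")
print(ascii(escaped))

# A strict UTF-8 encode rejects lone surrogates ...
try:
    escaped.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# ... while surrogateescape turns them back into the original bytes.
restored = escaped.encode("utf-8", errors="surrogateescape")
print(restored == raw, strict_ok)
```

The escaped string here has the same surrogate code points as the ones in the question's list output, and the strict encode fails with the same kind of UnicodeEncodeError.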

kb1000

If I run this script:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os
import subprocess

procfiles = []
os.chdir("/home/dede/tmp/")

for root, dirs, files in os.walk('.'):
    for file in files:
        if '.mp3' in file:
            filename = os.path.join(root, file)
            print(filename)
            procfiles.append(filename)
print(procfiles)

args=["ls", "-la"]
args.extend(procfiles)
process = subprocess.Popen(args, shell=False)
output, error = process.communicate()

I get this output:

dede@i5:~> python3 tst.py 
./Leere Datei.mp3
./Kopie ä  von Leere Datei.mp3
['./Leere Datei.mp3', './Kopie ä  von Leere Datei.mp3']
-rw-r--r-- 1 dede users 6 31. Mär 16:50 ./Kopie ä  von Leere Datei.mp3
-rw-r--r-- 1 dede users 6 31. Mär 16:50 ./Leere Datei.mp3

So the problem must be somewhere else in your code, or the umlauts in your MP3 file names are stored in a Windows encoding.

dede
  • I just noticed that it works when I execute it manually as the user root or www-data, but it doesn't work when I execute it via my custom PHP script (it's only a `shell_exec('/usr/bin/python3 /path/to/script.py >> /path/to/log.log 2>&1')`). Shouldn't that be the same as executing it manually as the user www-data? Or are different environment variables set when the Python script is executed from a PHP script? – jones1008 Apr 01 '18 at 18:09

Python 3.5

Convert your strings first:

procfiles = [s.encode('utf-8', errors='surrogateescape').decode('utf-8')
             for s in procfiles]

Python 3.6

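From Python 3.6 on, Popen accepts an errors keyword argument, so you can pass errors='surrogateescape'. Note that the documentation describes errors as applying to the text-mode stdin, stdout and stderr pipes: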

process = subprocess.Popen(args, shell=False, errors='surrogateescape')
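A different approach that sidesteps decoding on either version is to keep the paths as bytes from start to finish. The sketch below assumes a POSIX system, where os.walk() yields bytes when given a bytes path and Popen accepts bytes arguments; find_mp3_paths is a hypothetical helper name, and "ls -la" is only a stand-in for the real command.

```python
import os
import subprocess


def find_mp3_paths(top):
    """Return the paths of all .mp3 files below `top` as raw bytes.

    Hypothetical helper; the name is not from the original script.
    """
    found = []
    # A bytes argument makes os.walk() yield bytes names, so the file
    # names are never decoded and no surrogate escapes are produced.
    for root, dirs, files in os.walk(os.fsencode(top)):
        for name in files:
            if b".mp3" in name:  # same test as the original, on bytes
                found.append(os.path.join(root, name))
    return found


if __name__ == "__main__":
    # All arguments are bytes, so nothing has to be encoded before exec.
    args = [b"ls", b"-la"] + find_mp3_paths(".")
    subprocess.call(args)
```

The trade-off is that the paths have to be decoded explicitly, for example with os.fsdecode(), wherever they are shown to a user.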
Mike Müller
  • With `errors='surrogateescape'` I get the error: `TypeError: __init__() got an unexpected keyword argument 'errors'`. I guess you meant that argument from decode or encode as described [here](https://stackoverflow.com/a/21116263/7987318). Do I have to decode or encode it with that before I pass it to the subprocess? – jones1008 Apr 01 '18 at 17:23
  • Oops. I used Python 3.6, which has this keyword argument. Python 3.5 doesn't have it. – Mike Müller Apr 01 '18 at 18:49
  • Updated my answer. – Mike Müller Apr 01 '18 at 18:56