30

I'm using this code from a question asked a few years ago, but I believe it is outdated. When I run it, I get the error shown in the traceback below. I'm still a novice in Python, so I could not get much clarification from similar questions. Does anyone know why this is happening?

import subprocess

def getLength(filename):
  result = subprocess.Popen(["ffprobe", filename],
    stdout = subprocess.PIPE, stderr = subprocess.STDOUT)
  return [x for x in result.stdout.readlines() if "Duration" in x]

print(getLength('bell.mp4'))

Traceback

Traceback (most recent call last):
  File "B:\Program Files\ffmpeg\bin\test3.py", line 7, in <module>
    print(getLength('bell.mp4'))
  File "B:\Program Files\ffmpeg\bin\test3.py", line 6, in getLength
    return [x for x in result.stdout.readlines() if "Duration" in x]
  File "B:\Program Files\ffmpeg\bin\test3.py", line 6, in <listcomp>
    return [x for x in result.stdout.readlines() if "Duration" in x]
TypeError: a bytes-like object is required, not 'str'
Georgy
chatbottest

2 Answers

62

subprocess returns bytes objects for the stdout and stderr streams by default. That means you also need to use bytes objects in operations against these objects. "Duration" in x uses a str object. Use a bytes literal (note the b prefix):

return [x for x in result.stdout.readlines() if b"Duration" in x]

or decode your data first, if you know the encoding used (usually, the locale default, but you could set LC_ALL or more specific locale environment variables for the subprocess):

return [x for x in result.stdout.read().decode(encoding).splitlines(True)
        if "Duration" in x]
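Both fixes can be tried without ffprobe installed. The following is a minimal sketch on sample data: the `sample` bytes are made up to resemble ffprobe output, and UTF-8 is an assumed encoding, not something the question confirms.

```python
# Made-up bytes resembling what result.stdout.read() might return.
sample = b"Input #0, mov, from 'bell.mp4':\n  Duration: 00:00:05.00, start: 0.000000\n"

# Mixing str and bytes reproduces the TypeError from the question.
try:
    "Duration" in sample
    raised = False
except TypeError:
    raised = True

# Fix 1: compare bytes with bytes (note the b prefix).
bytes_matches = [x for x in sample.splitlines(True) if b"Duration" in x]

# Fix 2: decode first (UTF-8 is an assumption), then compare str with str.
text_matches = [x for x in sample.decode("utf-8").splitlines(True)
                if "Duration" in x]

print(raised, bytes_matches, text_matches)
```

The first fix keeps the matching lines as bytes, while the second yields str, so which one fits depends on what the rest of the code expects.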

The alternative is to tell subprocess.Popen() to decode the data to Unicode strings by setting the encoding argument to a suitable codec:

result = subprocess.Popen(
    ["ffprobe", filename],
    stdout=subprocess.PIPE, stderr = subprocess.STDOUT,
    encoding='utf8'
)

If you set text=True (Python 3.7 and up; in previous versions this argument is called universal_newlines), you also enable decoding, using your system default codec, the same one that is used for open() calls. In this mode, the pipes are line buffered by default.
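As a rough illustration of the text-mode approach, here is a sketch that uses subprocess.run() with text=True. So that it runs without ffprobe, a child Python process stands in for the real command and writes a made-up "Duration" line to stderr. The helper name `duration_lines` and the `fake_ffprobe` command are hypothetical, introduced only for this example.

```python
import subprocess
import sys

def duration_lines(cmd):
    """Run cmd and return the output lines that mention "Duration"."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as in the question
        text=True,                 # Python 3.7+: decode output to str
    )
    return [line for line in result.stdout.splitlines() if "Duration" in line]

# Stand-in for ["ffprobe", "bell.mp4"]: a child Python process that
# writes a made-up line to stderr.
fake_ffprobe = [sys.executable, "-c",
                "import sys; sys.stderr.write('  Duration: 00:00:05.00\\n')"]
lines = duration_lines(fake_ffprobe)
print(lines)
```

With ffprobe on the PATH, calling `duration_lines(["ffprobe", "bell.mp4"])` should behave the same way; passing `encoding='utf8'` instead of `text=True` pins the codec rather than relying on the system default.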

Martijn Pieters
  • Maybe point out the `universal_newlines=True` aka `text=True` in Python 3.7+ which causes Python to decode the output as text in the system's default encoding and return a string. – tripleee Apr 01 '19 at 17:42
  • @tripleee: added. – Martijn Pieters Apr 01 '19 at 19:20
  • The encoding argument of Popen is available from Python 3.6; in previous versions (Python 3.5 in my case), you must specify the encoding when doing the byte conversion (`bytes("Duration", encoding='utf8')`) – adn05 Apr 05 '19 at 14:19
5

As the error says, "Duration" is a string, whereas x is a bytes-like object: result.stdout.readlines() reads the lines of the output as bytes, not as strings.

Hence, store "Duration" in a variable, say str_var, and encode it into a bytes object using str_var.encode('utf-8').

Refer to "Best way to convert string to bytes in Python 3?".

Harshith Thota
  • It's just a literal, just prefix it with `b`. You don't need to store the string in a variable to be able to encode it either, `"Duration".encode('utf-8')` works too (but is a waste of computer cycles if you can just make it a bytes object to begin with). – Martijn Pieters Jul 08 '17 at 19:17
  • Well, if he wants to use it for multiple files, it's better to store it in a variable. Now, mind explaining why a downvote for that? – Harshith Thota Jul 08 '17 at 19:19
  • Why? A string literal is stored as a constant with the code object anyway, and where are they mentioning multiple files? – Martijn Pieters Jul 08 '17 at 19:20
  • Note that the test is done in a loop, using a literal is *better there* because that loads a constant, rather than having to look up a variable each time. – Martijn Pieters Jul 08 '17 at 19:20
  • Fair enough but still doesn't explain the downvote. It's not a wrong answer. – Harshith Thota Jul 08 '17 at 19:22