
With Python 3.8 I am trying to get a list of git log entries using subprocess and then process them. Here is part of the code:

lines = subprocess.check_output(
    [
        "git",
        "log",
        "--all",
        "--no-merges",
        "--shortstat",
        "--reverse",
        "--pretty=format:'%ad;%an'",
    ]
).splitlines()

for line in lines:
    print(line)
    if "file changed" in line or "files changed" in line:
        print("do something")

But I get an error:

....
if "file changed" in line or "files changed" in line:
TypeError: a bytes-like object is required, not 'str'

and the line print gives some output like

b"'Tue Nov 9 10:18:31 2010 +0000;someuser'"

That object looks like a bytes object (see the 'b'?). Converting it to a str and printing it, I get:

print(str(line))
b"'Tue Nov 9 10:18:31 2010 +0000;someuser'"

But I do not want to work with bytes objects, I want to work with simple strings! So I am not sure what is going on, or why there are two sets of quotes in that line.

How can I convert the lines I get from splitlines() into normal strings that I can compare against in if statements, as in the code example?

Alex
  • try `if 'x' in str(line)` so your line is converted to a string – Chris Dec 17 '21 at 16:07
  • When I convert `line` to a `str`, I still get the 'b' in front when I `print(str(line))`. That seems very strange. I get the EXACT same output! Why do I still get that 'b'? – Alex Dec 17 '21 at 16:10
  • https://stackoverflow.com/questions/17615414/how-to-convert-binary-string-to-normal-string-in-python3 – Chris Dec 17 '21 at 16:11
  • So `splitlines` returns a list of encoded objects, which I have to decode? Is that correct? – Alex Dec 17 '21 at 16:12
  • If it is a bytes string, then `splitlines()` will return a list of bytes strings. If it is not, it will return regular strings. `subprocess.check_output()` is returning a bytes string. – Chris Dec 17 '21 at 16:14
  • I had older code where I "converted" the output from `subprocess.check_output()` to a string(?) using `str`. Now it does not seem to work anymore. My old code was something like: `lines = str(subprocess.check_output()).split()`. Why does `str` not work with the example in my question? – Alex Dec 17 '21 at 16:18
  • Ok, maybe I understand. I split the WHOLE thing, which contained only one 'b'. Now I have 'b's for every line. I also will NEVER understand this bytestring/string encoding stuff. I give up. – Alex Dec 17 '21 at 16:24
  • Just do `subprocess.check_output(.....).decode().splitlines()`. The output from subprocess has quotes, so when you print the string you see double quotes. – Chris Dec 17 '21 at 16:25
  • Thanks a lot, maybe I understand – Alex Dec 17 '21 at 16:26
  • In the linked duplicate https://stackoverflow.com/questions/53279764/python-unicodedecodeerror-how-to-correctly-read-unicode-strings-from-subproces, see in particular the answer by tripleee, instead of the accepted one. – Charles Duffy Dec 17 '21 at 16:32
  • The "whole" thing didn't _contain_ any `b`. `b'123'` is one three-character bytestring; the only things it contains are `b'1'`, `b'2'`, and `b'3'`, each of which is a single-character bytestring; the `b` in front is how `repr()` shows that they are in fact bytestrings instead of unicode strings, not part of the content. Bytestrings are made of individual bytes, just like unicode strings are made up of (potentially-multibyte) characters. – Charles Duffy Dec 17 '21 at 16:32
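
To make the points from the comments concrete, here is a minimal sketch rather than a definitive fix. It assumes the git output is UTF-8 encoded, uses a hard-coded bytes value in place of real `git log` output, and uses a small Python child process as a stand-in for git so that it runs without a repository. The names `raw`, `as_repr`, `changed`, and `out` are illustrative only.

```python
import subprocess
import sys

# Sample bytes standing in for `git log` output (first line copied from the question).
raw = b"'Tue Nov 9 10:18:31 2010 +0000;someuser'\n 1 file changed, 2 insertions(+)\n"

# str() on a bytes object (without an encoding) returns its repr, b prefix included.
# That is why print(str(line)) still shows the leading b.
as_repr = str(raw.splitlines()[0])

# decode() converts bytes to a real str; UTF-8 is an assumption here.
lines = raw.decode("utf-8").splitlines()
changed = [
    line for line in lines
    if "file changed" in line or "files changed" in line
]

# Alternative: text=True asks subprocess to decode the output itself.
# A tiny Python child process stands in for git in this sketch.
out = subprocess.check_output(
    [sys.executable, "-c", "print('2 files changed')"],
    text=True,
)
```

The inner single quotes in the printed line most likely come from `--pretty=format:'%ad;%an'`: when the command is passed as a list, no shell is involved to strip them, so git receives them as part of the format and prints them. Dropping them from the argument (`"--pretty=format:%ad;%an"`) should remove them from the output.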
