
I have a gzip file, sample.gz, with the following contents:

This is first line of sample gz file.
This is second line of sample gz file.

I read this .gz file line by line, and then split each line into parts using a space as the separator.

import gzip
logfile = "sample.gz"
with gzip.open(logfile) as page:
    for line in page:
        string = line.split(" ")
        print(*string, sep = ',')

I expect output like this:

This,is,first,line,of,sample,gz,file.
This,is,second,line,of,sample,gz,file.

But instead of the above result, I get a TypeError:

TypeError: a bytes-like object is required, not 'str'

Why is the split function not working as it is supposed to?

mkrieger1
data-bite
  • You have a comma instead of a dot in `string = line,split(" ")` – Gameplay Dec 21 '22 at 09:57
  • Because, as the error message tells you, you are passing a string, not a bytes-like object to it. – mkrieger1 Dec 21 '22 at 09:57
  • Does this answer your question? [Python read csv line from gzipped file](https://stackoverflow.com/questions/51624803/python-read-csv-line-from-gzipped-file) – mkrieger1 Dec 21 '22 at 10:00
  • https://stackoverflow.com/a/50829888/15923186 Here's a direct answer to your question – Gameplay Dec 21 '22 at 10:01
  • Does this answer your question? [Cannot Split, A bytes-like object is required, not 'str'](https://stackoverflow.com/questions/50829364/cannot-split-a-bytes-like-object-is-required-not-str) – Gameplay Dec 21 '22 at 10:02
  • @Gameplay No, it is not a comma, it is a dot. stackoverflow.com/a/50829888/15923186 did get rid of the error, but it then prefixes every item in the list with 'b'. Before I could try 'Cannot Split, A bytes-like object is required, not 'str'', I tried my luck with the link shared by mkrieger1. Thank you very much for looking into this and suggesting those solutions. I did learn a couple of things. Thank you. – data-bite Dec 22 '22 at 05:45
  • @mkrieger1 Thank you very much for sharing that link. I got exactly what I was looking for. – data-bite Dec 22 '22 at 05:46

2 Answers


By default, gzip.open opens files in binary mode. This means that reading returns bytes objects, and bytes objects can only be split on other bytes objects, not on strings.
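
To make the bytes-versus-str distinction concrete, here is a small sketch that does not touch gzip at all. The byte string `data` is just an illustrative stand-in for one line read from a file opened in binary mode.

```python
# Stand-in for a line read from a file opened in binary mode (illustrative value).
data = b"This is first line\n"

# Splitting bytes on a bytes separator works; the items are bytes objects.
parts = data.split(b" ")

# Splitting bytes on a str separator raises the TypeError from the question.
try:
    data.split(" ")
    error = None
except TypeError as exc:
    error = str(exc)

# Decoding first yields a str, which can be split on a str separator.
words = data.decode("utf-8").split(" ")

print(parts)
print(error)
print(words)
```

The bytes items in `parts` are what show up with a `b` prefix when printed, which matches what the asker reported in the comments.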

If you want strings, use the mode and encoding arguments to gzip.open:

with gzip.open(logfile, 'rt', encoding='utf-8') as page:
    ...
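
For completeness, here is a minimal end-to-end sketch of the text-mode approach. The helper name `split_gzip_lines` and the temporary sample file are my own additions for illustration, and I use `line.split()` with no argument (splits on any whitespace and drops the trailing newline) rather than the `split(" ")` from the question.

```python
import gzip
import os
import tempfile


def split_gzip_lines(path, encoding="utf-8"):
    """Return each line of a gzip file as a list of whitespace-separated words.

    Hypothetical helper, not part of the gzip module.
    """
    rows = []
    # 'rt' opens the file in text mode, so each line is a str, not bytes.
    with gzip.open(path, "rt", encoding=encoding) as page:
        for line in page:
            rows.append(line.split())
    return rows


# Build a small sample file with the contents shown in the question.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as f:
    f.write("This is first line of sample gz file.\n")
    f.write("This is second line of sample gz file.\n")

for words in split_gzip_lines(sample_path):
    print(*words, sep=",")
```

If the file is not actually UTF-8, pass the appropriate `encoding` instead.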
Ture Pålsson

As the comments above show, there are a couple of approaches that can be used. I followed "Python read csv line from gzipped file", suggested by mkrieger1, and came up with the solution below.

import gzip
logfile = "sample.gz"
with gzip.open(logfile) as page:
    for line in page:
        string = line.decode('utf-8').split(' ')
        print(*string, sep = ',')
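
One thing to watch with the decode approach: each line still ends with "\n", so the last item keeps the newline and print adds another one, which leaves a blank line between rows. A possible variant that strips it first is sketched below. The helper name `decode_and_split` and the in-memory sample built with `gzip.compress` are assumptions for illustration, not part of the original code.

```python
import gzip
import io

# In-memory gzip data with the contents from the question (illustrative only).
raw = gzip.compress(
    b"This is first line of sample gz file.\n"
    b"This is second line of sample gz file.\n"
)


def decode_and_split(fileobj, encoding="utf-8"):
    """Read gzip data in binary mode, decode each line, and join the words with commas."""
    rows = []
    # gzip.open defaults to binary mode, so each line is a bytes object.
    with gzip.open(fileobj) as page:
        for line in page:
            text = line.decode(encoding).rstrip("\n")  # drop the trailing newline
            rows.append(",".join(text.split(" ")))
    return rows


joined = decode_and_split(io.BytesIO(raw))
for row in joined:
    print(row)
```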

Thanks for the quick responses here.

data-bite