I am trying to parse a gzipped CSV file (with fields separated by | characters) to test whether reading the file directly in Python is faster than piping it through zcat file.gz | python to parse the contents.
I have the following code:
#!/usr/bin/python3
import gzip
if __name__ == "__main__":
    total=0
    count=0
    f=gzip.open('SmallData.DAT.gz', 'r')
    for line in f.readlines():
        split_line = line.split('|')
        total += int(split_line[52])
        count += 1
    print(count, " :: ", total)
But I get the following error:
$ ./PyZip.py
Traceback (most recent call last):
  File "./PyZip.py", line 11, in <module>
    split_line = line.split('|')
TypeError: a bytes-like object is required, not 'str'
How can I modify this to read the line and split it properly?
I'm mainly interested in the 52nd field, as delimited by |. The lines in my input file look like this:
field1|field2|field3|...field52|field53
Is there a faster way to sum all the values in the 52nd field than what I have?
Thanks!
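
One possible fix, as a minimal sketch: in Python 3, gzip.open with mode 'r' returns bytes, so splitting on the str '|' raises the TypeError shown above. Opening the file in text mode ('rt') yields str lines instead. The function name sum_field and its parameters are hypothetical, and the UTF-8 encoding is an assumption about the data. The sketch keeps index 52 from the original code; note that this is the 53rd field counting from one, so index 51 may be what is intended for the 52nd field.

```python
import gzip


def sum_field(path, index=52, sep="|"):
    """Return (line_count, total) for the integer field at position `index`.

    `sum_field`, `index` and `sep` are illustrative names, not part of the
    original script. UTF-8 is an assumed encoding for the input file.
    """
    total = 0
    count = 0
    # 'rt' opens the gzip stream in text mode, so each line is a str
    # and can be split with a str separator.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        # Iterate over the file object directly instead of readlines(),
        # which would load every line into memory first.
        for line in f:
            # Limit the number of splits: fields after `index` are not needed.
            fields = line.split(sep, index + 1)
            # int() ignores surrounding whitespace, including a trailing newline.
            total += int(fields[index])
            count += 1
    return count, total
```

Calling sum_field('SmallData.DAT.gz') and printing the two returned values gives the same output format as the original script. Whether this beats zcat file.gz | python depends on the data and the machine, so it is worth timing both.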