Pandas 0.24 writes extra carriage return in gzip csv files

Question

Under Windows, the standard EOL (end-of-line) terminator is a carriage return followed by a newline (\r\n), and that is what I get when I call to_csv on a DataFrame. However, when I use to_csv to write a gzip-compressed file, each line ends with two carriage returns followed by a newline (\r\r\n).

import gzip
import sys
import zlib

import pandas as pd
print("python:", sys.version)
print("pandas:", pd.__version__)
print("zlib  :", zlib.ZLIB_RUNTIME_VERSION)
df = pd.DataFrame(data={'c0': ['a', 'b'], 'c1': ['c', 'd']})
print(df)
# Under Windows the EOL marker is \r\n, so this works as expected
df.to_csv('df.csv', index=None)
with open('df.csv', 'rb') as f:
    print("df.csv, default terminator   :", f.read())
# With gzip compression the EOL becomes \r\r\n, which looks like a bug
df.to_csv('df.csv.gz', index=None)
with gzip.open('df.csv.gz', 'rb') as f:
    print("df.csv.gz, default terminator:", f.read())
# With line_terminator='\n', the uncompressed file gets exactly '\n' as EOL
df.to_csv('df.csv', index=None, line_terminator='\n')
with open('df.csv', 'rb') as f:
    print("df.csv, '\\n' terminator      :", f.read())
# With line_terminator='\n', the gzip file gets \r\n as EOL, which is what I wanted in the first place
df.to_csv('df.csv.gz', index=None, line_terminator='\n')
with gzip.open('df.csv.gz', 'rb') as f:
    print("df.csv.gz, '\\n' terminator   :", f.read())

Here is the output:

python: 3.6.8 |Anaconda custom (64-bit)| (default, Dec 30 2018, 18:50:55) [MSC v.1915 64 bit (AMD64)]
pandas: 0.24.0
zlib  : 1.2.11
  c0 c1
0  a  c
1  b  d
df.csv, default terminator   : b'c0,c1\r\na,c\r\nb,d\r\n'
df.csv.gz, default terminator: b'c0,c1\r\r\na,c\r\r\nb,d\r\r\n'
df.csv, '\n' terminator      : b'c0,c1\na,c\nb,d\n'
df.csv.gz, '\n' terminator   : b'c0,c1\r\na,c\r\nb,d\r\n'

This looks related to a previously discussed issue, "CSV in Python adding an extra carriage return, on Windows". What puzzles me is that the behavior differs between compressed and uncompressed files. Is this a known issue?
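
To show the mechanism I suspect, here is a minimal sketch that uses only the standard library, not pandas. It assumes (I have not confirmed this in the pandas source) that the gzip path ends up writing through a text-mode stream that translates '\n' to '\r\n', while the csv writer is already emitting '\r\n' itself. The helper name write_rows is made up for illustration, and newline='\r\n' on the wrapper stands in for the Windows default so the result is the same on any platform.

```python
import csv
import io


def write_rows(newline):
    """Write two CSV rows through a text wrapper and return the raw bytes."""
    raw = io.BytesIO()
    # newline='\r\n' makes the wrapper turn every '\n' into '\r\n' on write,
    # which is what a default text-mode file does on Windows.
    # newline='' disables that translation.
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    # The csv writer adds its own '\r\n' at the end of each row.
    writer = csv.writer(text, lineterminator="\r\n")
    writer.writerows([["c0", "c1"], ["a", "c"]])
    text.flush()
    data = raw.getvalue()
    text.detach()  # keep the wrapper from closing the buffer later
    return data


translated = write_rows("\r\n")
untranslated = write_rows("")
print(translated)    # b'c0,c1\r\r\na,c\r\r\n'
print(untranslated)  # b'c0,c1\r\na,c\r\n'
```

If this is what happens inside to_csv, it would also explain why passing '\n' as the terminator gives '\r\n' in the gzip file: the single '\n' is translated once instead of being written as is.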
