Or, perhaps a better title: how to avoid an unwanted extra carriage return when passing the contents of a binary file to a text-mode write.
Python 3.6, Windows. The input file first needs a binary search/replace, and then a regex search/replace.
I first open the input file in binary mode, do the work, and save it in binary mode in a temporary file. Then I open that in text mode, do the regex search/replace, and save it in text mode (with a name resembling that of the input file).
import re

def fixbin(infile):
    with open(infile, 'rb') as f:
        file = f.read()
    # a few bytearray operations here, then:
    with open('bin.tmp', 'wb') as f:
        f.write(file)

def fix4801(fname, ext):
    outfile = '{}_OK{}'.format(fname, ext)
    # read the temporary file and write the result, both in text mode
    with open('bin.tmp', encoding='utf-8-sig', mode='r') as f, \
         open(outfile, encoding='utf-8-sig', mode='w') as g:
        infile = f.read()
        x = re.sub(r'(\n4801.+\n)4801', r'\1 ', infile)
        g.write(x)

infile, fname, ext = get_infile()  # function get_infile not shown for brevity
fixbin(infile)
fix4801(fname, ext)
It works, but it's ugly. I'd rather pass the data from one function to the next directly, without a temporary file, like so:
def fixbin(infile):
    with open(infile, 'rb') as f:
        file = f.read()
    # a few bytearray operations here, and then
    return file.decode('utf-8')

def fix4801(infile):
    x = re.sub(r'(\n4801.+\n)4801', r'\1 ', infile)
    return x

...

temp = fixbin(infile)
result = fix4801(temp)
outfile = '{}_OK{}'.format(fname, ext)
with open(outfile, encoding='utf-8-sig', mode='w') as g:
    g.write(result)
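
One difference between the two versions may be relevant. The sketch below assumes the input file uses CRLF line endings (the sample bytes are made up for illustration) and compares what `bytes.decode` returns with what a text-mode read returns. An in-memory `io.TextIOWrapper` with `newline=None` stands in for `open(..., mode='r')`.

```python
import io

# Assumed sample data with Windows (CRLF) line endings; not the real file.
raw = b"4801 first\r\n4802 second\r\n"

# Second version: decoding bytes performs no newline translation.
decoded = raw.decode("utf-8")

# First version: a text-mode read with the default newline=None
# translates '\r\n' to '\n' (universal newlines).
read_as_text = io.TextIOWrapper(
    io.BytesIO(raw), encoding="utf-8", newline=None
).read()

print(repr(decoded))       # carriage returns are still present
print(repr(read_as_text))  # carriage returns are gone
```

If the real file is CRLF-terminated, the string returned by `fixbin` in the second version still contains `\r` characters, whereas the string read from `bin.tmp` in the first version does not.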
But then the output file (on Windows) gets an unwanted extra carriage return. The symptoms are described here, but the cause is different: I'm not using os.linesep; in other words, there is no os.linesep in my code. (There may be in the underlying libraries; I haven't checked.)
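
A minimal sketch that seems to reproduce the symptom without touching the disk, again assuming CRLF input. The helper `written_bytes` is hypothetical and exists only for this illustration. Passing `newline='\r\n'` to `io.TextIOWrapper` imitates what the default `newline=None` does on Windows when writing, and `newline=''` turns the translation off.

```python
import io

# Assumed sample: a decoded string that still contains '\r\n'.
text = b"4801 first\r\n4802 second\r\n".decode("utf-8")

def written_bytes(s, newline):
    """Hypothetical helper: return the bytes a text-mode write of s produces."""
    buf = io.BytesIO()
    wrapper = io.TextIOWrapper(buf, encoding="utf-8", newline=newline)
    wrapper.write(s)
    wrapper.flush()
    data = buf.getvalue()
    wrapper.detach()  # leave buf open and release the wrapper
    return data

# Each '\n' is translated to '\r\n' on write; the existing '\r' is kept.
windows_default = written_bytes(text, "\r\n")

# newline='' disables translation, so the text is written unchanged.
untranslated = written_bytes(text, "")

print(windows_default)
print(untranslated)
```

Under that assumption, either opening the output file with `newline=''` or replacing `'\r\n'` with `'\n'` in the decoded string before writing would presumably avoid the doubled carriage return.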
What am I doing wrong?