Or, perhaps a better title: how to avoid an unwanted extra carriage return when passing the contents of a binary file to a text-mode write.

Python 3.6, Windows. The input file first needs to undergo a binary search/replace, and then a regex search/replace.

I first open the input file in binary mode, do the work, and save it in binary mode in a temporary file. Then I open that in text mode, do the regex search/replace, and save it in text mode (with a name resembling that of the input file).

import re

def fixbin(infile):
    with open(infile, 'rb') as f:
        file = f.read()

    # a few bytearray operations here, then:
    with open('bin.tmp', 'wb') as f:
        f.write(file)

def fix4801(fname, ext):
    outfile = '{}_OK{}'.format(fname, ext)
    with open('bin.tmp', encoding='utf-8-sig', mode='r') as f, \
         open(outfile, encoding='utf-8-sig', mode='w') as g:
        infile = f.read()
        x = re.sub(r'(\n4801.+\n)4801', r'\1    ', infile)
        g.write(x)  # write the substituted text

infile, fname, ext = get_infile() # function get_infile not shown for brevity
fixbin(infile)
fix4801(fname, ext)

It works, but it's ugly. I'd rather pass the outputs from one function to the next directly, without the temporary file, like so:

def fixbin(infile): 
    with open(infile, 'rb') as f:
        file = f.read()
    # a few bytearray operations here, and then
    return file.decode('utf-8')

def fix4801(infile): 
    x = re.sub(r'(\n4801.+\n)4801', r'\1    ', infile)
    return x

...
temp = fixbin(infile)
result = fix4801(temp)

outfile = '{}_OK{}'.format(fname, ext)
with open(outfile, encoding='utf-8-sig', mode='w') as g:
    g.write(result)

But then the output file (on Windows) gets an unwanted extra carriage return. The symptoms are described here, but the cause is different: I'm not using os.linesep; in other words, there is no os.linesep in my code. (There may be in the underlying libraries; I haven't checked.)

What am I doing wrong?

RolfBly

1 Answer


Python » Documentation : open

open(file, mode='r', buffering=-1, encoding=None, errors=None, 
           newline=None, closefd=True, opener=None)  

The default is newline=None. If newline is '' or '\n', no translation takes place.
Try the following and see whether it makes a difference:

# change
    open(outfile, encoding='utf-8-sig', mode='w') as g:
# to
    open(outfile, encoding='utf-8-sig', mode='w', newline='') as g:
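
Put together, the second version of the question's code might look roughly like the sketch below. It is only an illustration under assumptions: fix_bytes and roundtrip are hypothetical names, the bytearray operations are left out because the question does not show them, and the output goes to a temporary directory so the sketch can run anywhere.

```python
import os
import re
import tempfile

def fix_bytes(data):
    # Stand-in for the binary search/replace step (not shown in the question).
    # bytes.decode() does no newline translation, so '\r\n' stays '\r\n'.
    return data.decode('utf-8')

def fix4801(text):
    # Same regex as in the question; '.+' also matches the '\r' before '\n'.
    return re.sub(r'(\n4801.+\n)4801', r'\1    ', text)

def roundtrip(data):
    """Run both steps, write with newline='', and return the bytes on disk."""
    result = fix4801(fix_bytes(data))
    with tempfile.TemporaryDirectory() as tmpdir:
        outfile = os.path.join(tmpdir, 'sample_OK.txt')  # hypothetical name
        # newline='' writes '\n' as-is, so existing '\r\n' pairs are kept
        with open(outfile, encoding='utf-8-sig', mode='w', newline='') as g:
            g.write(result)
        with open(outfile, 'rb') as f:
            return f.read()
```

With newline='' the line endings already present in the decoded text are written unchanged, so a file that came in with '\r\n' goes out with '\r\n'.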

Question: ... there is no os.linesep in my code.


Python » Documentation : open
When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
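
To see where the extra carriage return comes from, here is a minimal sketch of that translation. It assumes the input file has Windows line endings and passes newline='\r\n' explicitly to mimic what newline=None does on Windows, so it behaves the same on any platform; bytes_written is a hypothetical helper, not part of the question's code. The first version of the code did not show the problem because reading bin.tmp in text mode with the default newline=None converts '\r\n' to '\n' on input, whereas bytes.decode() leaves the '\r' in place.

```python
import io

def bytes_written(text, newline):
    """Return the bytes a text-mode stream emits for text under newline."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding='utf-8', newline=newline)
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # leave the underlying buffer alone
    return data

# decode() keeps the '\r\n' pairs from the binary data
decoded = b'one\r\ntwo\r\n'.decode('utf-8')

# Each '\n' is expanded to '\r\n', so '\r\n' becomes '\r\r\n'
windows_default = bytes_written(decoded, '\r\n')

# No translation: the text is written as it is
untranslated = bytes_written(decoded, '')

print(windows_default)  # b'one\r\r\ntwo\r\r\n'
print(untranslated)     # b'one\r\ntwo\r\n'
```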

stovfl