
I have large text files that use the combination CRLFCRLF as end-of-line. I need to change this to CRLF to be able to work with the files. Search-and-replace and macros in a text editor take too long, because the file is 8 GB. How can I do it with Python 2.7? I tried the following, but it does not change the file. When I try it with a printable string, e.g. replace('a','A') or replace('BUS','CAR'), it works:

f1 = open('C:/temp/Textfile1.txt', 'r')
f2 = open('C:/temp/Textfile2.txt', 'w')
string = f1.read()
string = string.replace('\r\n\r\n','\r\n')
f2.write(string)
f1.close()
f2.close()
Cut7er
  • Does `string = f1.read()` work? This loads the whole file into memory. You're probably better off reading line by line instead of all at once. – Patrick Artner Sep 10 '18 at 16:28
  • Yes, it works; I have 32 GB of RAM. I also already tried reading line by line, which also worked for ordinary strings, but not for \r\n. And I realized: how can CRLFCRLF ever be found within one line? The first CRLF belongs to the first line and the second CRLF belongs to the second line (which contains nothing but this CRLF), so if I look at each line separately, CRLFCRLF can never be found. – progoodstuff Sep 11 '18 at 15:04
  • Windows or Unix/Linux system? – Patrick Artner Sep 11 '18 at 16:01
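Both comments point at real limits: `f1.read()` needs the whole 8 GB in memory, and line-by-line iteration splits every CRLFCRLF across two lines. The following is a minimal sketch, not taken from the thread, that avoids both by streaming fixed-size chunks in binary mode, so no newline translation can interfere. The function name `collapse_double_crlf` and the 1 MiB default chunk size are illustrative assumptions. Because a match consists only of CR and LF bytes, cutting each buffer right after its last non-CR/LF byte means no match is ever split, and the output equals `data.replace(b'\r\n\r\n', b'\r\n')` applied to the whole file. It runs under Python 2.7 and 3.

```python
def collapse_double_crlf(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path, replacing every b'\r\n\r\n' with b'\r\n'.

    Hypothetical helper: gives the same result as a whole-file
    bytes.replace(), but holds only about chunk_size bytes at a time.
    """
    pattern = b'\r\n\r\n'
    replacement = b'\r\n'
    carry = b''  # trailing CR/LF bytes that might start a match in the next chunk
    # Binary mode: bytes are read and written exactly as stored, on any OS.
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            buf = carry + chunk
            # A match contains only CR/LF bytes, so it cannot cross a cut
            # placed right after the last byte that is neither CR nor LF.
            cut = len(buf.rstrip(b'\r\n'))
            dst.write(buf[:cut].replace(pattern, replacement))
            carry = buf[cut:]
        # Whatever is left at end of file can be processed on its own.
        dst.write(carry.replace(pattern, replacement))
```

Usage would be `collapse_double_crlf('C:/temp/Textfile1.txt', 'C:/temp/Textfile2.txt')`. One design trade-off: an unbroken run consisting only of CR/LF bytes is held in memory until the next ordinary byte arrives, which is harmless for normal text.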

1 Answer


Try it using regex:

import re

fn = "t.txt"
fn2 = "r.txt"

print('-' * 70)
# Binary mode keeps the \r\n bytes intact on every platform
with open(fn, "wb") as f:
    f.write("ta\r\ntata\r\n\r\ntata\r\n\r\n\r\nta\r\ntaa\r\n\r\n\r\n\r\ntata")

with open(fn, "rb") as f:
    print(f.read())

with open(fn, "rb") as f:
    t = f.read()

# Collapse every CRLFCRLF into a single CRLF
subbed = re.sub(r"\r\n\r\n", r"\r\n", t)
with open(fn2, "wb") as f:
    f.write(subbed)

print('-' * 70)
with open(fn2, "rb") as f:
    print(f.read())

Output:

----------------------------------------------------------------------
ta
tata

tata


ta
taa



tata
----------------------------------------------------------------------
ta
tata
tata

ta
taa

tata

Sidenote:

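On Windows, Python 2.7 opens files in text mode by default, which translates each \r\n into \n when reading, so a search for \r\n\r\n never matches. Open the files in binary mode ('rb' and 'wb') to work with the raw \r\n bytes, or, if you stay in text mode, use subbed = re.sub(r"\n\n", r"\n", t).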
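The newline translation is easy to observe directly. This sketch (the scratch file name `demo.txt` is a hypothetical choice) uses `io.open`, which exists in both Python 2.7 and 3: with the default `newline=None` it translates `\r\n` to `\n` when reading on every platform, while `newline=''` turns the translation off. Plain `open(..., 'r')` in Python 2.7 behaves like the translated case only on Windows.

```python
import io
import os
import tempfile

# Hypothetical scratch file containing one doubled CRLF
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with io.open(path, 'wb') as f:
    f.write(b'a\r\n\r\nb')

# Default text mode (newline=None): \r\n is translated to \n on read
with io.open(path, 'r') as f:
    translated = f.read()

# newline='' disables translation, so the \r\n pairs survive
with io.open(path, 'r', newline='') as f:
    raw = f.read()

# Binary mode returns the bytes exactly as stored
with io.open(path, 'rb') as f:
    raw_bytes = f.read()

print(repr(translated))  # the \r characters are gone
```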

Patrick Artner