
I have a source txt file (downloaded from an accounting program) that contains a 0a byte inside lines where it is not needed (it creates an unwanted line break), and 0d 0a in the places where a line break is actually needed. I need to open the file in Excel (I can also download it as csv). When I download almost the same data as xml, I run into the same problem when reading the data with Python, but I solved it there with:

for i in range(1, 16):
    lstFile.append(str(file))
    lstAmount.append(str(amount))
    lstKey.append(str(keys[i-1]))
    if accPay.find(keys[i-1]) is None:
        lstValue.append("none")
    else:
        lstValue.append(accPay.find(keys[i-1]).text.replace(u'\u000d', ' '))


But I can't replace 0a on its own.

When I write

with open(file, 'r') as file:
    filedata = file.read()
filedata = filedata.replace(u'\u000a', ' ')
with open('Konten_last5.txt', 'w') as file:
    file.write(filedata)

I get every 0a and every 0d 0a replaced by 20 (a space).


When I write

with open(file, 'r') as file:
    filedata = file.read()
filedata = filedata.replace(u'\u000d', ' ')
with open('Konten_last5.txt', 'w') as file:
    file.write(filedata)

I get 0d 0a everywhere, in both places. Please help!

I also tried replacing the pair on its own, (u'\u000d\u000a', 'any'), but it doesn't work: this combination isn't found.

I tried the suggested solution, but it doesn't work. I couldn't attach a picture in a comment.

Elina
  • Apply _negative Lookbehind_. `import re; x='A\u000aB\u000d\u000aC\u000aD'; x; re.sub("(?<!\u000d)\u000a", ' ', x)` returns `'A\nB\r\nC\nD'` and `'A B\r\nC D'`. Please [edit] your question to share a [mcve] - how do you get your data (I'd guess that you read a `csv` file)? – JosefZ Dec 28 '21 at 20:54
  • Sorry, I haven't fully understood you. How should I change my code? I download a txt file. When I download csv I have the same problem in Excel. When I get all the data I need from the xml files I have the same problem, but `for i in range(1,16): if accPay.find(keys[i-1]) is None: lstValue.append("none") else: lstValue.append(accPay.find(keys[i-1]).text.replace(u'\u000d',' '))` helps me – Elina Dec 28 '21 at 23:30
  • I almost got your idea, but not how to implement it. So I need NOT to replace 0a when it comes together with 0d? – Elina Dec 28 '21 at 23:44
  • Sounds like an issue of LF (Line Feed) vs CRLF (Carriage Return Line Feed). The former is normally used as a line break on *nix and the latter on Windows. The txt file somehow has them mixed together. You can try an online converter, for simplicity's sake - https://app.execeratics.com/LFandCRLFonline/?l=en Alternatively, you can do it yourself using JosefZ's solution. It is almost a one-liner. You just need to dip your toes into regex (a useful skill) – edd Dec 28 '21 at 23:57
  • Tried your solution, but got the same result – Elina Dec 29 '21 at 00:08
  • I see, it behaves like a single unified symbol. I will try the solution from another question and open the file in binary mode – Elina Dec 29 '21 at 00:11

1 Answer

Open the file in binary mode (open(file, 'rb')). Then you can read and work with byte strings instead of mucking with Unicode translations. That's especially important on Windows, where writing a '\x0a' to a file opened in text mode results in '\x0d\x0a' being written to disk.
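
A minimal sketch of this approach, combined with the negative lookbehind from JosefZ's comment on the question. The names `replace_lone_lf` and `clean_file` are hypothetical, and the sketch assumes the goal is to turn every 0a that is not preceded by 0d into a space while leaving the 0d 0a pairs untouched:

```python
import re

# Matches a line feed (0x0a) that is NOT preceded by a carriage return (0x0d).
LONE_LF = re.compile(rb"(?<!\r)\n")


def replace_lone_lf(data: bytes, replacement: bytes = b" ") -> bytes:
    """Replace stray LF bytes while leaving CRLF line endings intact."""
    return LONE_LF.sub(replacement, data)


def clean_file(src_path: str, dst_path: str) -> None:
    """Read src_path as raw bytes, fix stray LFs, write raw bytes to dst_path."""
    # Binary mode: no newline translation happens on read or on write.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(replace_lone_lf(data))


sample = b"A\nB\r\nC\nD"
print(replace_lone_lf(sample))  # b'A B\r\nC D'
```

For the file in the question this could be called as `clean_file(file, 'Konten_last5.txt')`. In text mode, Python's default newline handling turns each 0d 0a into a single '\n' on read, which is likely why the text-mode attempts could not tell the two cases apart.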

Tim Roberts
  • Thank you! The method worked with xml. I hadn't realized the way the file is opened makes any difference, thanks! – Elina Dec 29 '21 at 00:23
  • `Traceback (most recent call last): File "C:\Program Files\Sublime Text 3\replace.py", line 13, in <module> filedata = filedata.replace('\x0a', '\x0a') TypeError: a bytes-like object is required, not 'str'` Can't I use replace? – Elina Dec 29 '21 at 00:28
  • It's OK, `filedata = filedata.replace(b'\x0a', b'\x00')` worked – Elina Dec 29 '21 at 00:36