
I have a text file that uses \n to mark new lines.

In Python 3.6, when I load it using the following code:

file = open(file_name, 'r')
contents = file.read()

it changes all \n to \\n. For example:

Original in the txt file:

This is a test \n plus second \n test.

After reading in Python:

"This is a test \\n plus second \\n test."

I need all the \n sequences to work as new lines so that I can do further analysis on them (using regex).

What is the correct method to read the file and solve this issue?

mah65
  • Because if you can see `\n` in the text file, that is a `\` followed by an `n`, not the newline `\n` character. If you want to turn them into newlines as well, do `replace("\\n", "\n")`. – azro Aug 30 '20 at 09:10
  • What you may also be seeing is the `repr` representation of the input string; if you were to write it out or `print(contents)`, it should render the new line. – metatoaster Aug 30 '20 at 09:11
  • "\n" is an escape character. Because your file contains the string "\n", it will be escaped to "\\n". – Kevin Mayo Aug 30 '20 at 09:12
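To illustrate the `repr` point from the comments, here is a minimal in-memory sketch (it assumes the file holds the literal two-character sequence backslash + `n`, modelled with a raw string). The interactive interpreter shows `repr()` when you evaluate a variable without `print`, and `repr()` doubles each backslash; the string itself still stores only one.

```python
# A string holding a literal backslash followed by "n" (two characters),
# which is what the file contains; the raw string keeps the backslash as-is.
contents = r"This is a test \n plus second \n test."

print(contents)        # This is a test \n plus second \n test.
print(repr(contents))  # 'This is a test \\n plus second \\n test.'

# "\\n" in source code is a 2-character string: one backslash plus "n".
print(len("\\n"))             # 2
print(contents.count("\\"))   # 2 (one backslash per literal \n)

# A real newline is a single character, written "\n" in source code.
print(len("\n"))              # 1
```

So the "double backslash" is only how the value is displayed, not what was read from the file.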

2 Answers


All actual newline characters (linefeed / LF, hex value 0x0A) are preserved by default when reading a file in Python. But your file seems to contain literal escape sequences (a backslash followed by n), which you want to convert to actual, single newline characters.

In this case, just use: print(contents.replace("\\n", "\n"))
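A self-contained sketch of that approach, applied at read time. The temporary file and its contents are hypothetical stand-ins for the asker's file, and `encoding="utf-8"` is an assumption (the original code relies on the platform default).

```python
import os
import tempfile

# Hypothetical sample: the text contains the literal two-character
# sequence backslash + "n", not real newline characters.
sample = r"This is a test \n plus second \n test."

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Read the file and turn each literal "\n" sequence into a real newline.
    with open(path, "r", encoding="utf-8") as f:
        contents = f.read().replace("\\n", "\n")

print(contents.splitlines())  # ['This is a test ', ' plus second ', ' test.']
```

Note that `str.replace` returns a new string, so the result has to be assigned (or printed) to take effect.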

shredEngineer
  • I applied this modification and it solved my problem: contents = file.read().replace("\\n", "\n"). Thanks – mah65 Aug 30 '20 at 09:26
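Since the follow-up analysis uses regex, here is a sketch of two options (the sample string is hypothetical; neither option is from the original answer). You can match the literal sequence directly, or convert first and then use line-anchored patterns.

```python
import re

# Text as read from the file (no replacement yet): each "\n" is a literal
# backslash followed by "n".
raw = r"This is a test \n plus second \n test."

# Option 1: split on the literal sequence directly. In the pattern,
# r"\\n" means "a backslash, then the letter n".
parts = re.split(r"\\n", raw)
print(parts)  # ['This is a test ', ' plus second ', ' test.']

# Option 2: convert to real newlines first, then use line-anchored regexes.
text = raw.replace("\\n", "\n")
matches = re.findall(r"^.*test.*$", text, flags=re.MULTILINE)
print(matches)  # ['This is a test ', ' test.']
```

With `re.MULTILINE`, `^` and `$` match at each real line boundary, which only works after the literal sequences have been converted.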

Where are you seeing the double backslash output? I just tested this myself, and both printing the contents read from the file and writing them back to another file seem to preserve just one set of ...

Code:

# Read the whole file and print it
with open("test.txt", "r") as file:
    contents = file.read()
print(contents)

# Write the contents back out unchanged
with open("test2.txt", "w") as file2:
    file2.write(contents)
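To check where the second backslash comes from, here is a self-contained sketch of the same round trip. The files live in a hypothetical temporary directory and UTF-8 is assumed. It compares the raw bytes of both files and counts backslashes in the string versus its `repr()`.

```python
import os
import tempfile

# Demo setup: one line containing literal backslash-n sequences.
line = r"This is a test \n plus second \n test."

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test.txt")
    dst = os.path.join(tmp, "test2.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write(line)

    # Read the file and write the contents back out unchanged.
    with open(src, "r", encoding="utf-8") as f:
        contents = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(contents)

    # Compare the raw bytes of both files.
    with open(src, "rb") as f:
        src_bytes = f.read()
    with open(dst, "rb") as f:
        dst_bytes = f.read()

print(src_bytes == dst_bytes)      # True: nothing was added on the round trip
print(contents.count("\\"))        # 2: one backslash per literal \n
print(repr(contents).count("\\"))  # 4: repr() is what doubles them
```

If this holds on the asker's machine too, the extra backslash comes from displaying `repr(contents)` (for example, by evaluating `contents` in the REPL), not from `read()`.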

The encodings of both the input file containing

This is a test \n plus second \n test.

and the output file (which ends up exactly the same) are UTF-8 in my case. Maybe that has something to do with it? Just speculation.

I can't replicate your issue, but as shredEngineer said, you could just fix it manually with a simple replace. It would be interesting to know why your code adds a second backslash, though.

Matheos