0

I want to write an encoded text to a file using Python 3.6, the issue is that I want to write it as a string and not as bytes.

text = open(file, 'r').read()
enc = text.encode(encoding)  # for example: "utf-32"
f = open(new_file, 'w')
f.write(str(enc)[2:-1])
f.close()

The problem is, I still get the file content as bytes (e.g. the '\n' remains the same instead of become a new row).

I also tried to use:

enc.decode(encoding)

but it's just returning me back the old text I had in the first place.

any ideas how can I improve this piece of code?

Thanks.

kiyah
  • 1,502
  • 2
  • 18
  • 27
Y. Serrouya
  • 41
  • 1
  • 10
  • You can open the file like `open(file, 'r', encoding="your_encoding")` then while you apply `read()` the result will be a binary object you can decode it using `decode("utf32")` – Chiheb Nexus Feb 19 '18 at 10:59
  • When I open the file as mentioned, I cannot apply either `read()` or `decode(encoding)` – Y. Serrouya Feb 19 '18 at 11:06

1 Answers1

0

The problem you have here is that you encode into utf-32 bytes object and then cast it back into a string object without specifying an encoding. The default is utf-8, so you've just converted using the wrong encoding. If you pass the same encoding to str then it should work.

Better yet, don't call str at all when writing out - if you already have a bytes object, it's not necessary.

This concept generally trips up a lot of people. I suggest reading the explanation here to help wrap your head around how and why we do the string/bytes conversions. A good rule of thumb - string types inside your python, and decode to string from bytes as data comes in, encode from string to bytes as it goes out.

jimjkelly
  • 1,631
  • 14
  • 15