-1

I am working on a Python program that analyzes files and keeps only the parts I want. I get an error when I open some of these files. They contain strings and bytes like this:

file.py:
if byte == "0xFD":
    for byte in bytes.split["0xFD"]:
...

When I open this kind of file, the strings between quotes seem to be interpreted as bytes, and the program crashes with: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 59752: character maps to <undefined>. I get the same kind of error with 'utf-8'.
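The error can probably be reproduced without the original file. The sketch below assumes the default encoding was cp1252 (which Python reports as 'charmap', common on Windows); the sample bytes are hypothetical and only stand in for the real file content. It suggests the failure comes from a raw 0x9D byte in the file rather than from the quoted "0xFD" text, which is plain ASCII.

```python
# Hypothetical sample: one line of source-like text containing a raw 0x9D byte.
raw = b'if byte == "0xFD":  # stray byte: \x9d\n'

failures = {}
for encoding in ("cp1252", "utf-8"):
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Record why each codec rejected the data.
        failures[encoding] = exc.reason

for encoding, reason in failures.items():
    print(f"{encoding}: {reason}")
```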

So my question is: how can I read that line without the bytes being interpreted? I want to keep the line exactly as it is.

Jason Aller

2 Answers

0

If you pass "rb" as the mode to open(), the file is read in binary mode: you get raw bytes and nothing is decoded, so no UnicodeDecodeError can be raised.

# Binary mode returns a bytes object; no decoding takes place.
with open("insertfilenamehere.txt", "rb") as file:
    txt = file.read()
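In binary mode every line is a bytes object, so any filtering has to use bytes patterns too. The following is a minimal sketch of that idea, not a definitive implementation: the sample content, the temporary file, and the b"0xFD" filter are all assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical sample data: source-like text plus one byte (0x9D) that
# neither cp1252 nor UTF-8 can decode.
sample = b'if byte == "0xFD":\n    junk = "\x9d"\nprint("done")\n'

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False, suffix=".py") as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Iterating a binary file yields bytes lines; compare against bytes patterns.
    with open(path, "rb") as f:
        kept = [line for line in f if b"0xFD" in line]
finally:
    os.remove(path)

print(kept)
```

The kept lines are byte-for-byte identical to the file content, which matters if the goal is to write them back out unchanged.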
Glatinis
0

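I found a way to do what I want. Passing errors='replace' makes open() substitute the replacement character (U+FFFD) for any byte that is not valid UTF-8 instead of raising UnicodeDecodeError:

with open(file_path, "r", encoding="utf-8", errors='replace') as file: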

I found some help in this post: Unicode error handling with Python 3's readlines()
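One caveat: errors='replace' is lossy, so the original bytes cannot be recovered from the decoded text. If the lines must be kept exactly as they are, errors='surrogateescape' or decoding as latin-1 may be a better fit. This is only a sketch on a hypothetical sample line, comparing the three options:

```python
# Hypothetical sample line containing one byte that is not valid UTF-8.
raw = b'name = "caf\x9d"\n'

# 'replace' swaps the bad byte for U+FFFD; the original byte is lost.
replaced = raw.decode("utf-8", errors="replace")

# 'surrogateescape' maps the bad byte to a lone surrogate that can be
# turned back into the same byte when encoding with the same handler.
escaped = raw.decode("utf-8", errors="surrogateescape")
restored = escaped.encode("utf-8", errors="surrogateescape")

# latin-1 maps every byte to a code point, so it never fails and round-trips.
latin = raw.decode("latin-1")

print("\ufffd" in replaced)                # replacement character present
print(replaced.encode("utf-8") == raw)     # lossy: original bytes not recovered
print(restored == raw)                     # lossless round trip
print(latin.encode("latin-1") == raw)      # lossless round trip
```

The same errors= values can be passed directly to open(), for example open(file_path, "r", encoding="utf-8", errors="surrogateescape").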

Dharman