
I obtained a data.txt file from an online source. When I open the file in Notepad, I see random characters, as shown in the screenshot.

Screen shot

I attempted to open the file using the following Python code snippet:

import pickle

my_file = 'data.txt'
# Pickle data is binary, so the file is opened in 'rb' mode
with open(my_file, 'rb') as f:
    print(f)
    ff = pickle.load(f)
print(ff)

The first print call outputs <_io.BufferedReader name='data.txt'> in the console, and the second print call displays all the data from data.txt in a readable form.

I want to replace the data in data.txt with my own data sets. I searched for possible solutions, and most of them (for example this) suggest changing the encoding of data.txt to UTF-8. The file's current encoding is ANSI. I changed the encoding to UTF-8 as suggested, but the problem persists (the file still contains gibberish). Moreover, when I tried to view the contents of the file (now UTF-8 encoded) using the Python snippet above, I got the following error:

_pickle.UnpicklingError: invalid load key, '\xef'.

The Python code shows that the file contains valid data. However, I'm unable to edit the file with my own data sets. Any help, please!

Karl Knechtel
santobedi
  • If `pickle.load()` works correctly, then why can you not just operate on the loaded data and re-pickle it? – SiHa Jul 03 '20 at 06:12
  • @SiHa I want to edit the data in Notepad. I used the python code only to verify whether the file contains any valid data or not. – santobedi Jul 03 '20 at 06:15
  • The file "looks garbled" because it is *not actually a text file*. Just because you name a file ending in `.txt` doesn't mean it will contain something intended to be viewed or edited as plain text, or that will make sense when you attempt to do so. I assume there is some *reason* you knew to use `pickle` to handle the file; that reason is the important thing, not the filename extension. You cannot reasonably expect to edit the contents of a file produced using the `pickle` module using Notepad in the same way that you cannot reasonably expect to edit the contents of a JPG image using Notepad. – Karl Knechtel Jul 03 '20 at 06:20
  • The *purpose* of `pickle` is to transform *actual, in-program Python objects* into a form that can be written into a file, and then read the file back and re-create matching objects. This process is called *serialization* (and deserialization, when you read it back). The way you "edit" data that *represents objects in a program* is to... manipulate the objects. – Karl Knechtel Jul 03 '20 at 06:23
  • The fact that you were expected to use `'rb'` for the flags when opening the file should have been the first hint; things intended to be interpreted as text could be opened for reading with `'r'`, and the results would be friendlier (for example, you could iterate over the lines of text automatically using a `for` loop). – Karl Knechtel Jul 03 '20 at 06:25
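The approach suggested in the comments (load the objects, change them in Python, then pickle them again) could look roughly like the sketch below. The helper `edit_pickle`, the callback `add_my_data`, and the dictionary layout are hypothetical, since the actual structure of the data in data.txt is not shown; the demonstration writes to a throwaway file so nothing real is overwritten. Note also that unpickling can execute arbitrary code, so only load pickle files from sources you trust.

```python
import os
import pickle
import tempfile


def edit_pickle(path, edit):
    """Load the object pickled at `path`, apply `edit` to it, and write it back.

    `edit` is any function that takes the loaded object and returns the
    object that should be stored instead (hypothetical helper, not part
    of the pickle module).
    """
    with open(path, 'rb') as f:
        data = pickle.load(f)
    data = edit(data)
    with open(path, 'wb') as f:
        pickle.dump(data, f)
    return data


# Demonstration on a temporary file with made-up contents.
demo_path = os.path.join(tempfile.mkdtemp(), 'data.txt')
with open(demo_path, 'wb') as f:
    pickle.dump({'values': [1, 2, 3]}, f)


def add_my_data(obj):
    # Assumed structure: a dict holding a list under 'values'.
    obj['values'].append(4)
    return obj


edited = edit_pickle(demo_path, add_my_data)

# Read the file back to confirm the change was stored.
with open(demo_path, 'rb') as f:
    reloaded = pickle.load(f)
print(reloaded)
```

For the real file, printing `type(ff)` after loading would show what kind of object it holds, which determines how it can be modified before it is pickled again.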

1 Answer

The error:

_pickle.UnpicklingError: invalid load key, '\xef'.

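means that the first byte `pickle` read from the file, \xef, is not a valid pickle opcode (the "load key"). \xef is the first byte of the UTF-8 byte order mark (EF BB BF), which suggests that re-saving the file as UTF-8 changed its bytes and corrupted it. The file does not contain plain text; like an image or a music file, it holds binary data. If the contents of a .txt file are not plain text, changing the encoding cannot turn them into readable text.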
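As a rough illustration of how this error can arise, the sketch below prepends the three bytes of a UTF-8 byte order mark to some pickled data and tries to load it. This only simulates one effect that re-saving in a text editor might have; the variable names and sample data are made up, and a real conversion in Notepad may alter other bytes as well.

```python
import pickle

# Some valid pickled data (made-up sample).
original = pickle.dumps({'values': [1, 2, 3]})

# The UTF-8 byte order mark that a text editor may write at the start of a file.
utf8_bom = b'\xef\xbb\xbf'
corrupted = utf8_bom + original

try:
    pickle.loads(corrupted)
    error_message = None
except pickle.UnpicklingError as exc:
    # The first byte is no longer a valid pickle opcode.
    error_message = str(exc)

print(error_message)

# The untouched bytes still load without problems.
print(pickle.loads(original))
```

If a copy of the original, unmodified file is still available, loading that copy with `pickle` remains the reliable way to get at the data.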

Seaver Olson