I wrote a simple script in colab to pull text files from my drive, put them into a string, run them through a function, and print them out. Some text files are saved as ANSI and the text comes out fine. Some text files were saved as unicode and there is a black diamond question mark after every single character. How can I get rid of these? I have tried errors = 'ignore' as well, and a few other things. But I'm thinking I'm missing something fundamental about character encoding.
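import glob
import os

os.chdir('/content/drive/My Drive')
for path in glob.glob("*.txt"):
    # Every file is decoded as UTF-8; undecodable bytes become the replacement character.
    with open(path, 'r', encoding='utf-8', errors='replace') as f:
        text = f.read()
    print(my_function(text))  # my_function is defined elsewhere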
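
One possible explanation, assuming the "Unicode" files were written by an editor such as Windows Notepad, is that they are UTF-16 with a byte order mark (BOM) rather than UTF-8, so decoding them as UTF-8 produces a stray byte after each character. The sketch below is only an illustration under that assumption: `read_text_guessing_encoding` is a hypothetical helper, and the `cp1252` fallback assumes that "ANSI" means Windows-1252 on the machine that saved the files.

```python
import codecs


def read_text_guessing_encoding(path, fallback="cp1252"):
    """Hypothetical helper: pick a codec from the file's byte order mark."""
    # Read raw bytes first so the BOM can be inspected before decoding.
    with open(path, "rb") as fh:
        raw = fh.read()
    if raw.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" strips the UTF-8 BOM while decoding.
        return raw.decode("utf-8-sig")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # "utf-16" reads the BOM to choose the byte order, then drops it.
        return raw.decode("utf-16")
    # No BOM found: assume the legacy "ANSI" code page (an assumption).
    return raw.decode(fallback)
```

In the loop above, the `open(...)`/`read()` pair could then be swapped for `text = read_text_guessing_encoding(path)`. Files without a BOM that are actually UTF-8 would need a different fallback, so this is a starting point rather than a general detector.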