I am new to programming and have a question.
I am trying to edit some .vtt files by removing certain substrings from the text while keeping the file structure intact. To do this, I copied the .vtt files into a folder and changed their extension to .txt. Then I ran this simple code:
import os

file_index = 0
all_text = []
path = "/Users/username/Documents/programming/IMS/Translate/files/"
new_path = "/Users/username/Documents/programming/IMS/Translate/new_files/"
for filename in os.listdir(path):
    if os.path.isfile(os.path.join(path, filename)):  # check the full path, not just the bare name
        with open(os.path.join(path, filename), 'r') as file:  # open in read-only mode
            for line in file.read().split("\n"): #read lines and split
                line = " ".join(line.split())  # collapse runs of whitespace
                start_index = line.find("[")  # index of the first "[", or -1 if there is none
                last_index = start_index + 11  # end of the 11-character span to remove
                if start_index != -1:
                    # keep the characters before the span and the ones after it
                    line = line[:start_index] + line[last_index:]
                all_text.append(line)  # lines without "[" are kept unchanged
I get this error message:
> File "srt-files-strip.py", line 11, in <module>
>   for line in file.read().split("\n"): #read lines and split
> File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/codecs.py", line 322, in decode
>   (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte
I have searched several forums and tried changing to encoding="utf16", but to no avail. The strange thing is that it worked earlier. Then I wrote a program to rename my files automatically, and after that it started throwing this error. I have cleared all files from the folder and copied the original ones in again, but I still can't get it to work. I would really appreciate your help, as I have no idea where to look. Thanks.
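To narrow this down, here is a minimal sketch of a check I could run to see which file in the folder actually fails to decode. The helper name `find_undecodable_files` is made up for illustration. It rests on an assumption I have not confirmed: that the failing input may not be one of my subtitle files at all, since `os.listdir` returns every entry in the folder, including hidden ones.

```python
import os


def find_undecodable_files(folder, encoding="utf-8"):
    """Return (filename, byte_offset) pairs for files that fail to decode.

    Hypothetical helper: reads each file as raw bytes and tries to decode it,
    so a bad file is reported instead of crashing the whole loop.
    """
    failures = []
    for filename in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, filename)
        if not os.path.isfile(full_path):
            continue  # skip sub-directories
        with open(full_path, "rb") as handle:  # binary mode: no decoding yet
            raw = handle.read()
        try:
            raw.decode(encoding)
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that could not be decoded
            failures.append((filename, exc.start))
    return failures
```

Calling `find_undecodable_files(path)` with the `path` from my script should list the offending file names together with the byte offset, which I could then compare against the position in the traceback.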