I have a text file that I'm trying to view in Notepad++. The file contains a list of reptile subspecies names in one column, followed by DNA sequence IDs separated by what I thought was just whitespace. But when I open the text file, the gaps turn out to be filled with unprintable characters such as GS and VT.
For example:
Subspecies name, unprintable characters, sequence ID, unprintable char, sequence ID... and so forth until the next line:
Ablepharus bivittatus GSGSGSGSGS36660VT4560VT23400
Is there a way I can remove all of these unprintable GS and VT characters from my text file? When I try to print every line in the file, I keep getting weird spacing due to these control characters that I see in Notepad++. Is there any way to make it print normally, without the spacing disruptions from the unprintable characters?
Updates:
I followed user312016's advice and installed chardet for Python. I found out the file is encoded in UTF-16LE.
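Given that encoding, here is a minimal sketch of what I would expect to work. It rests on assumptions I haven't confirmed: that the GS and VT symbols Notepad++ shows are the control characters U+001D and U+000B, and that a tab is an acceptable replacement separator. The names clean_line and read_clean_lines are my own, not from any library.

```python
import re

# Assumption: Notepad++'s "GS" and "VT" are U+001D (group separator)
# and U+000B (vertical tab).
CONTROL_RUN = re.compile(r"[\x0b\x1d]+")


def clean_line(line):
    """Collapse each run of GS/VT characters into a single tab."""
    return CONTROL_RUN.sub("\t", line.rstrip("\r\n"))


def read_clean_lines(path, encoding="utf-16-le"):
    """Yield cleaned lines from a file decoded with the given encoding."""
    # Iterating over the file object splits only on real newlines;
    # str.splitlines() is avoided because it also breaks on VT and GS.
    with open(path, encoding=encoding) as handle:
        for line in handle:
            # "utf-16-le" does not consume a byte order mark, so drop it if present.
            yield clean_line(line.lstrip("\ufeff"))
```

Something like `for row in read_clean_lines("subspecies.txt"): print(row)` would then print one tab-separated record per line, where subspecies.txt is just a placeholder for my file's name.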
I got the file from a professor via Dropbox. The file was zipped, and all I did was unzip it. It was a .txt file, and I'm sure he didn't mention using another script to parse the data.
When I click on the unzipped .txt file to open it in regular Notepad, it displays weird symbols that I assume are the GSs and VTs that I see when I open the file in Notepad++.