Basically, the ASCII table maps the values in the range [0, 2^7), that is 0 through 127, to characters (printable or not). So, to ignore non-ASCII characters, you just have to ignore every character whose code is not in [0, 2^7), i.e. whose code is greater than 127.
In Python, there is a built-in function called ord which, according to its docstring, does the following:
Return the integer ordinal of a one-character string.
In other words, it gives you the code of a character. Now, you must ignore all characters that, passed to ord, return a value greater than 127 (that is, 128 or above). This can be done with:
    with open(filename, 'rb') as fobj:
        text = fobj.read().decode('utf-16-le')

    # The `with` block closes the output file automatically
    with open("text.txt", "w") as out_file:
        # Check every single character of `text`
        for character in text:
            # If it's an ASCII character
            if ord(character) < 128:
                out_file.write(character)
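To try the same test without touching any file, the filter can be wrapped in a small helper. This is only a sketch: `keep_ascii` is a hypothetical name, not something from the standard library, and the sample string is made up for illustration. The last line shows that the built-in `str.encode` with the `'ignore'` error handler gives the same result.

```python
def keep_ascii(text):
    """Return `text` without the characters whose code is 128 or above."""
    return "".join(ch for ch in text if ord(ch) < 128)


# Hypothetical sample containing two non-ASCII letters (e-acute, o-umlaut)
sample = "h\u00e9llo w\u00f6rld"

print(keep_ascii(sample))  # prints "hllo wrld"

# Same result using the codec machinery instead of an explicit test
print(sample.encode("ascii", "ignore").decode("ascii"))  # prints "hllo wrld"
```

The explicit `ord` test is easier to adapt (for example to a narrower range), while the `encode`/`decode` round trip is shorter when plain ASCII is all you need.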
Now, if you only want to keep printable characters, note that all of them, in the ASCII table at least, lie between 32 (space) and 126 (tilde), so you simply replace the test with:
    if 32 <= ord(character) <= 126:
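As a self-contained illustration of that condition, here is a minimal sketch; `keep_printable_ascii` is a hypothetical helper name and the sample string is invented for the example.

```python
def keep_printable_ascii(text):
    """Keep only the characters from space (32) to tilde (126), inclusive."""
    return "".join(ch for ch in text if 32 <= ord(ch) <= 126)


# Hypothetical sample with a tab, a newline and one non-ASCII letter
sample = "tab\there\nnew line \u00e9~"

print(repr(keep_printable_ascii(sample)))  # prints 'tabherenew line ~'
```

Be aware that tab (9) and newline (10) are control characters, so this test removes them as well. If you want to keep line breaks, add them to the condition explicitly, for instance `character == '\n' or 32 <= ord(character) <= 126`.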