How to remove hex characters in Python

Question

I am trying the following:

>>> import string
>>> s = 'https://google.com\n<0x03><0x03><0x03>'
>>> s.decode('utf8').encode('ascii', errors='ignore')

The expected output is:

'https://google.com'

But the hex characters and the newline are not removed.

Does this answer your question? https://stackoverflow.com/questions/36598136/remove-all-hex-characters-from-string-in-python – Krishna Chaurasia, Jan 28 '21 at 12:37
There are no non-ASCII characters in your original input `'https://google.com\n<0x03><0x03><0x03>'`. Edit: to clarify, `\n` is valid ASCII, each `<0x03>` is just a series of six ASCII characters rather than a raw byte, and `\x03` is also valid ASCII. – lvrf, Jan 28 '21 at 16:56
Why do you expect it to remove `\n` or the other characters? ASCII covers codes 0 to 127, so `03` is a valid ASCII code. If you don't want the text after `\n`, use `s = s.split('\n')[0]`. – furas, Jan 28 '21 at 17:03
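
The points raised in the comments can be checked directly. The following is a minimal sketch, assuming Python 3.7 or later (for `str.isascii`); the variable name `first_line` is only illustrative.

```python
s = 'https://google.com\n<0x03><0x03><0x03>'

# Every character is already ASCII, so encoding to ASCII with
# errors='ignore' has nothing to drop.
print(s.isascii())
print(s.encode('ascii', errors='ignore').decode('ascii') == s)

# '<0x03>' is six ordinary characters; '\x03' is one control character.
print(len('<0x03>'), len('\x03'))

# Keep only the text before the first newline, as suggested in the comments.
first_line = s.split('\n')[0]
print(first_line)
```

This prints `True`, `True`, `6 1` and `https://google.com`.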

score 0 · Answer 1 · answered Jan 28 '21 at 12:47

This code:

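import re

s = 'https://google.com\n<0x03><0x03><0x03>'
# [^ -~] matches the first character outside printable ASCII (space through
# tilde), here the newline; .* then consumes the rest of that line.
s = re.sub(r'[^ -~].*', '', s)
print(s)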

gives this:

https://google.com
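
Note that the regex above cuts the string off at the first non-printable character rather than removing individual markers. If the goal is instead to delete the literal `<0xNN>` markers and any real control characters wherever they appear, something like the following might work. It is only a sketch: the helper name `strip_hex_debris` and the assumption that a marker always contains exactly two hex digits are hypothetical, not taken from the question.

```python
import re

def strip_hex_debris(text):
    """Hypothetical helper: drop '<0xNN>' markers and ASCII control characters."""
    # Remove literal "<0xNN>" markers (assumed to hold exactly two hex digits).
    text = re.sub(r'<0x[0-9A-Fa-f]{2}>', '', text)
    # Remove real control characters (codes 0-31 and 127), including '\n'.
    text = re.sub(r'[\x00-\x1f\x7f]', '', text)
    return text

s = 'https://google.com\n<0x03><0x03><0x03>'
print(strip_hex_debris(s))
print(strip_hex_debris('a\x03b<0x1F>c'))
```

Unlike the truncating regex, this keeps any printable text that follows a marker, so the second call prints `abc`.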

Younes

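The tilde **~** is not an operator here: `[ -~]` is a character range running from space to tilde, which covers the printable ASCII characters. You can find the list of characters it covers [here](https://catonmat.net/my-favorite-regex) – Younes Jan 28 '21 at 12:53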
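
For what it's worth, the `~` in that pattern is just the upper end of the character range ` -~` (space, code 32, through tilde, code 126). A quick way to see which characters the class `[ -~]` covers is to test every ASCII code point against it. This is a sketch; the names `printable` and `matched` are only illustrative.

```python
import re
import string

printable = re.compile(r'[ -~]')

# Collect every ASCII character that the class [ -~] matches.
matched = ''.join(ch for ch in map(chr, range(128)) if printable.fullmatch(ch))

print(len(matched))
print(repr(matched[0]), repr(matched[-1]))

# The range is exactly: space, digits, letters and punctuation.
expected = ' ' + string.digits + string.ascii_letters + string.punctuation
print(set(matched) == set(expected))
```

This prints `95`, then `' ' '~'`, then `True`, so `[^ -~]` matches anything that is not one of those 95 printable characters, such as `\n` or `\x03`.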