I have a Python script that's trying to interpret a trace of data written to stdout and read from stdin. The problem is that this data is riddled with ANSI escapes I don't care about. These escapes are JSON encoded, so they look like "\033[A" and "\033]0;". I don't actually need to interpret the codes, but I do need to know how many characters each one contains (the first sequence is 6 characters, the second is 7). Is there a straightforward way to filter these codes out of the strings I have?
The `colcrt` program already does this. It's not in Python, but if that's a requirement, it could be ported or wrapped. – tripleee Nov 22 '12 at 05:01
6 Answers
15
The complete regexp for Control Sequences (aka ANSI Escape Sequences) is
/(\x9B|\x1B\[)[0-?]*[ -\/]*[@-~]/
Refer to ECMA-48 Section 5.4 and ANSI escape code
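A minimal Python sketch of the same pattern, assuming the text has already been decoded so that ESC is a real control character rather than the literal `\033` text from the question. The function name `strip_csi` is hypothetical. The pattern covers only CSI sequences (those starting with ESC `[`), so an OSC introducer such as ESC `]0;` passes through unchanged.

```python
import re

# CSI pattern from the answer above, written as a Python raw string:
# introducer, parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F,
# then one final byte in 0x40-0x7E.
CSI_RE = re.compile(r'(?:\x9B|\x1B\[)[0-?]*[ -/]*[@-~]')


def strip_csi(text):
    """Remove CSI sequences from text that contains real ESC characters."""
    return CSI_RE.sub('', text)


sample = 'begin \x1b[A middle \x1b[1;31mred\x1b[0m end'
print(strip_csi(sample))  # begin  middle red end
```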

Jeff
You should not copy your answers from one question to another. – Jean-François Fabre Feb 16 '19 at 09:13
10
Another variant:
import re

def strip_ansi_codes(s):
    """
    >>> import blessings
    >>> term = blessings.Terminal()
    >>> foo = 'hidden'+term.clear_bol+'foo'+term.color(5)+'bar'+term.color(255)+'baz'
    >>> repr(strip_ansi_codes(foo))
    u'hiddenfoobarbaz'
    """
    # ESC [ followed by optional parameters and an optional final
    # 'm' (colour/style) or 'K' (erase in line).
    return re.sub(r'\x1b\[([0-9,A-Z]{1,2}(;[0-9]{1,2})?(;[0-9]{3})?)?[mK]?', '', s)

boxed
3
#!/usr/bin/env python
import os
import re
import sys

# ESC [, numeric parameters separated by ';', then a single letter.
ansi_pattern = r'\x1b\[((?:\d|;)*)([a-zA-Z])'
ansi_eng = re.compile(ansi_pattern)


def strip_escape(string=''):
    # Collect the matches, then delete them from last to first so the
    # offsets of earlier matches stay valid while the string shrinks.
    matches = list(ansi_eng.finditer(string))
    matches.reverse()
    for match in matches:
        string = string[:match.start()] + string[match.end():]
    return string


if __name__ == '__main__':
    fname = os.path.basename(__file__)
    if len(sys.argv) == 2:
        with open(sys.argv[1], 'r') as fd:
            for line in fd:
                print(strip_escape(line).rstrip())
    else:
        print('%s <filename>' % fname)

Brian Bruggeman
1
It's far from perfect, but this regex may get you somewhere:
import re
text = r'begin \033[A middle \033]0; end'
print(re.sub(r'\\[0-9]+(\[|\])[0-9]*;?[A-Z]?', '', text))
It already removes your two examples correctly.
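Since the question also asks how many characters each sequence occupies, here is a rough sketch that reports the length of every removed match. It assumes the JSON-encoded form from the question, where ESC shows up as the four literal characters `\033`. The pattern `ESCAPED_RE` and the helper `split_escapes` are hypothetical names, and the OSC branch only covers the bare `\033]<digits>;` introducer seen in the question.

```python
import re

# Assumed pattern for the escaped form: a literal backslash plus "033",
# then either a CSI body (parameters, intermediates, final byte) or an
# OSC introducer with a numeric parameter and ';'.
ESCAPED_RE = re.compile(r'\\033(?:\[[0-?]*[ -/]*[@-~]|\][0-9]*;)')


def split_escapes(text):
    """Return (text without escapes, list of lengths of each removed escape)."""
    lengths = [len(m.group(0)) for m in ESCAPED_RE.finditer(text)]
    return ESCAPED_RE.sub('', text), lengths


text = r'begin \033[A middle \033]0; end'
stripped, lengths = split_escapes(text)
print(stripped)  # begin  middle  end
print(lengths)   # [6, 7]
```

The lengths match the counts given in the question (6 and 7 characters).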

BoppreH
0
FWIW, this Python regex seemed to work for me. I don't actually know if it's accurate, but empirically it seems to work:
r'\\033[\[\]]([0-9]{1,2}([;@][0-9]{0,2})*)*[mKP]?'
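One way to check a pattern like this empirically is to substitute it out of a few known sequences and look at what is left over. The sketch below does that; the sample list is an assumption (the two sequences from the question plus a colour code and an erase-in-line code), not something taken from the original trace.

```python
import re

# Pattern copied from the answer above; it expects the JSON-encoded form,
# where ESC appears as the four literal characters \033.
pattern = re.compile(r'\\033[\[\]]([0-9]{1,2}([;@][0-9]{0,2})*)*[mKP]?')

# Hypothetical samples: two from the question, one colour code, one erase code.
samples = [r'\033[A', r'\033]0;', r'\033[1;31m', r'\033[K']

# Whatever survives the substitution is the part the pattern did not cover.
leftovers = [pattern.sub('', s) for s in samples]
for sample, leftover in zip(samples, leftovers):
    print('%-10s -> %r' % (sample, leftover))
```

With these samples the cursor-up sequence `\033[A` leaves a stray `A` behind, because the pattern only accepts `m`, `K` or `P` as the final character; the other three are removed completely.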

rivenmyst137