So, I'm having an issue. I'm capturing output from a Logger, and it looks something like this:

11:41:19 [INFO] ←[35;1m[Server] hi←[m

I need to know how to remove those pesky ANSI color codes (or how to parse them).

yoozer8
Nathan F.

2 Answers

If they're intact, they should consist of ESC (U+001B) plus [ plus a semicolon-separated list of numbers, plus m. (See https://stackoverflow.com/a/9943250/978917.) In that case, you can remove them by writing:

final String msgWithoutColorCodes =
    msgWithColorCodes.replaceAll("\u001B\\[[;\\d]*m", "");

. . . or you can take advantage of them by using less -r when examining your logs. :-)
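As a quick sanity check, here is a minimal, self-contained sketch of the replacement above applied to the sample line from the question. The class and method names (ColorCodeStripper, strip) are made up for illustration, and it assumes the ← shown in the question is really an ESC character (U+001B) as rendered by the console.

```java
import java.util.regex.Pattern;

final class ColorCodeStripper {
    // ESC (U+001B), '[', any run of digits and semicolons, then 'm'.
    private static final Pattern COLOR_CODE =
        Pattern.compile("\u001B\\[[;\\d]*m");

    // Hypothetical helper: returns the message with color codes removed.
    static String strip(String msgWithColorCodes) {
        return COLOR_CODE.matcher(msgWithColorCodes).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample line from the question, with ESC written as an escape.
        String raw = "11:41:19 [INFO] \u001B[35;1m[Server] hi\u001B[m";
        System.out.println(strip(raw)); // 11:41:19 [INFO] [Server] hi
    }
}
```

Precompiling the Pattern is only a convenience if you strip many log lines; a one-off replaceAll call behaves the same way.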

(Note: this is specific to color codes. If you also find other ANSI escape sequences, you'll want to generalize that a bit. I think a fairly general regex would be \u001B\\[[;\\d]*[ -/]*[@-~]. You may find http://en.wikipedia.org/wiki/ANSI_escape_code to be helpful.)
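To illustrate the difference, here is a small sketch that applies the more general regex to a string containing non-color sequences (a clear-screen and a cursor-move) as well as a color reset. The class name and the sample input are invented for the example; only the regex comes from the note above.

```java
import java.util.regex.Pattern;

final class AnsiEscapeStripper {
    // ESC '[', then digits/semicolons, then optional intermediate characters
    // in the range ' ' to '/', then one final character in the range '@' to '~'.
    private static final Pattern CSI_SEQUENCE =
        Pattern.compile("\u001B\\[[;\\d]*[ -/]*[@-~]");

    // Hypothetical helper: removes every sequence the pattern matches.
    static String strip(String text) {
        return CSI_SEQUENCE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Clear screen (ends in 'J'), cursor move (ends in 'H'), color reset (ends in 'm').
        String raw = "\u001B[2J\u001B[1;1Hready\u001B[0m";
        System.out.println(strip(raw)); // ready
    }
}
```

The color-only regex would leave the first two sequences in place, since neither of them ends in m.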

If the sequences are not intact — that is, if they've been mangled in some way — then you'll have to investigate and figure out exactly what mangling has happened.

Community
ruakh
  • I feel like the question and this answer are highly underrated. –  May 20 '19 at 17:10
  • It works! But it is unclear to me how the second one is more general than the first one: where is the terminal `m` captured, in the second REGEXP? – Olivier Cailloux Apr 12 '20 at 21:57
  • @OlivierCailloux: The `m` is matched by the `[@-~]`. – ruakh Apr 13 '20 at 00:17
  • Indeed, this is what I observe. But where is this syntax documented, do you have some reference documentation about it, or is this an undocumented feature of reg exps in Java? Is it a range from `@` to `~`? What does that mean? I can’t find a precise definition of range covering that case in the [API javadoc](https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/util/regex/Pattern.html). – Olivier Cailloux Apr 14 '20 at 07:48
  • @OlivierCailloux: Yes, it's a range, matching any character from U+0040 (`@`) to U+007E (`~`). It's true that the Javadoc doesn't really explain ranges. I guess it's leaning on its last paragraph: "For a more precise description of the behavior of regular expression constructs, please see [*Mastering Regular Expressions, 3nd Edition,* Jeffrey E. F. Friedl, O'Reilly and Associates, 2006](http://www.oreilly.com/catalog/regex3/)." – ruakh Apr 14 '20 at 15:06
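For anyone else puzzled by the range, a tiny sketch may make it concrete: the class `[@-~]` accepts any single character whose code point lies between U+0040 and U+007E, which includes `m` (U+006D) but not digits or `;`. The class and method names here are hypothetical.

```java
import java.util.regex.Pattern;

final class RangeDemo {
    // True if c falls inside the range '@' (U+0040) to '~' (U+007E).
    static boolean isInRange(char c) {
        return Pattern.matches("[@-~]", String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(isInRange('m')); // true  (U+006D)
        System.out.println(isInRange('H')); // true  (U+0048)
        System.out.println(isInRange('1')); // false (U+0031)
        System.out.println(isInRange(';')); // false (U+003B)
    }
}
```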
How about this regex?

replaceAll("\\d{1,2}(;\\d{1,2})?", "");

Based on the format found here: http://bluesock.org/~willg/dev/ansi.html
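Before relying on this pattern, it may be worth trying it on the sample line from the question. The sketch below does that; the class name is hypothetical, and it assumes the ← in the question stands for ESC (U+001B). Because the pattern contains no ESC, [ or m, it matches any run of one or two digits, so the timestamp digits are removed as well while the ESC, [ and m characters stay behind.

```java
final class DigitPatternCheck {
    // Applies the pattern from this answer unchanged.
    static String apply(String text) {
        return text.replaceAll("\\d{1,2}(;\\d{1,2})?", "");
    }

    public static void main(String[] args) {
        String raw = "11:41:19 [INFO] \u001B[35;1m[Server] hi\u001B[m";
        // Show ESC as <ESC> so the leftover characters are visible.
        System.out.println(apply(raw).replace("\u001B", "<ESC>"));
        // :: [INFO] <ESC>[m[Server] hi<ESC>[m
    }
}
```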

DangerDan