
I have an output file (namely a log from screen) containing several control characters. Inside the screen, I have programs running that use control characters to refresh certain lines (examples would be top or anything printing progress bars).

I would like to output a tail of this file using PHP. If I simply read in that file and echo its contents (either using PHP functions or by calling tail), the output is messy and contains much more than the last lines, because it also includes content that has since been overwritten. If I instead run tail on the command line, it shows just what I want because the terminal evaluates the control characters.

So my question is: Is there a way to evaluate the control characters, getting the output that a terminal would show me, in a way that I could then use elsewhere (e.g., write to a file)?

Alex
  • The terminal is handling the control characters, not the shell. They don't have any meaning to a non-terminal. Does stripping them manually not do enough? – Etan Reisner Jan 12 '15 at 20:11
  • There may not be one pre-written and included in your distribution, but it shouldn't be too difficult to write a filter that reads a file byte-by-byte and omits outputting the stuff that isn't wanted. All you would need is a list of the different characters/sequences you want to eliminate... – twalberg Jan 12 '15 at 20:17
  • @twalberg: I think it's not really a matter of just eliminating them (PHP effectively does that by ignoring them). Say I ran a program printing its progress (0% done, 5% done, 10% done and so on, not writing new lines but rewriting that same line) then just stripping control characters will result in all these percentage values being in the output, whereas I only want the latest one to be there. – Alex Jan 12 '15 at 22:05
  • @EtanReisner: Thanks for the clarification, I will update my question accordingly. – Alex Jan 12 '15 at 22:05
  • You filter through something that understands terminal control characters. I'm not sure if any such things (other than terminals) exist though. – Etan Reisner Jan 12 '15 at 22:07
  • @Alex Well, that would involve a fair amount more than just filtering out the cursor motion and other control sequences. It would require essentially having a semantic understanding of what all the various sequences do as well. In other words a terminal, as suggested by @EtanReisner. Or maybe a clone of `screen` or `tmux`, but even those still rely on an underlying terminal to render things... – twalberg Jan 12 '15 at 22:20

2 Answers


@5gon12eder's answer got rid of some control characters (thanks for that!) but it did not handle the carriage return part that was even more important to me.

I figured out that I could just delete anything from the beginning of a line to the last carriage return inside that line and simply keep everything after that, so here is my sed command accomplishing that:

sed 's/^.*\r\([^\r]\+\)\r\?$/\1\r/g'

The output can then be further cleaned using @5gon12eder's answer:

cat screenlog.0 | sed 's/^.*\r\([^\r]\+\)\r\?$/\1\r/g' | sed 's,\x1B\[[0-9?;]*[a-zA-Z],,g'

Combined, the output looks exactly the way I wanted.
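For anyone without GNU sed, here is a minimal sketch of the carriage-return step that avoids the \r and \+ escapes. The function name collapse_cr is made up for illustration, and unlike the command above it drops the trailing carriage return instead of keeping it:

```shell
#!/bin/sh
# collapse_cr: hypothetical helper name (not part of the original command).
# On each line, keep only the text after the last carriage return, which
# approximates what a terminal shows after a line has been rewritten.
collapse_cr() {
    cr=$(printf '\r')   # a literal carriage return, so no \r escape is needed in sed
    # 1st expression: drop a carriage return at the very end of the line
    # 2nd expression: delete everything up to and including the last one left
    sed -e "s/${cr}\$//" -e "s/.*${cr}//"
}

# Simulated progress output: one line rewritten three times, then a CRLF line.
printf '0%% done\r5%% done\r10%% done\nfinished\r\n' | collapse_cr
# prints:
# 10% done
# finished
```

It can be chained the same way, e.g. tail -n 20 screenlog.0 | collapse_cr. Note that this is only an approximation: a real terminal does not erase the line on a carriage return, so if a later rewrite is shorter than an earlier one, the terminal would still show the leftover characters, and this sketch would not.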

Alex

I'm not sure what you mean by “evaluating” the control characters but you could remove them easily.

Here is an example using sed but if you are already using PHP, its internal regex processing functionality seems more appropriate. The command

$ sed 's,\x1B\[[0-9?;]*[a-zA-Z],,g' file.dat

will dump the contents of file.dat to standard output with all ANSI escape sequences removed. (And I'm pretty sure that nothing else is removed except if your file contains invalid escape sequences in which case the operation is ill-defined anyway.)

Here is a little demo:

$ echo -e "This is\033[31m a \033[umessy \033[46mstring.\033[0m" > file.dat
$ cat file.dat
# The output of the above command is not shown to protect small children
# that might be browsing this site.
$ reset  # your terminal
$ sed 's,\x1B\[[0-9?;]*[a-zA-Z],,g' file.dat
This is a messy string.
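As far as I know, the \x1B escape is a GNU sed extension, so the command above may not work with other sed implementations. A minimal sketch of a more portable variant, which gets the ESC byte from printf instead (the function name strip_ansi is just a placeholder I picked):

```shell
#!/bin/sh
# strip_ansi: hypothetical helper name, same pattern as the sed command above.
# Removes sequences of the form ESC [ <digits, ';' or '?'> <letter>.
# Reads the files given as arguments, or standard input if there are none.
strip_ansi() {
    esc=$(printf '\033')   # a literal ESC byte, so no \x1B escape is needed in sed
    sed "s/${esc}\[[0-9?;]*[a-zA-Z]//g" "$@"
}

# Same sample string as in the demo, piped in directly.
printf 'This is\033[31m a \033[umessy \033[46mstring.\033[0m\n' | strip_ansi
# prints: This is a messy string.
```

Like the original, this only covers sequences that start with ESC [ and end in a letter; other control characters (carriage returns, backspaces, and so on) pass through untouched.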

The less program has some more advanced logic built in to selectively replace some escape sequences. Read the man page for the relevant options.

5gon12eder