I have a file that contains some PCL sequences. I have this sequence at the end of the file (hex):

461b 2670 3158 0a    F.&p1X.

I want to remove the sequence: <Esc>&p1X including the character that follows. In 99% of cases, LF follows the sequence.

I tried this command:

sed -b 's/\o33&p[0-9]X$//Mg' ~/test.txt >test2.txt

However, the output in test2.txt still ends with an LF. Also, if I specify `.` instead of `$`, the expression no longer matches the line.

If you want to play with this, generate the input file using this command:

echo -e "SomeString\033&p1X" > ~/test.txt

Note that echo appends an LF character at the end of the output.

Thanks

boggy

2 Answers


If I understand correctly, you know for certain that your file ends with that sequence of characters. If so, I would simply truncate the last six bytes. This works regardless of whether the very last character is a newline or anything else.

Example:

$ echo -e "SomeString\033&p1X" > test.txt
$ od -c test.txt
0000000   S   o   m   e   S   t   r   i   n   g 033   &   p   1   X  \n
0000020
$ truncate -s -6 test.txt 
$ od -c test.txt 
0000000   S   o   m   e   S   t   r   i   n   g
0000012

This is also very efficient, since it shrinks the file in place through the truncate() family of system calls rather than rewriting its contents.

mauro
  • Some files might not contain this sequence. That's why truncating would not work unfortunately. – boggy Jan 12 '16 at 08:05
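
For files that may or may not end with the sequence, the truncation idea can be made conditional. The following is only a minimal sketch, assuming GNU `truncate` is available; the helper name `strip_pcl_trailer` is hypothetical. It removes the last six bytes only when they really are <Esc>&p<digit>X followed by one more byte:

```shell
# Hypothetical helper: drop a trailing ESC & p <digit> X plus one following
# byte, but only if the file actually ends with that sequence.
strip_pcl_trailer() {
    file=$1
    # Dump the last six bytes as one hex string, e.g. 1b267031580a.
    tail_hex=$(tail -c 6 "$file" | od -An -tx1 | tr -d ' \n')
    case $tail_hex in
        # 1b=ESC 26=& 70=p 30..39=digit 58=X, then any one byte (usually 0a=LF)
        1b26703[0-9]58??) truncate -s -6 "$file" ;;
    esac
}

# Demo on a scratch file so nothing real is modified.
tmp=$(mktemp)
printf 'SomeString\033&p1X\n' > "$tmp"
strip_pcl_trailer "$tmp"
od -c "$tmp"
```

Files that are shorter than six bytes or that end differently are left untouched, because the `case` pattern does not match.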

This seems to do the trick based on this thread:

perl -pi -e 's/\x1b&p[0-9]X\n//g' ~/test.txt

(I am a Perl beginner as well, so any comments would be appreciated.)

boggy
  • What are the [other] non-newline chars that follow the `X`? And what's the intent? It looks like the file is mostly hex and you want to strip the non-hex line trailer? Do you always want to strip the [optional] newline? If you're interested in perl, I've been writing it for 20+ years and I'd be happy to post an answer with a solution and some additional tips. If you can post a data file that shows all the variants (e.g. 5-10 lines) that would also help – Craig Estey Jan 12 '16 at 01:51
  • @CraigEstey what do you mean with "It looks like the file is mostly hex..."? Files are neither hex nor non-hex. Files just contain bytes you can represent in hex format or not. – mauro Jan 12 '16 at 05:23
  • @mauro Well, a "hex dump file" [or just "hex file"] means that the file contains the output of a hex dumper program (e.g. `od`, `xxd`, etc.). I missed the "(hex)" and so the sample looked like a hex line (sans the alpha) and the trailer was some extra stuff that needed to be stripped to be able to read in the hex values on the left. If I were posting the problem, I would have shown some of the valid left side data as that can sometimes influence what a regex can be [vs a hex dump with alpha for _just_ the bad part] – Craig Estey Jan 12 '16 at 06:45