I have a file whose header is FIELD1 FIELD2 : 0x30070040
along with a lot of junk characters (half the file's size). To get rid of them I run these commands:
dos2unix -q -n file
sed -i $'s/[^[:print:]\t]//g' file #Removing every non-printable character (yes, dos2unix was not enough)
But then I end up with a file containing this odd header. If I copy and paste it from the shell, it looks like this:
PFcount_01032019.txt0000777017777601777760116201541013436157760015052 0ustar nfsnobodynfsnobody▒▒FIELD1 FIELD2 : 0x30070040
If I copy and paste it from a text editor like Vim, it looks like this:
PFcount_01032019.txt0000777017777601777760116201541013436157760015052 0ustar nfsnobodynfsnobodyÿþFIELD1 FIELD2 : 0x30070040
Note the two special characters just before FIELD1.
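To see which bytes those two characters really are, the first line can be dumped byte by byte. The sketch below uses a made-up sample line rather than my real file. It assumes the two characters are the bytes 0xFF 0xFE (which is what ÿþ would be in Latin-1) and that the separator is a tab; I have confirmed neither.

```shell
# Hypothetical sample line: some junk, then bytes 0xFF 0xFE, then the header.
# Assumptions (unconfirmed): the odd characters are 0xFF 0xFE, the separator is a tab.
sample=$(mktemp)
printf 'nfsnobody\377\376FIELD1\tFIELD2 : 0x30070040\n' > "$sample"

# od -An -c prints one field per byte; bytes that are not printable ASCII
# show up as 3-digit octal escapes (377 and 376 here), the tab as \t.
dump=$(head -n 1 "$sample" | od -An -c)
printf '%s\n' "$dump"

rm -f "$sample"
```

On the real file the equivalent would be head -n 1 file | od -c.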
Now I would like to end up with a header like this:
FIELD1 FIELD2
It is important to keep everything between FIELD1 and FIELD2 too, because that is the field separator of the file.
I thought about using this:
sed -i -r '1 s/.+(FIELD1.+) : 0x.+/\1/g' file
But apparently .+FIELD1 does not match PFcount_01032019.txt0000777017777601777760116201541013436157760015052 0ustar nfsnobodynfsnobody▒▒FIELD1 or PFcount_01032019.txt0000777017777601777760116201541013436157760015052 0ustar nfsnobodynfsnobodyÿþFIELD1 (whichever is the true one), so I can't extract \1 from the regex.
Shouldn't . match every character? Why doesn't it match whatever comes before FIELD1?
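One experiment that might narrow this down is running the same expression in the C locale, where sed treats each byte as one character. This is only a sketch on a made-up line, not my real file. It assumes that the two odd characters are the bytes 0xFF 0xFE, that they are not valid text in my current locale's encoding, and that the separator is a tab.

```shell
# Hypothetical input line (not the real file): junk, bytes 0xFF 0xFE, then the header.
# LC_ALL=C makes sed work byte by byte, so '.' can match 0xFF and 0xFE as well.
result=$(printf 'nfsnobody\377\376FIELD1\tFIELD2 : 0x30070040\n' |
  LC_ALL=C sed -E '1 s/.+(FIELD1.+) : 0x.+/\1/')

# Show what is left of the first line after the substitution.
printf '%s\n' "$result"
```

If the output is just FIELD1 and FIELD2 with the separator between them, then the expression itself is fine and the locale is what stops . from matching. That would probably also explain why the [^[:print:]\t] cleanup left those two bytes in place.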