I was running cmp -l file.bin file2.bin
but started getting cmp: EOF on file, and I suspect a Windows/Linux line-ending problem as described here.
I am supposed to split the binary data on some byte sequence, so I did some profiling.
I noticed that the problem affects only some of my files, which are about 1 GB in size.
Output from od -c file.bin:
0435500 \n <A5> \n Y \n f \n p \n <A9> \n A \n W \n 202
0435520 \n <B0> \n M \n t \n 202 \n <B1> \n i \n i \n 227
0435540 \n 221 \n Y \n ; \n <B2> \n 225 \n < \n J \n 217
0435560 \n <A9> \n < \n 211 \n <AB> \n 201 \n T \n y \n 204
0435600 \n 212 \n \ \n v \n p \n | \n 9 \n M \n u
0435620 \n 214 \n < \n r \n <A0> \n <AF> \n X \n W \n 204
0435640 \n <A5> \n B \n a \n 207 \n <AA> \n S \n ^ \n |
0435660 004 \r \n > 003 <ED> 003 <E8> \f . 003 <EC> \f * 004 032
0435700 \f h \f m \f i \f h \n o 004 024 \n k \n <A5>
0435720 \n <A2> \n = \n k \n p \n <B1> \n I \n ^ \n y
0435740 \n 227 \n < \n T \n | \n 224 \n 8 \n w \n 202
Here \r \n appears once,
on the line at offset 0435660.
In total there are 11 matches on 11 lines,
while the total number of 60-character lines is 0571520.
So Windows line endings seem to make up about 0.001% of the file content, which is far less than you would normally see in a file with Windows line endings.
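A minimal sketch of how the count above could be reproduced without od, assuming the file (or a chunk of it) is available as a bytes object. The function name and the sample bytes are made up for illustration.

```python
def find_crlf_offsets(data: bytes) -> list[int]:
    # Collect the byte offset of every CR LF pair (0x0D 0x0A) in data.
    offsets = []
    pos = data.find(b"\r\n")
    while pos != -1:
        offsets.append(pos)
        pos = data.find(b"\r\n", pos + 1)
    return offsets


# Made-up sample loosely modelled on the od dump, with two CR LF pairs.
sample = b"\n\xa5\nY\x04\r\n>\x03\xed\nk\r\n"
hits = find_crlf_offsets(sample)
print([oct(o) for o in hits])          # octal offsets, comparable to od's left column
print(2 * len(hits) / len(sample))     # fraction of bytes that belong to a CR LF pair
```

For a 1 GB file, reading in chunks or memory-mapping is probably preferable to loading everything at once. An mmap object also has a find method, so the same loop should carry over.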
Only a minority of the files have this problem, and the original data sources do not.
This suggests to me that the problem lies in the data processing.
Is this enough confirmation that those endings are Windows line-endings?
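One way to put the count into perspective is to estimate how many \r\n pairs would turn up purely by chance. The sketch below assumes uniformly random, independent bytes and a file of 2**30 bytes. Both are assumptions and not properties of the real data, which clearly has structure (a \n in every other position).

```python
def expected_chance_crlf(n_bytes: int) -> float:
    # Each of the n_bytes - 1 adjacent byte pairs equals CR LF with
    # probability 1 / 256**2 if bytes are uniform and independent (assumption).
    return max(n_bytes - 1, 0) / 256**2


print(expected_chance_crlf(2**30))  # roughly 16384 for an assumed 1 GiB file
```

Under that assumption a handful of \r\n pairs in binary data is not by itself evidence of a line-ending conversion.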
My files contain events, each of which should have a fixed length.
So I am not sure how well dos2unix
will work here, since I cannot change the length of an event.
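The length concern can be illustrated with a small sketch. It mimics a dos2unix-style conversion with a plain byte replacement and lists which fixed-length events contain a \r\n pair. EVENT_SIZE and the sample data are hypothetical, since the real event length is not stated here.

```python
EVENT_SIZE = 8  # hypothetical event length in bytes, for illustration only


def events_containing_crlf(data: bytes, event_size: int = EVENT_SIZE) -> list[int]:
    """Return indices of fixed-length events that contain a CR LF pair."""
    bad = []
    for start in range(0, len(data), event_size):
        # A pair that straddles two events is not detected by this simple check.
        if b"\r\n" in data[start:start + event_size]:
            bad.append(start // event_size)
    return bad


data = b"AAAAAAAA" + b"BB\r\nBBBB" + b"CCCCCCCC"
stripped = data.replace(b"\r\n", b"\n")  # stand-in for a dos2unix-style conversion

print(len(data), len(stripped))      # the converted data is one byte shorter per pair
print(events_containing_crlf(data))  # indices of the affected events
```

Every event after a converted pair would start one byte earlier than before, which is exactly the change in length that must be avoided.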
I think I need to either remove the events that contain Windows line endings or replace the Windows line ending \r\n
with \0\n
.
However, I am not sure whether I can do this by writing the literal bytes into the content without changing the length of some events.
The constraint is that if the length of any event changes, the system stops working.
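Both options can be sketched on bytes objects as follows. The function names, the event size of 4, and the sample data are hypothetical. Whether \0\n is a meaningful value for the affected events is a separate question that this sketch does not address.

```python
def replace_crlf_keep_length(data: bytes) -> bytes:
    # Swap each CR LF pair for NUL LF. Both sequences are two bytes long,
    # so every later byte keeps its offset.
    patched = data.replace(b"\r\n", b"\x00\n")
    assert len(patched) == len(data)
    return patched


def drop_events_with_crlf(data: bytes, event_size: int) -> bytes:
    # Keep only the fixed-length events that contain no CR LF pair.
    # The result shrinks by whole events, so the remaining ones stay aligned.
    kept = [
        data[i:i + event_size]
        for i in range(0, len(data), event_size)
        if b"\r\n" not in data[i:i + event_size]
    ]
    return b"".join(kept)


data = b"AAAA" + b"B\r\nB" + b"CCCC"
print(replace_crlf_keep_length(data))
print(drop_events_with_crlf(data, event_size=4))
```

The replacement keeps the total size and every event boundary unchanged, while dropping events shortens the file by a multiple of the event size.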
How should I deal with Windows/Unix line-ending and EOF warnings in binary data?