
I'm merging three files (ls -l):

-rw-rw-r-- 1 kacper kacper 1839510 sie 13 14:27 A.jpg
-rw-rw-r-- 1 kacper kacper 2014809 sie 13 14:27 B.jpg
-rw-rw-r-- 1 kacper kacper 1277047 sie 13 14:27 C.pdf

into one file (merged) in bash using:

cat A.jpg >> merged 
echo $SEPARATOR >> merged 
cat B.jpg >> merged 
echo $SEPARATOR >> merged 
cat C.pdf >> merged

where:

SEPARATOR=PO56WLH82SN1ZS5QH5EU9FOZVLBRLHAGHO3D5KOUSPMS6KYSFAYN2DBL

Next I'm splitting the merged file into three parts using:

csplit --suppress-matched merged --prefix="PART_" '/'$SEPARATOR'/' {*}

this produces PART_00, PART_01, PART_02 (ls -l):

-rw-rw-r--  1 kacper kacper 1839398 sie 13 18:41 PART_00
-rw-rw-r--  1 kacper kacper 2014507 sie 13 18:41 PART_01
-rw-rw-r--  1 kacper kacper 1277047 sie 13 18:41 PART_02

PART_00 and PART_01 are JPG files and can be properly displayed. PART_02 is a PDF file and it can be opened and viewed. So, at first glance this looked to me like success.

The problem is that PART_00 (1839398 bytes) is slightly smaller than A.jpg (1839510 bytes). The same goes for PART_01 and B.jpg, while PART_02 is the same size as C.pdf. After checking the files byte by byte using

cmp

the pairs of files are exactly the same up to the point when one of them ends.

Anyone know why this is the case? Advice would be greatly appreciated.

kacper
  • Most likely an X-Y problem. Why are you doing this? Do you know about `tar` and friends? – karakfa Aug 13 '18 at 16:55
  • Hi karakfa. This will eventually be used to split the output stream of a process into separate chunks of data, tar might not be the tool for this job. – kacper Aug 13 '18 at 17:01

1 Answer


The last lines in the files are not terminated by a newline character. As such, when you add your separator into the merged file you are adding it to the end of the last line in the files. This last line is then matched by csplit and the entire line dropped. Hence the last few characters are being dropped.

The --suppress-matched option for csplit suppresses the entire line on which the pattern is matched, not just the matching text.
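The effect can be reproduced on a small scale. The following is only a sketch: the file names and contents (A.txt, B.txt) are made up for illustration, and it assumes GNU csplit with --suppress-matched support.

```shell
#!/usr/bin/env bash
# Minimal reproduction with hypothetical text files (not the original JPG/PDF data).
set -eu
work=$(mktemp -d)
cd "$work"

SEPARATOR=PO56WLH82SN1ZS5QH5EU9FOZVLBRLHAGHO3D5KOUSPMS6KYSFAYN2DBL

# A.txt deliberately ends WITHOUT a newline, like the files in the question.
printf 'line one\ntail-without-newline' > A.txt
printf 'second file\n' > B.txt

cat A.txt > merged
echo "$SEPARATOR" >> merged   # lands on the same line as "tail-without-newline"
cat B.txt >> merged

csplit --quiet --suppress-matched --prefix=PART_ merged "/$SEPARATOR/" '{*}'

# PART_00 is shorter than A.txt: the whole matched line was dropped,
# including the unterminated tail of A.txt that shared it.
wc -c A.txt PART_00
```

Here PART_00 contains only the first line of A.txt, because the unterminated last line was glued to the separator and removed together with it.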

borrible
  • Thanks borrible. I almost solved the problem by changing the echo lines to: echo -e "\n"$SEPARATOR >> merged. Now PART_00 and PART_01 are exactly 1 byte larger than A.jpg and B.jpg, and PART_02 is exactly the same size as C.pdf. The extra byte is 0a (hex). Any ideas on how to drop it? – kacper Aug 13 '18 at 17:07
  • Solved by 1) changing the echo lines to: echo -e "\n"$SEPARATOR >> merged and 2) using truncate to drop the last byte. – kacper Aug 13 '18 at 17:22
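The two steps from the last comment could be put together roughly as follows. This is a sketch under assumptions, not the asker's exact script: the input files are made-up text files, printf stands in for echo -e, and GNU csplit and truncate are assumed.

```shell
#!/usr/bin/env bash
# Sketch of the newline-plus-truncate workaround on hypothetical sample files.
set -eu
work=$(mktemp -d)
cd "$work"

SEPARATOR=PO56WLH82SN1ZS5QH5EU9FOZVLBRLHAGHO3D5KOUSPMS6KYSFAYN2DBL

printf 'line one\ntail-without-newline' > A.txt
printf 'second file, also unterminated' > B.txt
printf 'third file\n' > C.txt

# Step 1: put the separator on a line of its own by writing a newline first.
cat A.txt > merged
printf '\n%s\n' "$SEPARATOR" >> merged
cat B.txt >> merged
printf '\n%s\n' "$SEPARATOR" >> merged
cat C.txt >> merged

csplit --quiet --suppress-matched --prefix=PART_ merged "/$SEPARATOR/" '{*}'

# Step 2: every part followed by a separator carries the extra 0a byte;
# the last part does not, so it is left alone.
truncate -s -1 PART_00 PART_01

cmp A.txt PART_00 && cmp B.txt PART_01 && cmp C.txt PART_02 && echo "all parts match"
```

Only the parts that were followed by a separator need truncating, which matches the observation that PART_02 was already the same size as C.pdf.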