bsdiff without compression is generating big delta patch files

Question

I am currently developing an OTA/FOTA update system that must run on an embedded device with an ARM Cortex-M0+. My main constraints are the limited flash space and the low network bandwidth, so the delta patches should be as small as possible.

To achieve this I did some research and found several binary diff algorithms and tools, such as bsdiff, xdelta and Courgette. My problem with all of them was code size, because the compiled patcher has to be very small, so I picked up standalone versions of bsdiff (two of them, actually: bsdiff standalone and minibsdiff):

https://github.com/Cheedoong/bsdiff

https://github.com/thoughtpolice/minibsdiff

The first one still uses bzip2 but is standalone and better suited to an embedded system. I wanted to test two things:

What the size of the uncompressed delta is. I removed all the bzip2 logic to find out, and I was very surprised to see that the delta was about the same size as the complete original file, so I moved on to the second source, minibsdiff.
minibsdiff is bsdiff without any compression at all, which lets you use whatever compression you want. It also let me confirm that I had not made a mistake: the uncompressed delta patch it generates is the same size as the original file I wanted to patch (or slightly larger, presumably because of the header and other metadata).

So, what is happening here? From some googling I read that very similar files generate bigger patches, but in my tests I used 8 KB files, and an 8 KB uncompressed patch is not a solution: at that point it might be better to simply compress the new file and replace the old one with it. I feel I am missing something.

Any ideas would be much appreciated.

Thank you all.

Best regards,

Iván.

Not sure what your actual question is. You already described "what is happening here". But if you diff, you have to merge on the target with the actual contents before flashing. Anyway, without further details it is hard to provide a useful answer. You might have to think first a bit longer. — too honest for this site, Aug 04 '16 at 15:05
Hello Olaf. My question is why this is happening, because I intend to get small deltas, not ones the same size as the original file. I am trying to study the suffix sorting that bsdiff implements, because maybe it is not suitable for files of 250K or less. And yes, if I diff I have to merge, but the diff file is huge and it should be small, or at least smaller than the original. If I don't go further, it is because I intend to use an algorithm that is already in use rather than develop my own. Thanks anyway. — Fulgor3, Aug 04 '16 at 15:31
There are various ways to encode diffs between files. Each has its application. It all depends on what you expect. If your expectation is not fulfilled, you can scrap the concept. There is no simple answer to this, and the question, even after all information has been given, would be too broad. Run some tests, look at the files, possibly define your own format/tools. If you cannot figure it out and/or lack the experience, hire a consultant; that is nothing to be ashamed of. Anyway, Stack Overflow is not a consulting site. — too honest for this site, Aug 04 '16 at 15:39
That is why I did my research before making any move. It seems as if you think I am asking here without having done any research first. I need to generate deltas from a binary, so bsdiff should be suitable; the problem is that maybe people don't check the size of the generated delta before bzip2 compresses it. Since I don't know much about these algorithms, I hoped someone here could give me a clue as to whether I am missing something by using a suffix-sorting algorithm instead of, for example, VCDIFF (used by xdelta). But bsdiff should be the best option. — Fulgor3, Aug 05 '16 at 06:34
Well, I'm no clairvoyant. Can only judge from the question and that tells me you don't. But maybe you just need more experience (no offence). That's nothing bad, but this is the wrong site for this. A forum would be better. — too honest for this site, Aug 05 '16 at 10:59
I know that, Olaf. Maybe I need more experience; it doesn't offend me, I am only trying to learn and to reach the goal. I posted with the information I have, nothing more, and did not mean to offend. I hoped someone had used this and could advise me. Besides, I know it is not a trivial topic. Anyway, thanks, and sorry if I disturbed you. Regards. — Fulgor3, Aug 05 '16 at 11:52
What you are trying to do sounds way too complicated for such a small system and seems to be a good example of the [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Maybe you should use a very simple compression like LZ4, maybe you should add an external flash. But you didn't tell us anything about the firmware size, flash size(s), bandwidth, how much downtime is acceptable for the update. — starblue, Aug 11 '16 at 09:12
Hi starblue. I also found LZW to be a more or less suitable compression tool. I have 256 kB of flash for storing a bootloader and the application firmware. The firmware is 150 kB right now; it could grow a bit soon, but I hope not by much. The network bandwidth is 250 kbps (theoretical maximum). As for downtime, since I want to store the compressed delta in the node before updating, the node shouldn't go down until it has received the whole patch. I know it is complicated, but I haven't found any other way of performing an OTA firmware update on a constrained embedded device (Cortex-M0+). — Fulgor3, Aug 23 '16 at 11:35
@Fulgor3 I cannot add an answer as this question is closed, so I am adding a brief comment here. BSDIFF's original output, which you perceive as the delta image and expect to be smaller, is almost the same size as the original file. You need to understand the structure of the output, which contains triplets (diff data size, extra data size and src adjust), followed by diff data sections and extra data sections. 1/2 — Zeeshan, Mar 28 '17 at 10:35
@Fulgor3 (cont.) The diff data is the literal difference between src and tgt data, and it mostly contains 0s. This is the way BSDIFF output is defined and how it works. We need to COMPRESS this output to reduce its size, and then we get a smaller delta image. Without compression we cannot get the smaller delta image. So compression and decompression are an important part of your embedded FOTA solution if you are going to use BSDIFF. 2/2 — Zeeshan, Mar 28 '17 at 10:35
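
The point of the two comments above can be tried out with a small sketch. It is not bsdiff itself: it assumes two images of the same length and builds only a byte-wise difference, leaving out the control triplets and the extra data. The names `make_diff_block` and `apply_diff_block` are made up for this illustration, and Python's standard `bz2` module stands in for the bzip2 stage.

```python
import bz2
import random

def make_diff_block(old: bytes, new: bytes) -> bytes:
    """Byte-wise (new - old) mod 256: a simplified stand-in for bsdiff's diff data."""
    assert len(old) == len(new)  # simplification: real bsdiff handles any lengths
    return bytes((n - o) & 0xFF for o, n in zip(old, new))

def apply_diff_block(old: bytes, diff: bytes) -> bytes:
    """Rebuild the new image by adding the diff bytes back onto the old image."""
    return bytes((o + d) & 0xFF for o, d in zip(old, diff))

# Hypothetical 8 KB "firmware" with 40 scattered single-byte changes.
rng = random.Random(0)
old = bytes(rng.randrange(256) for _ in range(8192))
patched = bytearray(old)
for pos in rng.sample(range(len(old)), 40):
    patched[pos] ^= 0xFF  # XOR with 0xFF always changes the byte
new = bytes(patched)

diff = make_diff_block(old, new)
packed = bz2.compress(diff)

print("diff block size:", len(diff))       # same size as the image
print("zero bytes in diff:", diff.count(0))
print("compressed size:", len(packed))
```

Under these assumptions the diff block is exactly as large as the image and almost entirely zeros, so only the compressed form is small, which matches the 8 KB versus 1.8 KB observation reported in this thread.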

score 3 · Answer 1 · answered Aug 05 '16 at 08:45

I drew some conclusions after reading more about bsdiff and how it sorts the suffixes. The uncompressed patch seems to be filled mostly with zeros wherever the old and new files match, which is why the patch looks so large; but those zeros compress very well, and that is why bsdiff is efficient. So to implement this on a system with very little memory, it would be advisable to use another compression algorithm, LZW for example, and to modify the patcher so that it patches (writes the firmware to flash) block by block as the blocks are decompressed, because an ARM Cortex-M0+ cannot hold a big file (32 KB of RAM and 8 or 16 KB of ROM for the compressed patch).
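
A rough sketch of the block-by-block idea, written in Python only to show the control flow (the real patcher would be C on the target). Everything here is an assumption for illustration: the patch is reduced to a plain byte-wise difference of two same-length images, `zlib` from the standard library stands in for LZW or whatever compressor is finally chosen, and `FLASH_BLOCK`, `RX_CHUNK` and `apply_patch_streaming` are made-up names.

```python
import zlib

FLASH_BLOCK = 256  # hypothetical flash write size in bytes
RX_CHUNK = 16      # hypothetical size of each received patch fragment

def apply_patch_streaming(old: bytes, packed_patch: bytes) -> bytes:
    """Inflate the patch piece by piece and patch at most one block at a time."""
    inflater = zlib.decompressobj()
    flash = bytearray()  # stands in for the new image being written to flash

    def patch_block(piece: bytes) -> None:
        pos = len(flash)  # the new image is written sequentially
        flash.extend((o + d) & 0xFF for o, d in zip(old[pos:pos + len(piece)], piece))

    for start in range(0, len(packed_patch), RX_CHUNK):
        data = packed_patch[start:start + RX_CHUNK]
        while True:
            # Ask for at most FLASH_BLOCK decompressed bytes per call.
            piece = inflater.decompress(data, FLASH_BLOCK)
            data = inflater.unconsumed_tail
            if not piece and not data:
                break  # this fragment is fully consumed and drained
            patch_block(piece)
    # Drain anything still buffered (expected to be empty after the loop above).
    patch_block(inflater.flush())
    return bytes(flash)

# Demo with a hypothetical 8 KB image and a few changed bytes.
old = bytes((i * 7) & 0xFF for i in range(8192))
changed = bytearray(old)
for pos in range(100, 8192, 500):
    changed[pos] = (changed[pos] + 1) & 0xFF
new = bytes(changed)

patch = bytes((n - o) & 0xFF for o, n in zip(old, new))
packed = zlib.compress(patch, 9)
result = apply_patch_streaming(old, packed)
print("compressed patch size:", len(packed))
print("image rebuilt correctly:", result == new)
```

The working set in this sketch is one received fragment plus one decompressed block, so the whole patch never has to sit in RAM; the decompressor's own window still needs memory, which is one reason to look at small-window schemes on a 32 KB device.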

Best regards and I will post more if I get any interesting result.

Thanks to everyone.

Iván.

score 1 · Answer 2 · answered Aug 04 '16 at 09:45

If you are computing diffs between 2 compressed disk images, any small difference close to the beginning of the image will cause the images to be almost completely different, generating a long diff.

You could compute the diff between the uncompressed versions, and compress that for transmission, but the embedded routine would need to have enough RAM to merge the diff into an uncompressed copy of the flash image and recompress that for flashing.
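
The effect described above can be checked with a short sketch. It assumes `zlib` as the compressor and uses made-up, compressible sample data in place of a real image; how many compressed bytes end up different depends on the data and the compressor, so the sketch prints the counts instead of assuming them.

```python
import zlib

def count_differing_bytes(a: bytes, b: bytes) -> int:
    """Positions that differ over the common length, plus any length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

# Compressible sample data standing in for an image (hypothetical content).
old = b"".join(b"record %05d: value=%d\n" % (i, i * i) for i in range(2000))
changed = bytearray(old)
changed[10] ^= 0x01  # flip one bit in a single byte near the beginning
new = bytes(changed)

packed_old = zlib.compress(old)
packed_new = zlib.compress(new)

raw_changed = count_differing_bytes(old, new)
packed_changed = count_differing_bytes(packed_old, packed_new)
print("bytes differing in raw images:", raw_changed)
print("bytes differing in compressed images:", packed_changed)
```

If the second count comes out much larger than the first, a diff taken between the compressed images has far more to encode than a diff taken between the raw ones.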

answered Aug 04 '16 at 09:45

chqrlie


Hello. I am computing the diffs between 2 uncompressed images, because I first wanted to test without compressing the deltas. Later on I compared them using Notepad++ and HexDif, and they aren't completely different, which is why I was really surprised when I noticed that the patch was the same size as the original file. Another question is whether I can apply the patch (just apply it, because generation would be done on a desktop PC) in parts on the embedded device with 32 KB of RAM. Best regards – Fulgor3 Aug 04 '16 at 11:30
Can you try and compress the image files? If the compressed size is close to the uncompressed size, it means the images are already compressed. Computing a delta between compressed files usually produces a large file. – chqrlie Aug 04 '16 at 16:04
Hello chqrlie. I tried this before, and no, they weren't already compressed. What I get is an 8 KB uncompressed patch (the same as the original file size) and a 1.8 KB compressed patch, so it seems that only the compression is doing anything useful :(. Maybe something was missed in the minibsdiff project, but I got similar results by modifying the bsdiff standalone version (without the bzip2 in usr/bin). Thanks for your ideas! Have you ever implemented one of these binary diff algorithms successfully? I am just curious. – Fulgor3 Aug 05 '16 at 06:38
