4

I am trying to use the LZMA SDK in an iPhone/iPad app. My starting point was the LZMA example project for iPhone provided by Mo Dejong, available here: https://github.com/jk/lzmaSDK (the original was at http://www.modejong.com/iOS/lzmaSDK.zip; I tried both and get the same result from both).

The problem is that extraction uses as much RAM as the .7z contains uncompressed. In other words, say I have a 40MB compressed file whose uncompressed contents are a roughly 250MB binary sqlite DB: memory use grows steadily as the file is uncompressed, all the way up to 250MB. This will crash an iPad 1 or anything before the iPhone 4 (256MB RAM). I have a feeling a lot of people will eventually run into this same problem, so a resolution now could help a lot of developers.

I originally created the .7z file on a PC using the Windows-based 7-Zip (latest version) and a 16MB dictionary size. It should only require about 18MB of RAM to uncompress (and that is the case when testing on a PC, watching Task Manager). I also tried creating the archive using Keka (the open source Mac archiver); it did not resolve anything, although I can confirm that Keka itself only uses 19MB of RAM while extracting the file on a Mac, which is what I would expect. I guess the next step would be to compare the source code of Keka to the source code of the LZMA SDK.

I played around with different dictionary sizes and other settings when creating the .7z file but nothing helped. I also tried splitting my single binary file into 24 smaller pieces before compressing, but that also did not help (still uses over 250MB of RAM to extract the 24 pieces).

Note that the ONLY change I made to the original code was to use a bigger .7z file. Also note that it does immediately free up the RAM as soon as the extract is finished, but that doesn't help. It feels like either the SDK is not freeing RAM as it extracts like it should, or it is keeping the entire contents in RAM until the very end and only then moving them out. Also, if I extract the same exact file using a Mac app while running Instruments, I do not see the same behavior (StuffIt Expander, for example, maxed out at around 60MB of RAM while extracting the file; Keka, the open source Mac archiver, maxed out at 19MB).

I'm not much of a Mac/Xcode/Objective-C developer (yet), so any help with this would be greatly appreciated. I could resort to using zip or rar instead, but I get far superior compression with LZMA, so if at all possible I want to stick with this solution; obviously, though, I need to get it to work without crashing.

Thanks!

Screenshot of Instruments.app profiling the example app

tradergordo
  • Sorry Daren, I hadn't had the time until now to look into it any further. Hopefully I'll find the time this evening, but can't promise anything. – Jens Kohl Sep 25 '12 at 14:07
  • Just an FYI here, but the up to date github URL is https://github.com/mdejong/lzmaSDK – MoDJ Jul 02 '15 at 22:43

3 Answers

1

Igor Pavlov, the author of 7-Zip, emailed me. He basically said the observations I made in the original question are a known limitation of the C version of the SDK; the C++ version does not have this limitation. Actual quote:

"7-Zip uses another multithreaded decoder written in C++. That C++ .7z decoder doesn't need to allocate RAM block for whole solid block. Read also this thread:

http://sourceforge.net/projects/sevenzip/forums/forum/45797/topic/5655623 "

So until someone fixes the SDK for iOS, the workaround is to:

1) Decide what RAM limit you want to have for file decompression operations.

2) Any SINGLE file in your archive that exceeds the limit from step 1 must be split. You can do this using any binary splitter app, such as splits: http://www.fourmilab.ch/splits/

3) After your files are ready, create the .7z file using the dictionary/block size options described by MoDJ in his answer, for example with a 24 meg limit: 7za a -mx=9 -md=24m -ms=24m CompressedFile.7z SourceFiles*

4) In your iOS app, after you decompress the files, determine which files were split and concatenate them back together again. The code for this is not all that complicated (I assume the naming convention that splits.exe uses: file.001, file.002, etc.)

    if(iParts>1)
    {
        //If this is a multipart binary split file, we must combine all of the parts before we can use it
        NSString *finalfilePath = whateveryourfinaldestinationfilenameis; //your final destination path
        NSString *splitfilePath = [finalfilePath stringByAppendingString:@".001"];

        NSFileHandle *myHandle;
        NSFileManager *fileManager = [NSFileManager defaultManager];
        NSError *error;

        //If the target combined file exists already, remove it
        if ([fileManager fileExistsAtPath:finalfilePath]) 
        {
            BOOL success = [fileManager removeItemAtPath:finalfilePath error:&error];
            if (!success) NSLog(@"Error: %@", [error localizedDescription]);
        }

        myHandle = [NSFileHandle fileHandleForUpdatingAtPath:splitfilePath];
        NSString *nextPart;
        //Concatenate each piece in order (assumes fewer than 1000 pieces)
        for (int i = 2; i <= iParts; i++) {
            nextPart = [splitfilePath stringByReplacingOccurrencesOfString:@".001" withString:[NSString stringWithFormat:@".%03d", i]];
            NSData *datapart = [[NSData alloc] initWithContentsOfFile:nextPart];
            [myHandle seekToEndOfFile];
            [myHandle writeData:datapart];
        }
        [myHandle closeFile];
        //Rename concatenated file
        [fileManager moveItemAtPath:splitfilePath toPath:finalfilePath error:&error];
    }
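For the desktop-side split in step 2, here is a minimal Python sketch that produces the same `.001`/`.002` naming convention the Objective-C code above expects, plus the matching rejoin logic (file names and the 24MB chunk size are illustrative):

```python
def split_file(path, chunk_size=24 * 1024 * 1024):
    """Split the file at path into path.001, path.002, ... of at most chunk_size bytes."""
    parts = []
    with open(path, "rb") as src:
        index = 1
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            part_path = "%s.%03d" % (path, index)
            with open(part_path, "wb") as dst:
                dst.write(chunk)
            parts.append(part_path)
            index += 1
    return parts

def join_parts(parts, out_path):
    """Concatenate the numbered parts back into one file (mirrors the Obj-C loop)."""
    with open(out_path, "wb") as dst:
        for part_path in parts:
            with open(part_path, "rb") as src:
                dst.write(src.read())
```

Run `split_file` before archiving, and the on-device concatenation code above will reassemble the pieces byte-for-byte.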
tradergordo
  • I tried to have a look at the C++ source code, but it seems to be Windows-only code. I got some very basic stuff to compile under Xcode, but the C++ version does not seem to be a good starting point because it is tied to Windows APIs. The C version at least compiles and runs under UNIX/iOS, even if it allocates way too much memory. – MoDJ Jan 25 '13 at 21:00
0

Okay, so this is a tricky one. The reason you are running into problems is that iOS, unlike your desktop system, cannot page memory out to disk (there is no swap). The lzmaSDK library is written in such a way that it assumes your system has plenty of virtual memory available for decompression, so you will not see problems running on the desktop; only when allocating large amounts of memory to decompress on iOS will you run into issues. The best fix would be to rewrite the LZMA SDK so that it makes better use of mapped memory directly, but that is not a trivial task. Here is how to work around the problem.

Using 7za

There are actually 2 command line options you will want to pass to the 7-Zip archive program in order to segment files into smaller chunks. I am going to suggest you just use the 24 meg size I ended up using, since it is a decent space/memory tradeoff. Here is the command line that should do the trick; note that in this example I have big movie files named XYZ.flat and I want to compress them together into an archive .7z file:

7za a -mx=9 -md=24m -ms=24m Animations_9_24m_NOTSOLID.7z *.flat

If you compare this segmented file to a version that does not break the file into segments, you will see that the file gets a little bigger when segmented:

$ ls -la Animations_9_24m.7z Animations_9_24m_NOTSOLID.7z
-rw-r--r--  1 mo  staff  8743171 Sep 30 03:01 Animations_9_24m.7z
-rw-r--r--  1 mo  staff  9515686 Sep 30 03:21 Animations_9_24m_NOTSOLID.7z

So, segmenting costs about 800K of compression, but that is not a big loss, because now the decompression routines will not attempt to allocate one huge buffer. Decompression memory usage is limited to a 24 meg block, which iOS can handle.
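Using the two file sizes from the `ls` listing above, the overhead works out as follows (plain arithmetic, no assumptions beyond the listed numbers):

```python
solid = 8743171       # Animations_9_24m.7z (one solid block)
segmented = 9515686   # Animations_9_24m_NOTSOLID.7z (24 meg blocks)

overhead = segmented - solid          # extra bytes paid for segmenting
percent = 100.0 * overhead / solid    # relative size increase
```

That is roughly 772 KB, or a little under 9% larger, in exchange for a bounded decompression footprint.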

Double check your results by printing out the header info of the compressed file:

$ 7za l -slt Animations_9_24m_NOTSOLID.7z

Path = Animations_9_24m_NOTSOLID.7z
Type = 7z
Method = LZMA
Solid = +
Blocks = 7
Physical Size = 9515686
Headers Size = 1714

Note the "Blocks" element in the above output, it indicates that data has been segmented into different 24 meg blocks.

If you compare the segmented file info above to the output without the -ms=24m argument, you would see:

$ 7za l -slt Animations_9_24m.7z

Path = Animations_9_24m.7z
Type = 7z
Method = LZMA
Solid = +
Blocks = 1
Physical Size = 8743171
Headers Size = 1683

Note the "Blocks" value, you don't want just 1 huge block since that will attempt to allocate a huge amount of memory when decompressing on iOS.

MoDJ
  • I am going to keep playing around with this, but no matter what I seem to try including the exact parameters you mentioned, I cannot get 7za to produce a .7z file with more than one block. Is this possibly because I am compressing only a single file? (you mentioned in your scenario you have multiple files) – tradergordo Sep 30 '12 at 20:18
  • I'm fairly sure your workaround will only work if you have multiple files, each of which is smaller than whatever amount of RAM you want to max out at. For example, if I have two 500 meg files, it is going to create 2 blocks and will crash any iPhone/iPad. A slightly crazier workaround would be to use a binary splitter app on your file: say you had one 250MB file, binary split it into 10 equal pieces, and now you will get 10 blocks and the uncompress will work on iOS, but I have no idea how to combine the files back into one file in Objective-C. If anyone has further ideas, please let me know. – tradergordo Sep 30 '12 at 22:39
  • So, I sat down and solved the real problem in the SDK, and now it uses mmap() to enable extraction of files up to about 650 megs without crashing out due to memory on iOS. The code is available in this git repo: https://github.com/mdejong/lzmaSDK – MoDJ Jan 22 '13 at 01:57
  • Sounds good. So what amount of RAM is used during decompression with your solution if for example you were dealing with a 500MB file?  Why is there a 650MB limit? I've been using the binary file split/combine method which works fine but is not the most elegant work around. – tradergordo Jan 22 '13 at 12:19
  • iOS imposes a limit of about 690 megs of total virtual memory that can be mapped at any one time on the device. If you try to map 700 megs, the mmap call will fail. Using this new mmap logic means that if you set the upper limit of the block size to 650, then it will be possible to decode single files as large as 650 megs. Try out the example project at github listed above, it shows how a very large archive can be created, one of the examples show two 650 meg files that can be decoded one at a time. You cannot decode a 1 gig file, but 650 megs per file is huge. Plus compression is improved. – MoDJ Jan 23 '13 at 23:04
0

I've run into the same problem, but found a much more practical workaround:

  • use the C++ interface of the LZMA SDK. It uses very little memory and does not suffer from the memory consumption problem that the C interface does (as tradergordo already correctly said as well).

  • have a look at LZMAAlone.cpp, strip it of anything unnecessary (like encoding and the 7-zip file format stuff; by the way, encoding will also still require a lot of memory) and create a tiny header file for your C++ LZMA decompressor, e.g.:

extern "C" int extractLZMAFile(const char *filePath, const char *outPath);

  • for very large files (like 100MB+ db files) I then use plain LZMA (no 7z container) to compress the file. Of course, since LZMA alone does not have any file container, you need to supply the name of the decompressed file yourself

  • because I don't have full 7Z support, I use tar as the container together with LZMA-compressed files. There is a tiny iOS untar at https://github.com/mhausherr/Light-Untar-for-iOS
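The key property this answer relies on, that a streaming LZMA decoder only ever holds dictionary-sized state rather than the whole output, can be illustrated with Python's stdlib `lzma` module (`FORMAT_ALONE` is the raw `.lzma` container used here; this is a sketch of the concept, not the SDK's C++ API):

```python
import lzma

def stream_decompress(compressed, chunk_size=64 * 1024):
    """Decompress a raw .lzma byte string incrementally, feeding the decoder
    bounded slices so only dictionary-sized state is held at any one time."""
    decomp = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
    out = []
    for pos in range(0, len(compressed), chunk_size):
        if decomp.eof:
            break
        # In a real app each decoded chunk would be written straight to a file
        # instead of collected in memory; the list here is just for the demo.
        out.append(decomp.decompress(compressed[pos:pos + chunk_size]))
    return b"".join(out)
```

Writing each decoded chunk to disk as it is produced is what keeps peak RAM bounded regardless of the uncompressed size.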

Unfortunately I can't provide any sources, even though I'd like to.

benjist