
I am stuck with the following problem: we are writing a library app that contains a bunch of internal documents (roughly 900). The total size of all docs is around 2.5+ GB. I have to find a way to initialise the app with the 2.5 GB of docs on first start. My idea was to create a zip file of the initial load, download it from the server and unpack it during setup on first start. However, I cannot find a way to unzip such a big file, as all solutions read the zip completely into memory before writing it to storage. I also do not want to make 900+ calls to our web server to download the documents on first start.

Deployment targets are iOS and Android, possibly Windows and macOS later on.

Any ideas?

  • Easy way: use `TarFile` (cons: big uncompressed file); hard way: use `ZipFile` (cons: you have to write a custom, file-based `InputStream` - I don't know if it is even possible). All of them are in the standard [archive_io](https://api.flutter.dev/flutter/archive_io/archive_io-library.html) library. – pskink Aug 11 '21 at 10:48
  • Thanks for your answer! I thought the archive package would be my solution, but when unpacking an archive it always reads the whole contents into memory to build an `Archive` object. Only then does it start to write the archive from memory to disk, so at some point I would end up with a 2.5 GB `Archive` object in memory. **From extract_archive_to_disk.dart:** `Archive archive; if (archivePath.endsWith('tar')) { final input = InputFileStream(archivePath); archive = TarDecoder().decodeBuffer(input);` – LrdHelmchen Aug 12 '21 at 10:17
  • With a tar file you can do: `final input = InputFileStream('backup.tar'); while(!input.isEOS) { final tf = TarFile.read(input); if (tf.filename.isEmpty) break; print("[${tf.filename}] content.length: ${tf.content.length}"); }` - but tars are not compressed, so it is much better to deal with zips. – pskink Aug 12 '21 at 11:47

1 Answer


I tested it on Flutter Linux, where `Uri.base` points to the root of the project (the folder with pubspec.yaml, README.md, etc.). If you run it on Android/iOS, check where `Uri.base` points and change `baseUri` if it is not a good location:

// needs the archive package; FileList (defined below) feeds the zip to
// InputStream in chunks instead of loading the whole file into memory
import 'dart:io';

import 'package:archive/archive.dart';

final debug = true;
final Set<Uri> foldersCreated = {};
final baseUri = Uri.base;
final baseUriLength = baseUri.toString().length;
final zipFileUri = baseUri.resolve('inputFolder/backup.zip');

final outputFolderUri = baseUri.resolve('outputFolder/');
print('0. files will be stored in [$outputFolderUri]');
final list = FileList(zipFileUri, debug: debug);
print('1. reading ZipDirectory...');
final directory = ZipDirectory.read(InputStream(list));
print('2. iterating over ZipDirectory file headers...');
for (final zfh in directory.fileHeaders) {
  final zf = zfh.file;
  final content = zf.content;

  // writing file
  final uri = outputFolderUri.resolve(zf.filename);
  final folderUri = uri.resolve('.');
  if (foldersCreated.add(folderUri)) {
    if (debug) print(' #### creating folder [${folderUri.toString().substring(baseUriLength)}] #### ');
    Directory.fromUri(folderUri).createSync(recursive: true);
  }
  File.fromUri(uri).writeAsBytesSync(content);

  print("file: [${zf.filename}], compressed: ${zf.compressedSize}, uncompressed: ${zf.uncompressedSize}, length: ${content.length}");
}
list.close();
print('3. all done!');
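
If `Uri.base` is not a good location on Android/iOS, you could take the base URI from an app-writable folder instead, for example with the `path_provider` plugin - just a sketch, assuming you add that plugin to your pubspec:

import 'package:path_provider/path_provider.dart';

// Resolve an app-writable base directory on Android/iOS instead of Uri.base.
Future<Uri> appBaseUri() async {
  final dir = await getApplicationDocumentsDirectory();
  // Directory.uri ends with a '/', so baseUri.resolve('outputFolder/') works.
  return dir.uri;
}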

And here is a `List<int>` backed by an `LruMap` that reads data in chunks from your huge zip file:

// needs the quiver package for LruMap
import 'dart:collection';
import 'dart:io';

import 'package:quiver/collection.dart';

class FileList with ListMixin<int> {
  late final RandomAccessFile _file;
  late final LruMap<int, List<int>> _cache;
  final int maximumPages;
  final int pageSize;
  final bool debug;

  FileList(Uri uri, {
    this.pageSize = 1024, // 1024 is just for tests: make it bigger (1024 * 1024 for example) for normal use
    this.maximumPages = 4, // maybe even 2 is good enough?
    this.debug = false,
  }) {
    _file = File.fromUri(uri).openSync();
    length = _file.lengthSync();
    _cache = LruMap(maximumSize: maximumPages);
  }

  void close() => _file.closeSync();

  @override
  int length = 0; // set to the file size in the constructor

  // the most recently read page and the index range it covers
  int minIndex = -1;
  int maxIndex = -1;
  List<int> page = const [];
  @override
  int operator [](int index) {
    // print(index);

    // 1st cache level
    if (index >= minIndex && index < maxIndex) {
      return page[index - minIndex];
    }

    // 2nd cache level
    int key = index ~/ pageSize;
    final pagePosition = key * pageSize;
    page = _cache.putIfAbsent(key, () {
      if (debug) print(' #### reading page #$key (position $pagePosition) #### ');
      _file.setPositionSync(pagePosition);
      return _file.readSync(pageSize);
    });
    minIndex = pagePosition;
    maxIndex = pagePosition + pageSize;
    return page[index - pagePosition];
  }

  @override
  void operator []=(int index, int value) {
    // read-only list: writes are ignored
  }
}

You can play with `pageSize` and `maximumPages` to find the optimal settings - I think you can start with `pageSize: 1024 * 1024` and `maximumPages: 4`, but you have to check that yourself.

Of course, all of that code should be run in an `Isolate`, since unzipping a couple of GB takes a lot of time and would otherwise freeze your UI - but first run it as it is and check the logs.
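
For example with Flutter's `compute` helper - a minimal sketch, assuming the extraction code above is wrapped in a top-level function (`extractAll` and `runInitialImport` are made-up names):

import 'package:flutter/foundation.dart';

// Must be a top-level (or static) function so it can be sent to another isolate.
Future<void> extractAll(String zipPath) async {
  // ... run the ZipDirectory / FileList code from above here ...
}

Future<void> runInitialImport(String zipPath) async {
  // compute() spawns an isolate, runs extractAll there and completes when the
  // extraction finishes, so the UI isolate stays responsive.
  await compute(extractAll, zipPath);
}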

EDIT

It seems that `ZipFile.content` has some memory leaks, so the alternative could be a tar-file based solution. It uses the `tar` package, and since it reads a `Stream` as its input you can use compressed *.tar.gz files (your Documents.tar had 17408 bytes, while Documents.tar.gz has 993 bytes). Notice that you can even read your data directly from the socket's stream, so there is no need for any intermediate .tar.gz file:

// needs the tar package; run this inside an async function (it awaits the stream)
import 'dart:io';

import 'package:tar/tar.dart';

final baseUri = Uri.base;
final tarFileUri = baseUri.resolve('inputFolder/Documents.tar.gz');

final outputFolderUri = baseUri.resolve('outputFolder/');
print('0. files will be stored in [$outputFolderUri]');
final stream = File.fromUri(tarFileUri)
  .openRead()
  .transform(gzip.decoder);
final reader = TarReader(stream);
print('1. iterating over tar stream...');

while (await reader.moveNext()) {
  final entry = reader.current;
  if (entry.type == TypeFlag.dir) {
    print("dir: [${entry.name}]");
    final folderUri = outputFolderUri.resolve(entry.name);
    await Directory.fromUri(folderUri).create(recursive: true);
  }
  if (entry.type == TypeFlag.reg) {
    print("file: [${entry.name}], size: ${entry.size}");
    final uri = outputFolderUri.resolve(entry.name);
    await entry.contents.pipe(File.fromUri(uri).openWrite());
  }
}
print('2. all done!');
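
And a sketch of reading the archive straight from the HTTP response stream with the `http` package (the URL and function name are placeholders; drop the `gzip.decoder` transform if the response has already been decompressed for you during the download):

import 'dart:io';

import 'package:http/http.dart' as http;
import 'package:tar/tar.dart';

Future<void> downloadAndExtract(Uri archiveUrl, Uri outputFolderUri) async {
  final client = http.Client();
  try {
    final response = await client.send(http.Request('GET', archiveUrl));
    // decompress on the fly and hand the byte stream to the tar reader
    final reader = TarReader(response.stream.transform(gzip.decoder));
    while (await reader.moveNext()) {
      final entry = reader.current;
      if (entry.type == TypeFlag.dir) {
        await Directory.fromUri(outputFolderUri.resolve(entry.name))
            .create(recursive: true);
      }
      if (entry.type == TypeFlag.reg) {
        final uri = outputFolderUri.resolve(entry.name);
        await entry.contents.pipe(File.fromUri(uri).openWrite());
      }
    }
  } finally {
    client.close();
  }
}
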
pskink
  • Thanks, that seems to mostly work. However, I am seeing that the memory footprint of the app is constantly increasing. I just tried on my iPhone and am running out of memory. I made minor adjustments for null safety, but I don't think they are creating a leak. – LrdHelmchen Aug 12 '21 at 12:07
  • I am down to `pageSize` 512 and `maximumPages` 1 – LrdHelmchen Aug 12 '21 at 12:14
  • How many files do you have in the zip, and what memory increase do you see? Is it more or less the sum of all the uncompressed files? – pskink Aug 12 '21 at 12:16
  • Roughly 1000 files with an extracted size of 2.6 GB. Memory reached more than 3.5 GB on macOS when I stopped it. – LrdHelmchen Aug 12 '21 at 12:21
  • I feel like the settings don't really matter: I am currently trying `this.pageSize = 1048576, this.maximumPages = 4`. Now I am topping out at around 3 GB of memory. – LrdHelmchen Aug 12 '21 at 12:30
  • OK, what if you comment out the lines after `// writing file` (except the last `print("file: ...`)? – pskink Aug 12 '21 at 12:36
  • Same behaviour. Seems to be the for loop – LrdHelmchen Aug 12 '21 at 12:39
  • So I think that `final content = zf.content;` is the culprit; try to comment it out - if it is still the same, then `ZipDirectory` keeps the memory. – pskink Aug 12 '21 at 12:41
  • yep `final content = zf.content` is the line that leaks the memory – LrdHelmchen Aug 12 '21 at 12:51
  • Just tried. That doesn't give any problems with memory, but it seems to produce garbage, probably due to charset issues. My files have some German characters like ä, ü, ö, ß and &. I am looking into how to fix this. – LrdHelmchen Aug 12 '21 at 14:02
  • Check the update in my answer: I tried your "umlauts" and it works fine. – pskink Aug 12 '21 at 14:14
  • It doesn't work for me as soon as special characters are involved: see this tar file as an example: https://drive.google.com/file/d/1r5Nwx10VMIW03Q70vDvIOKu-4fDffi7n/view?usp=sharing – LrdHelmchen Aug 12 '21 at 14:41
  • Awesome @pskink That's the solution. I am so thankful for your help :) It works excellent! – LrdHelmchen Aug 13 '21 at 07:23
  • You're welcome. By the way: I tried reading a 16 MB tar.gz directly from the HTTP socket and it works just fine - try it that way. – pskink Aug 13 '21 at 07:38
  • Hmm, that's not working for me, as the file is auto-uncompressed during download... What does your implementation of this look like? – LrdHelmchen Aug 13 '21 at 08:17
  • Never mind, I simply had to skip the step with the GZIP decoder. – LrdHelmchen Aug 13 '21 at 08:20
  • I tried to use it as an async generator, so you can use it with `StreamBuilder`: https://paste.ubuntu.com/p/8JK8VnYqPH/ - by the way, you need `import 'package:http/http.dart' as http;` – pskink Aug 13 '21 at 08:37
  • Very cool :) That's exactly how I did it :) Many thanks again. You saved my day! – LrdHelmchen Aug 14 '21 at 15:28
  • You're welcome. Out of curiosity, do you also show the current progress? If so, how do you pass the total number of files or total size? – pskink Aug 14 '21 at 15:35
  • The process of creating the archive will be automatic and periodic. I was thinking about providing the number of files inside the archive via a JSON file. The JSON could most probably contain information about the last time the archive was created, the unpacked size (?), and the number of files inside the archive. – LrdHelmchen Aug 16 '21 at 07:09
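
Such a manifest could be fetched and decoded on the client roughly like this - purely illustrative, with made-up field names:

import 'dart:convert';

import 'package:http/http.dart' as http;

// Hypothetical manifest published next to the archive, e.g.
// {"created": "2021-08-16T07:00:00Z", "fileCount": 1000, "unpackedSize": 2791728742}
Future<int> fetchFileCount(Uri manifestUrl) async {
  final body = await http.read(manifestUrl);
  final manifest = jsonDecode(body) as Map<String, dynamic>;
  return manifest['fileCount'] as int;
}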