Locating a file by path/name in a zip using libarchive

Question

I'm using libarchive in C/C++ to create a zip archive of files, and I'm looking for a good way to check whether a file name (or rather, a file at a given path) already exists in the archive.

Currently, my only way is to cycle through all the headers and compare the filenames to the one I am looking to put into the zip, based on the example code from the libarchive website:

  struct mydata *mydata;
  struct archive *a;
  struct archive_entry *entry;
  mydata = malloc(sizeof(struct mydata));
  a = archive_read_new();
  mydata->name = name;
  mydata->fd = open(mydata->name, O_RDONLY); // Include O_BINARY on Windows
  archive_read_support_compression_all(a);
  archive_read_support_format_all(a);
  archive_read_open(a, mydata, NULL, myread, myclose);
  while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
     printf("%s\n",archive_entry_pathname(entry));
  }
  archive_read_finish(a);
  free(mydata);

In my code, the printf call is replaced by a comparison. Obviously this has significant overhead as the zip file gets bigger and there are a large number of headers to check.

Is this the best way or am I missing something simpler?

Maintain a list that stores all the headers: call `archive_read_next_header` in a while loop once and store each entry name in the list. Then, when adding a file, check whether it is already in the list; if not, add it to both the list and the archive. – kiner_shah, Nov 02 '21 at 11:53
Also, please tag your question based on the language you are using, either C or C++. – kiner_shah, Nov 02 '21 at 11:54
As I'd be adding files to the archive quite often and starting with an empty archive, I could keep a list of file names, check each new file against it, and add the name to the list if it isn't already there. – yetanothercoder, Nov 02 '21 at 12:57

score 1 · Accepted Answer · answered Nov 02 '21 at 12:02

As libarchive's README suggests, the library is intended for handling streaming archives, rather than randomly-accessed ones. It therefore stands to reason that, in order to locate a file in the archive, you have to "roll the tape", so to speak, until you reach it.

You could cache the archive's entry names in memory, as @kiner_shah suggests, by making a single pass over the archive with a `while (archive_read_next_header(my_archive, &entry) == ARCHIVE_OK) { ... }` loop; after that you can keep the directories and files in whatever data structure you like and check new names against it.
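
As a rough illustration of the caching idea, here is a minimal sketch of a name index in plain C. It uses only the standard library, so the libarchive calls appear only in comments. The names `name_set`, `name_set_add`, `name_set_contains` and `name_set_free` are hypothetical helpers invented for this example, not part of libarchive, and the fixed bucket count is an arbitrary assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a libarchive type): a chained hash set of paths. */
#define NAME_SET_BUCKETS 1024

struct name_node {
    struct name_node *next;
    char *name;
};

struct name_set {
    struct name_node *buckets[NAME_SET_BUCKETS];
};

/* 32-bit FNV-1a hash of a NUL-terminated string, reduced to a bucket index. */
size_t name_hash(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return (size_t)(h % NAME_SET_BUCKETS);
}

/* Returns true if name is already stored in the set. */
bool name_set_contains(const struct name_set *set, const char *name)
{
    const struct name_node *n = set->buckets[name_hash(name)];
    for (; n != NULL; n = n->next) {
        if (strcmp(n->name, name) == 0)
            return true;
    }
    return false;
}

/* Stores a copy of name.
 * Returns 1 if added, 0 if already present, -1 on allocation failure. */
int name_set_add(struct name_set *set, const char *name)
{
    if (name_set_contains(set, name))
        return 0;

    size_t len = strlen(name) + 1;
    struct name_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->name = malloc(len);
    if (n->name == NULL) {
        free(n);
        return -1;
    }
    memcpy(n->name, name, len);

    size_t b = name_hash(name);
    n->next = set->buckets[b];
    set->buckets[b] = n;
    return 1;
}

/* Frees every stored name and leaves the set empty and reusable. */
void name_set_free(struct name_set *set)
{
    for (size_t b = 0; b < NAME_SET_BUCKETS; ++b) {
        struct name_node *n = set->buckets[b];
        while (n != NULL) {
            struct name_node *next = n->next;
            free(n->name);
            free(n);
            n = next;
        }
        set->buckets[b] = NULL;
    }
}

/* Intended use with libarchive (shown as comments only):
 *   - start from a zero-initialized set: struct name_set names = {0};
 *   - in the read loop, call
 *       name_set_add(&names, archive_entry_pathname(entry));
 *   - before writing a new entry, skip it if
 *       name_set_contains(&names, new_path) is true,
 *     otherwise write it and add new_path to the set. */
```

The lookup compares paths byte for byte, so `dir/file` and `./dir/file` would count as different names; if that matters, the paths would presumably need to be normalized the same way before they are added and before they are looked up.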

answered Nov 02 '21 at 12:02

einpoklum

Thanks for that answer, means I'm probably not missing an easier way. Are there any better libraries for dealing with zip files (Windows MinGW-based compiler) that I should look into? This is in case keeping a list in memory becomes difficult. – yetanothercoder Nov 02 '21 at 13:02
@yetanothercoder: Ask this extra question on https://softwarerecs.stackexchange.com/ (and post a link here so I can find that question). Also consider accepting my answer. – einpoklum Nov 02 '21 at 13:22
@yetanothercoder [libzip](https://libzip.org/documentation/) is a C library which may help you with your needs. I see a [library function](https://libzip.org/documentation/zip_name_locate.html) which may be useful. But if you wanna avoid switching to a different library, learning how to use it, etc. I think you can just use the approach I suggested. – kiner_shah Nov 02 '21 at 13:25
