I'm using libarchive in C/C++ to create a zip archive of files, and I'm looking for a good way to check whether a file name (or rather, a file at a given path) already exists in the archive.

Currently, my only way is to cycle through all the headers and compare the filenames to the one I am looking to put into the zip, based on the example code from the libarchive website:

  struct mydata *mydata;
  struct archive *a;
  struct archive_entry *entry;
  mydata = malloc(sizeof(struct mydata));
  a = archive_read_new();
  mydata->name = name;
  mydata->fd = open(mydata->name, O_RDONLY); // Include O_BINARY on Windows
  archive_read_support_filter_all(a); // libarchive 3.x name; older releases: archive_read_support_compression_all()
  archive_read_support_format_all(a);
  archive_read_open(a, mydata, NULL, myread, myclose);
  // Walk every entry header in order; there is no lookup by name.
  while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
     printf("%s\n",archive_entry_pathname(entry));
  }
  archive_read_free(a); // libarchive 3.x name; older releases: archive_read_finish()
  free(mydata);

In my code, the printf call is replaced by a comparison. This scan is linear in the number of entries, so the overhead becomes significant as the zip file gets bigger and there are more headers to check.

Is this the best way or am I missing something simpler?

  • Maintain a list that stores all the headers: query `archive_read_next_header` in a while loop once and store each entry in the list. Then, when adding a file, check whether it is already in the list; if not, add it to the list and to the archive. – kiner_shah Nov 02 '21 at 11:53
  • Also, please tag your question based on what language you are using either C or C++. – kiner_shah Nov 02 '21 at 11:54
  • As I'd be adding files to the archive quite a lot and starting with an empty archive, I could add each file name to a list and check against it each time a new file will be added and add it to the list if it doesn't exist. – yetanothercoder Nov 02 '21 at 12:57

1 Answer

As libarchive's README suggests, the library is intended for handling streaming archives, rather than randomly-accessed ones. It therefore stands to reason that, in order to locate a file in the archive, you have to "roll the tape", so to speak, until you reach it.

You could cache the archive's entry names in memory, like @kiner_shah suggests, by making a single pass over the archive with a `while (archive_read_next_header(my_archive, &entry) == ARCHIVE_OK) { ... }` loop; after that pass, you'll have whatever data structure you like for the different directories and files, and each later existence check is a lookup in that structure rather than another scan of the archive.
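
As a rough illustration of such a cache, here is a minimal sketch of a set of entry pathnames, written as a small chained hash table in plain C. It deliberately contains no libarchive calls. The names `name_set`, `name_set_add`, `name_set_contains`, `name_set_free` and the bucket count are all hypothetical and chosen just for this example. It also assumes that an exact, case-sensitive byte comparison of pathnames is what you want, which may not match how your tools treat zip paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NAME_SET_BUCKETS 1024u /* arbitrary size picked for this sketch */

/* One stored pathname, chained with others that land in the same bucket. */
struct name_node {
    char *name;
    struct name_node *next;
};

/* Hypothetical helper type: a fixed-size chained hash set of pathnames. */
struct name_set {
    struct name_node *buckets[NAME_SET_BUCKETS];
};

/* FNV-1a hash of a NUL-terminated string, reduced to a bucket index. */
static size_t name_hash(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return (size_t)(h % NAME_SET_BUCKETS);
}

/* Start with every bucket empty. */
static void name_set_init(struct name_set *set)
{
    for (size_t i = 0; i < NAME_SET_BUCKETS; ++i)
        set->buckets[i] = NULL;
}

/* True if an identical pathname (byte for byte) is already stored. */
static bool name_set_contains(const struct name_set *set, const char *name)
{
    assert(name != NULL);
    const struct name_node *n = set->buckets[name_hash(name)];
    for (; n != NULL; n = n->next)
        if (strcmp(n->name, name) == 0)
            return true;
    return false;
}

/* Stores a copy of name. Returns true if it was newly added, false if it
   was already present or if memory allocation failed. */
static bool name_set_add(struct name_set *set, const char *name)
{
    if (name_set_contains(set, name))
        return false;
    size_t len = strlen(name) + 1;
    struct name_node *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    n->name = malloc(len);
    if (n->name == NULL) {
        free(n);
        return false;
    }
    memcpy(n->name, name, len);
    size_t i = name_hash(name);
    n->next = set->buckets[i];
    set->buckets[i] = n;
    return true;
}

/* Releases every stored name and leaves the set empty. */
static void name_set_free(struct name_set *set)
{
    for (size_t i = 0; i < NAME_SET_BUCKETS; ++i) {
        struct name_node *n = set->buckets[i];
        while (n != NULL) {
            struct name_node *next = n->next;
            free(n->name);
            free(n);
            n = next;
        }
        set->buckets[i] = NULL;
    }
}
```

With something like this, the single pass would call `name_set_add(&set, archive_entry_pathname(entry))` for each header, and the code that adds a file would check `name_set_contains(&set, path)` first. If the archive starts out empty, as in the comment above, the read pass can be skipped and the set filled as entries are written.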

einpoklum
  • Thanks for that answer; it means I'm probably not missing an easier way. Are there any better libraries for dealing with zip files (Windows, MinGW-based compiler) that I should look into, in case keeping a list in memory becomes difficult? – yetanothercoder Nov 02 '21 at 13:02
  • @yetanothercoder: Ask this extra question on https://softwarerecs.stackexchange.com/ (and post a link here so I can find that question). Also consider accepting my answer. – einpoklum Nov 02 '21 at 13:22
  • @yetanothercoder [libzip](https://libzip.org/documentation/) is a C library which may help you with your needs. I see a [library function](https://libzip.org/documentation/zip_name_locate.html) which may be useful. But if you wanna avoid switching to a different library, learning how to use it, etc. I think you can just use the approach I suggested. – kiner_shah Nov 02 '21 at 13:25