
In a tar.gz archive, the files inside a folder are usually listed sequentially, but in one exceptional case I found them listed in seemingly random order. E.g., say there are three folders a, b, and c, each containing files 1, 2, and 3. In the usual case, the archive entries are listed as a/1, a/2, a/3, b/1, b/2, b/3, c/1, c/2, c/3, but in this case the order is something like b/2, a/1, b/3, ... Why could this happen? My code relies on the first ordering to read a .tar.gz archive and process the data inside it folder by folder. Is there a way to get the entries sorted by folder while reading, without traversing the whole archive each time and building the parent/child structure myself? Sample code below:

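   // Requires org.apache.commons.compress.archivers.tar.* and java.io.FileInputStream
   try (TarArchiveInputStream tis = new TarArchiveInputStream(new FileInputStream("a.tar"))) {
       TarArchiveEntry entry;
       while ((entry = tis.getNextTarEntry()) != null)
           System.out.println(entry.getName());
   }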

I could not find any API that would give me such a sorted listing while reading. Any help would be appreciated; I'm stuck on this case.

Arunavo
  • A tar archive is a sequence of entries in whatever order the creator used to store them. You can’t sort a stream of elements without reading all of them. – Holger Dec 21 '20 at 13:53
  • Can you guide me on that? I'm not yet good with streams. I assume you mean Java Streams. – Arunavo Dec 21 '20 at 18:31
  • Also, I am concerned about execution time (both algorithmic time complexity and IO reads), as I would be reading millions of files inside the tar, distributed among the folders. – Arunavo Dec 21 '20 at 18:39
  • I was using the term “stream” in the broadest sense. You want to avoid having everything in memory, but you need all names in memory to sort them. In the worst case, the last entry has the name that would come first in your desired order. There is no (general) solution at all. When you have a source that can be read twice, like a local file, you can make a first pass to read the names only, sort them, followed by a second pass to process the content. But in the end, everything depends on what you are going to do with the entries. Some tools extract everything into a temporary directory first… – Holger Dec 22 '20 at 09:39
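
The first pass of the two-pass idea in the last comment could look roughly like the sketch below. It is only an illustration, not a tested solution for the archive in the question: the class `EntryOrder` and the helpers `folderOf` and `groupByFolder` are hypothetical names, the entry names are hard-coded here, and in real use they would presumably be collected from `entry.getName()` in a loop like the one in the question. It uses only the JDK, and it still needs every name in memory at once, which is the limitation the comment describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class EntryOrder {
    // Folder part of an entry name; "" for entries at the top level.
    static String folderOf(String entryName) {
        int slash = entryName.lastIndexOf('/');
        return slash < 0 ? "" : entryName.substring(0, slash);
    }

    // First pass: group the names by folder, whatever order they arrive in.
    // The TreeMap keeps folders sorted; each folder's list is sorted afterwards.
    static SortedMap<String, List<String>> groupByFolder(Iterable<String> namesInArchiveOrder) {
        SortedMap<String, List<String>> byFolder = new TreeMap<>();
        for (String name : namesInArchiveOrder) {
            byFolder.computeIfAbsent(folderOf(name), k -> new ArrayList<>()).add(name);
        }
        for (List<String> names : byFolder.values()) {
            Collections.sort(names);
        }
        return byFolder;
    }

    public static void main(String[] args) {
        // Stand-in for names read from the archive in its stored (shuffled) order.
        List<String> shuffled = Arrays.asList("b/2", "a/1", "b/3", "c/1", "a/2", "b/1");
        System.out.println(groupByFolder(shuffled));
        // prints {a=[a/1, a/2], b=[b/1, b/2, b/3], c=[c/1]}
    }
}
```

A second pass over the same file would then process the content, using this map to know which entries belong to which folder. Note that the sort is lexicographic on the names, so "a/10" would come before "a/2".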
