PROBLEM SOLVED IN EDIT 3
I've been struggling with this problem for some time. All of the questions here on SO and elsewhere on the internet seem to cover only 'shallow' structures with one zip inside another. However, I have a zip archive whose structure looks more or less like this:
input.zip/
--1.zip/
--folder/
----2.zip/
------3.zip/
--------test/
----------some-other-folder/
----------archive.gz/
------------file-to-parse
----------file-to-parse3.txt
------file-to-parse.txt
--4.zip/
------folder/
and so on. My code needs to handle N levels of nested zips while preserving the original structure of zips, gzips, folders and files. Using temporary files is forbidden due to lack of privileges (this is something I'm not willing to change).
This is the code I have written so far. However, ZipOutputStream seems to operate on only one (top) level: when directories contain files or directories with exactly the same name, it throws Exception in thread "main" java.util.zip.ZipException: duplicate entry: folder/. It also skips empty directories (which is not expected). What I want to achieve is to somehow move my ZipOutputStream to a 'lower' level and operate on each of the zips. Maybe there's a better approach to the whole problem; any help would be appreciated. I need to perform certain text extraction/modification later, but I'm not starting on that until reading and writing the whole structure works properly. Thanks in advance for any help!
//constructor
private final File zipFile;
ArchiveResolver(String fileToHandle) {
this.zipFile = new File(Objects.requireNonNull(getClass().getClassLoader().getResource(fileToHandle)).getFile());
}
void resolveInputFile() throws Exception {
FileInputStream fileInputStream = new FileInputStream(this.zipFile);
FileOutputStream fileOutputStream = new FileOutputStream("out.zip");
ZipOutputStream zipOutputStream = new ZipOutputStream(fileOutputStream);
ZipInputStream zipInputStream = new ZipInputStream(fileInputStream);
zip(zipInputStream, zipOutputStream);
zipInputStream.close();
zipOutputStream.close();
}
// This one doesn't preserve the internal structure (empty folders), but can work on each file
private void zip(ZipInputStream zipInputStream, ZipOutputStream zipOutputStream) throws IOException {
ZipEntry entry;
while ((entry = zipInputStream.getNextEntry()) != null) {
System.out.println(entry.getName());
byte[] buffer = new byte[1024];
int length;
if (entry.getName().endsWith(".zip")) {
// wrapping outer zip streams to inner streams making actual entries a new source
ZipInputStream innerZipInputStream = new ZipInputStream(zipInputStream);
ZipOutputStream innerZipOutputStream = new ZipOutputStream(zipOutputStream);
ZipEntry zipEntry = new ZipEntry(entry.getName());
// add new zip entry here to outer zipOutputStream: i.e. data.zip
zipOutputStream.putNextEntry(zipEntry);
// now treat this data.zip as the parent and call zip recursively on it
zip(innerZipInputStream, innerZipOutputStream);
// Finish internal stream work when innerZipOutput is done
innerZipOutputStream.finish();
// Close entry
zipOutputStream.closeEntry();
} else if (entry.isDirectory()) {
// putting new zip entry into output stream and adding extra '/' to make
// sure zipOutputStream will treat it as folder
ZipEntry zipEntry = new ZipEntry(entry.getName() + "/");
// this only should preserve internal structure
zipOutputStream.putNextEntry(zipEntry);
// reading everything from zipInputStream
while ((length = zipInputStream.read(buffer)) > 0) {
// sending it straight to zipOutputStream
zipOutputStream.write(buffer, 0, length);
}
zipOutputStream.closeEntry();
// This else will include checking if file is respectively:
// .gz file <- then open it, read from file inside, modify and save it
// .txt file <- also read, modify and preserve
} else {
// create new entry on top of this
ZipEntry zipEntry = new ZipEntry(entry.getName());
zipOutputStream.putNextEntry(zipEntry);
while ((length = zipInputStream.read(buffer)) > 0) {
zipOutputStream.write(buffer, 0, length);
}
zipOutputStream.closeEntry();
}
}
}
// This one preserves internal structure (empty folders and so)
// BUT! no work on each file is possible it just preserves everything as it is
private void zipWhole(ZipInputStream zipInputStream, ZipOutputStream zipOutputStream) throws IOException {
ZipEntry entry;
while ((entry = zipInputStream.getNextEntry()) != null) {
System.out.println(entry.getName());
byte[] buffer = new byte[1024];
int length;
zipOutputStream.putNextEntry(new ZipEntry(entry.getName()));
while ((length = zipInputStream.read(buffer)) > 0) {
zipOutputStream.write(buffer, 0, length);
}
zipOutputStream.closeEntry();
}
}
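One possible reading of the duplicate entry error: ZipOutputStream tracks entry names per stream instance, so the same name can appear once in the outer archive and once in each nested archive, but never twice in the same stream. A minimal sketch of that behaviour, using an in-memory buffer (the class and method names are made up for illustration and are not part of my code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the original code.
class DuplicateEntryDemo {

    // True if a second "folder/" entry in the SAME stream is rejected.
    static boolean sameLevelDuplicateRejected() throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(new ByteArrayOutputStream())) {
            out.putNextEntry(new ZipEntry("folder/"));
            out.closeEntry();
            try {
                out.putNextEntry(new ZipEntry("folder/"));
                return false;
            } catch (ZipException e) {
                return true; // "duplicate entry: folder/"
            }
        }
    }

    // True if "folder/" may exist in the outer archive AND in a nested one.
    static boolean nestedDuplicateAllowed() throws IOException {
        try (ZipOutputStream outer = new ZipOutputStream(new ByteArrayOutputStream())) {
            outer.putNextEntry(new ZipEntry("folder/"));
            outer.closeEntry();

            // The nested archive is just the content of the "1.zip" entry.
            outer.putNextEntry(new ZipEntry("1.zip"));
            ZipOutputStream inner = new ZipOutputStream(outer);
            try {
                inner.putNextEntry(new ZipEntry("folder/"));
                inner.closeEntry();
            } catch (ZipException e) {
                return false;
            }
            inner.finish();      // end the nested archive, keep outer open
            outer.closeEntry();
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sameLevelDuplicateRejected());
        System.out.println(nestedDuplicateAllowed());
    }
}
```

If this is what happens in my case, the exception would mean that entries from a nested zip end up being written to the stream of the level above it.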
EDIT:
Updated my code to the newest version. It's still nothing to be proud of, and although I made some changes it is still not working. I've added two very important comments about the code that (in my opinion) fails. I've tested two approaches. The first one gets a ZipInputStream from zipFile by using getInputStream(ZipEntry e); this throws Exception in thread "main" java.util.zip.ZipException: no current ZIP entry when I try to put entries into the ZipOutputStream. The second approach focuses on "wrapping" one ZipInputStream in another; this results in empty ZipInputStreams with no entries, and the application just goes through the files, lists them (only the top level of zips...) and finishes without saving anything into the out.zip file.
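For reference, the "no current ZIP entry" message is what ZipOutputStream reports whenever bytes are written while no entry is open, i.e. before putNextEntry() or after closeEntry(). I can't say for sure this is the exact path my code took, but a small in-memory sketch reproduces the message (class and method names are hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the original code.
class NoCurrentEntryDemo {

    // Returns the exception message produced by writing before putNextEntry().
    static String writeWithoutEntry() throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(new ByteArrayOutputStream())) {
            try {
                out.write(new byte[] {1, 2, 3}, 0, 3);
                return "no exception";
            } catch (ZipException e) {
                return e.getMessage();
            }
        }
    }

    // Writes the same bytes after opening an entry, then reads them back
    // and returns how many bytes the entry contains.
    static int writeWithEntry() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("data.bin"));
            out.write(new byte[] {1, 2, 3}, 0, 3);
            out.closeEntry();
        }
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            in.getNextEntry();
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeWithoutEntry());
        System.out.println(writeWithEntry());
    }
}
```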
EDIT 2:
Following some suggestions from people in the comments, I've decided to rewrite my code, focusing on calling close, finish and closeEntry in the appropriate places (I hope I did it better now). I've now made some progress: the code iterates through every entry and saves it into the out.zip file with proper zip packaging inside. It still skips empty folders though, and I'm not sure why (I've checked some of the questions on Stack Overflow and the web, and the code seems OK). Anyway, thanks for the help so far; I'll try to work this out and keep this updated.
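To illustrate why the choice between finish() and close() matters for nested streams: close() on an inner ZipOutputStream also closes the stream it wraps, while finish() only ends the inner archive. A minimal in-memory sketch (the class and method names are invented for this example):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the original code.
class FinishVsCloseDemo {

    // Writes one nested archive, ends it with finish() or close(),
    // and reports whether the outer stream can still accept entries.
    static boolean outerUsableAfter(boolean useFinish) throws IOException {
        ZipOutputStream outer = new ZipOutputStream(new ByteArrayOutputStream());
        outer.putNextEntry(new ZipEntry("inner.zip"));

        ZipOutputStream inner = new ZipOutputStream(outer);
        inner.putNextEntry(new ZipEntry("a.txt"));
        inner.write("hello\n".getBytes(StandardCharsets.UTF_8));
        inner.closeEntry();

        if (useFinish) {
            inner.finish(); // writes the inner central directory only
        } else {
            inner.close();  // also closes 'outer'
        }

        try {
            outer.closeEntry();
            outer.putNextEntry(new ZipEntry("b.txt"));
            outer.closeEntry();
            outer.close();
            return true;
        } catch (IOException e) {
            return false;   // outer was already closed by inner.close()
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("finish(): " + outerUsableAfter(true));
        System.out.println("close():  " + outerUsableAfter(false));
    }
}
```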
EDIT 3:
After a few approaches to the problem and some reading and refactoring, I've managed to solve it (however, there's still a problem when running this code on Linux: empty directories are skipped, which seems to be connected to the way certain OSes preserve file information?). Here's the working solution:
void resolveInputFile() throws IOException {
FileInputStream fileInputStream = new FileInputStream(this.zipFile);
FileOutputStream fileOutputStream = new FileOutputStream("in.zip");
ZipOutputStream zipOutputStream = new ZipOutputStream(fileOutputStream);
ZipInputStream zipInputStream = new ZipInputStream(fileInputStream);
zip(zipInputStream, zipOutputStream);
zipInputStream.close();
zipOutputStream.close();
}
private void zip(ZipInputStream zipInputStream, ZipOutputStream zipOutputStream) throws IOException {
ZipEntry entry;
while ((entry = zipInputStream.getNextEntry()) != null) {
logger.info(entry.getName());
if (entry.getName().endsWith(".zip")) {
// If entry is zip, I create inner zip streams that wrap outer ones
ZipInputStream innerZipInputStream = new ZipInputStream(zipInputStream);
ZipOutputStream innerZipOutputStream = new ZipOutputStream(zipOutputStream);
ZipEntry zipEntry = new ZipEntry(entry.getName());
zipOutputStream.putNextEntry(zipEntry);
zip(innerZipInputStream, innerZipOutputStream);
// As mentioned in the comments, streams need to be closed/finished properly. I'm done writing to the inner stream, so I call finish() rather than close(), which would also close the outer stream
innerZipOutputStream.finish();
zipOutputStream.closeEntry();
} else if (entry.getName().endsWith(".gz")) {
GZIPInputStream gzipInputStream = new GZIPInputStream(zipInputStream);
// Small trap when using GZIP: the new ZipEntry must be put into the outer zipOutputStream BEFORE the GZIPOutputStream wrapper is created, because the GZIPOutputStream constructor writes its header immediately
ZipEntry zipEntry = new ZipEntry(entry.getName());
zipOutputStream.putNextEntry(zipEntry);
GZIPOutputStream gzipOutputStream = new GZIPOutputStream(zipOutputStream);
// To make it as efficient as possible I've used BufferedReader
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gzipInputStream));
long start = System.nanoTime();
logger.info("Started to process {}", zipEntry.getName());
String line;
while ((line = bufferedReader.readLine()) != null) {
//PROCESSING LINE BY LINE...
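// Write through the GZIP wrapper (not the raw zipOutputStream) so the entry remains valid gzip data
gzipOutputStream.write((line + "\n").getBytes());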
}
logger.info("Processing of {} took {} milliseconds", entry.getName(), (System.nanoTime() - start) / 1_000_000);
gzipOutputStream.finish();
zipOutputStream.closeEntry();
} else if (entry.getName().endsWith(".txt")) {
ZipEntry zipEntry = new ZipEntry(entry.getName());
zipOutputStream.putNextEntry(zipEntry);
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(zipInputStream));
long start = System.nanoTime();
logger.info("Started to process {}", zipEntry.getName());
String line;
while ((line = bufferedReader.readLine()) != null) {
//PROCESSING LINE BY LINE...
zipOutputStream.write((line + "\n").getBytes());
}
logger.info("Processing of {} took {} milliseconds", entry.getName(), (System.nanoTime() - start) / 1_000_000);
zipOutputStream.closeEntry();
} else if (entry.isDirectory()) {
//Standard directory preserving
byte[] buffer = new byte[8192];
int length;
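// entry.isDirectory() is true only when the name already ends with "/", so the name is reused unchanged (appending another "/" would give "folder//")
ZipEntry zipEntry = new ZipEntry(entry.getName());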
zipOutputStream.putNextEntry(zipEntry);
while ((length = zipInputStream.read(buffer)) > 0) {
// sending it straight to zipOutputStream
zipOutputStream.write(buffer, 0, length);
}
zipOutputStream.closeEntry();
} else {
// In my case this branch will probably never be reached, but any other kind of file is preserved unchanged in the output file
byte[] buffer = new byte[8192];
int length;
ZipEntry zipEntry = new ZipEntry(entry.getName());
zipOutputStream.putNextEntry(zipEntry);
while ((length = zipInputStream.read(buffer)) > 0) {
zipOutputStream.write(buffer, 0, length);
}
zipOutputStream.closeEntry();
}
}
}
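For anyone who wants to try the wrapping approach without touching the file system, here is a stripped-down sketch of the same recursion that runs entirely in memory. It copies entries unchanged (no .gz or .txt processing), builds its own tiny sample archive, and assumes the input contains explicit directory entries (names ending in "/"). The class name and helper methods are hypothetical and exist only for this example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the original code.
class NestedZipRoundTrip {

    // Copies every entry; nested .zip entries are handled by wrapping both streams.
    static void copy(ZipInputStream in, ZipOutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        ZipEntry entry;
        while ((entry = in.getNextEntry()) != null) {
            out.putNextEntry(new ZipEntry(entry.getName()));
            if (entry.getName().endsWith(".zip")) {
                ZipOutputStream innerOut = new ZipOutputStream(out);
                copy(new ZipInputStream(in), innerOut);
                innerOut.finish(); // finish, not close: 'out' must stay open
            } else if (!entry.isDirectory()) {
                int n;
                while ((n = in.read(buffer)) > 0) {
                    out.write(buffer, 0, n);
                }
            }
            out.closeEntry();
        }
    }

    // Collects entry names, descending into nested zips ("1.zip!name").
    static void list(ZipInputStream in, String prefix, List<String> names) throws IOException {
        ZipEntry entry;
        while ((entry = in.getNextEntry()) != null) {
            names.add(prefix + entry.getName());
            if (entry.getName().endsWith(".zip")) {
                list(new ZipInputStream(in), prefix + entry.getName() + "!", names);
            }
        }
    }

    // Builds a small sample: an empty folder plus a nested zip with the same folder name.
    static byte[] sample() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream outer = new ZipOutputStream(bytes)) {
            outer.putNextEntry(new ZipEntry("folder/"));
            outer.closeEntry();
            outer.putNextEntry(new ZipEntry("1.zip"));
            ZipOutputStream inner = new ZipOutputStream(outer);
            inner.putNextEntry(new ZipEntry("folder/"));
            inner.closeEntry();
            inner.putNextEntry(new ZipEntry("file-to-parse.txt"));
            inner.write("hello\n".getBytes(StandardCharsets.UTF_8));
            inner.closeEntry();
            inner.finish();
            outer.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Copies the sample through copy() and lists what arrived on the other side.
    static List<String> roundTrip() throws IOException {
        ByteArrayOutputStream copied = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(sample()));
             ZipOutputStream out = new ZipOutputStream(copied)) {
            copy(in, out);
        }
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(copied.toByteArray()))) {
            list(in, "", names);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        roundTrip().forEach(System.out::println);
    }
}
```

In this sketch the empty directory entries survive the copy at both levels. It says nothing about archives whose directories are not stored as explicit entries.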
Thanks again for all the help and good advice.