
I need to create a 100 MB zip file containing a CSV file within 5 seconds using Java. I have created test.zip, which contains the CSV file, but generating it takes too long (~30 seconds). Here is the code that I've written so far:

ByteArrayOutputStream baos = new ByteArrayOutputStream();
/* Create instance of ZipOutputStream to create ZIP file. */
ZipOutputStream zipOutputStream = new ZipOutputStream(baos);

/* Create the ZIP entry for the CSV file. The file is never staged on
 * disk; the entry name is only the file name stored inside the
 * archive.
 */
ZipEntry zipEntry = new ZipEntry("Test.csv");

zipOutputStream.putNextEntry(zipEntry);

/* Create an OutputStreamWriter for the CSV. There is no need to stage
 * the CSV on the filesystem; write directly to the ZIP output stream.
 */
BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(zipOutputStream, "UTF-8"));

CsvListWriter csvListWriter = new CsvListWriter(bufferedWriter, CsvPreference.EXCEL_PREFERENCE);

/* Write the CSV header to the generated CSV file. */
csvListWriter.writeHeader(CSVGeneratorConstant.CSV_HEADERS);

/* Logic to Write the content to CSV */
long startTime = System.currentTimeMillis();

for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
    final List<String> rowContent = new LinkedList<String>();
    for (int colIdx = 0; colIdx < 6; colIdx++) {
        String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
        rowContent.add(str);
    }
    csvListWriter.write(rowContent);
}
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println("time==" + elapsedTime / 1000f + "Seconds");

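csvListWriter.close();
bufferedWriter.close();
zipOutputStream.close();
baos.close();

/* Report the size only after closing: the ZIP is not complete until
 * the streams have been flushed and closed.
 */
System.out.println("Size=====" + baos.size() / (Math.pow(1024, 2)) + "MB");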

I am using the Super CSV library, but I have also tried to create the zip file in memory without Super CSV, without success. Can you please help me?

Craig
  • Are you sure your machine can do this? Can you try the same thing from the command line? BTW `mb` = `milli-bits`, `MB` = `Mega-Bytes` – Peter Lawrey Sep 15 '15 at 07:24
  • Instead of building up a list of strings, why not write directly to the ZipOutputStream? This will save you quite a bit of time. – Peter Lawrey Sep 15 '15 at 07:26
  • When you CPU profile this, what do you see taking the most time? – Peter Lawrey Sep 15 '15 at 07:26
  • Side note: Use try-with-resources to close your streams. – Puce Sep 15 '15 at 08:12
  • Instead of building up a `ByteArrayOutputStream`, why not write directly to the target file? You're just wasting time and space. – user207421 Sep 15 '15 at 08:19
  • @PeterLawrey 1) 100 MB (megabytes). 2) I have also tried writing directly to the ZipOutputStream, but it takes more time because every string's bytes are written individually, e.g. ZipOutputStream.write(str.getBytes()). 3) The most time is, of course, taken by csvListWriter.write(rowContent). – user3771343 Sep 15 '15 at 10:30
  • @EJP I need this zip file in memory, because writing it to a destination file would take more time. – user3771343 Sep 15 '15 at 10:32

1 Answer


Your test data is about 1 GB uncompressed, which compresses down to roughly 100 MB. Depending on your hardware, it may not be possible to generate it in under 5 seconds.
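The 1 GB figure can be sanity-checked by counting the bytes the generation loop produces. The sketch below is only an estimate under stated assumptions: fields are written unquoted, each row has five commas and a single-byte line terminator, and the class and method names (`CsvSizeEstimate`, `estimateBytes`) are made up for illustration. The header row is ignored.

```java
import java.nio.charset.StandardCharsets;

/** Rough size check for the generated test data (hypothetical helper, not part of the original code). */
class CsvSizeEstimate {

    /** Counts the UTF-8 bytes of the unquoted CSV rows produced by the question's loops. */
    static long estimateBytes(int rows, int cols) {
        long total = 0;
        for (int rowIdx = 0; rowIdx < rows; rowIdx++) {
            for (int colIdx = 0; colIdx < cols; colIdx++) {
                String field = "R" + rowIdx + "C" + colIdx + " FieldContent";
                total += field.getBytes(StandardCharsets.UTF_8).length;
            }
            // Assumption: (cols - 1) comma separators plus one '\n' per row.
            total += cols;
        }
        return total;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(7_000_000, 6);
        System.out.printf("Approx. uncompressed size: %.2f GB%n", bytes / 1e9);
    }
}
```

Each row is a little over 100 bytes, so 7,000,000 rows land in the region of 1 GB, which is the amount of data the deflater has to process no matter where the output goes.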

I've put together a quick and dirty benchmark which highlights the performance impacts of writing to a zip file.

  • Write to CSV with String.join(): 9.6s
  • Write to CSV with Super CSV: 12.7s
  • Write to CSV within zip with String.join(): 18.6s
  • Write to CSV within zip with Super CSV: 22.5s

It appears that Super CSV adds a little overhead (runs take roughly 122% of the String.join() time), but writing into a zip file almost doubles the elapsed time (~190%), regardless of whether Super CSV is used.
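Since compression accounts for much of the time, one knob that may be worth measuring is the deflater level, which `ZipOutputStream.setLevel` exposes. This was not part of the benchmark above, so treat the following as a sketch only: the class name `ZipLevelSketch`, the entry name and the row count are assumptions, and whether a faster level still yields an acceptable archive size depends on your data and hardware.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Compares deflater levels on the same generated rows (illustrative sketch, not a rigorous benchmark). */
class ZipLevelSketch {

    /** Writes the rows into an in-memory zip at the given level and returns the archive size in bytes. */
    static int zipRows(int rows, int level) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(baos);
             BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(zos, StandardCharsets.UTF_8))) {
            zos.setLevel(level); // must be set before the entry is started
            zos.putNextEntry(new ZipEntry("rows.csv"));
            StringBuilder row = new StringBuilder();
            for (int rowIdx = 0; rowIdx < rows; rowIdx++) {
                row.setLength(0); // reuse one buffer per row
                for (int colIdx = 0; colIdx < 6; colIdx++) {
                    if (colIdx > 0) row.append(',');
                    row.append('R').append(rowIdx).append('C').append(colIdx).append(" FieldContent");
                }
                writer.append(row.append('\n'));
            }
        }
        // Read the size only after the streams are closed and the archive is complete.
        return baos.size();
    }

    public static void main(String[] args) throws IOException {
        int[] levels = {Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            long start = System.nanoTime();
            int size = zipRows(500_000, level);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("level=" + level + " size=" + size + " bytes, time=" + millis + " ms");
        }
    }
}
```

A single pass like this includes JIT warm-up, so repeat the runs (or use a harness such as JMH) before drawing conclusions.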

Here's the code for the 4 scenarios.

Unlike your provided code, I'm writing directly to a file (I didn't notice any difference between writing to disk vs writing to memory, i.e. ByteArrayOutputStream). I've also skipped the BufferedWriter on the Super CSV examples, because it already uses that internally, and I've used try-with-resources to make things cleaner.

@Test
public void testWriteToCsvFileWithSuperCSV() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream csvFile = new FileOutputStream(new File("supercsv.csv"));
         ICsvListWriter writer = new CsvListWriter(new OutputStreamWriter(csvFile, "UTF-8"), CsvPreference.EXCEL_PREFERENCE)
    ){
        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.write(rowContent);
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV with Super CSV took " + (elapsedTime / 1000f) + " seconds");
}

@Test
public void testWriteToCsvFileWithinZipWithSuperCSV() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream zipFile = new FileOutputStream(new File("supercsv.zip"));
         ZipOutputStream zos = new ZipOutputStream(zipFile);
         ICsvListWriter writer = new CsvListWriter(new OutputStreamWriter(zos, "UTF-8"), CsvPreference.EXCEL_PREFERENCE)
    ){

        ZipEntry csvFile = new ZipEntry("supercsvwithinzip.csv");
        zos.putNextEntry(csvFile);

        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.write(rowContent);
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV within zip file with Super CSV took " + (elapsedTime / 1000f) + " seconds");
}

@Test
public void testWriteToCsvFileWithStringJoin() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream textFile = new FileOutputStream(new File("join.csv"));
         BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(textFile, "UTF-8"));
    ){

        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.append(String.join(",", rowContent) + "\n");
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV with String.join() took " + (elapsedTime / 1000f) + " seconds");
}

@Test
public void testWriteToCsvFileWithinZipWithStringJoin() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream zipFile = new FileOutputStream(new File("join.zip"));
         ZipOutputStream zos = new ZipOutputStream(zipFile);
         BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(zos, "UTF-8"));
    ){

        ZipEntry csvFile = new ZipEntry("joinwithinzip.csv");
        zos.putNextEntry(csvFile);

        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.append(String.join(",", rowContent) + "\n");
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV within zip with String.join() took " + (elapsedTime / 1000f) + " seconds");
}
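For the in-memory case, the same try-with-resources pattern works with a ByteArrayOutputStream. The sketch below is an assumption-laden variant rather than one of the measured scenarios: it writes each row through a reused StringBuilder instead of a LinkedList plus String.join(), reads the bytes only after the streams are closed (the archive is not complete until then), and reads the entry back to check that every row arrived. The names `InMemoryZipSketch`, `buildZip` and `countLines` are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** In-memory zip of generated CSV rows (illustrative sketch). */
class InMemoryZipSketch {

    /** Builds a zip containing one CSV entry and returns the finished archive. */
    static byte[] buildZip(int rows) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(baos);
             BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(zos, StandardCharsets.UTF_8))) {
            zos.putNextEntry(new ZipEntry("inmemory.csv"));
            StringBuilder row = new StringBuilder();
            for (int rowIdx = 0; rowIdx < rows; rowIdx++) {
                row.setLength(0); // reuse the buffer instead of allocating a list per row
                for (int colIdx = 0; colIdx < 6; colIdx++) {
                    if (colIdx > 0) row.append(',');
                    row.append('R').append(rowIdx).append('C').append(colIdx).append(" FieldContent");
                }
                writer.append(row.append('\n'));
            }
        }
        // The streams are closed here, so the archive is complete.
        return baos.toByteArray();
    }

    /** Reads the first entry back and counts its lines, as a basic integrity check. */
    static int countLines(byte[] zipBytes) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            if (zis.getNextEntry() == null) {
                return 0;
            }
            BufferedReader reader = new BufferedReader(new InputStreamReader(zis, StandardCharsets.UTF_8));
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }
}
```

Calling `countLines(buildZip(1000))` should give back the number of rows written. Unlike Super CSV, this sketch does no quoting or escaping, so it is only safe for fields that contain no commas, quotes or line breaks.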
James Bassett