
I am writing a CSV file with the help of CSVWriter (Java), but while executing the code on a Unix box with a large number of records (around 9000) it creates an empty file. When I run the same code locally on Windows (in Eclipse), it works fine for the same large file. Why?

I noticed one thing: if there are around 3000 records, it also works fine on the Unix box.

The issue occurs only with the large file.

I also tried the `writer.writeNext()` method instead of `writeAll()`, but the same issue is observed on the Unix box. :( Note: the file does not contain any special characters; it is in English.

Code:

CSVReader reader = new CSVReader(new FileReader(inputFile), ',', '"');
List<String[]> csvBody = reader.readAll();
int listSize = csvBody.size();
if (listSize > 0) {
    String renameFileNamePath = outputFolder + "//" + existingFileName.replaceFirst("file1", "file2");
    File newFile = new File(renameFileNamePath);
    CSVWriter writer = new CSVWriter(new FileWriter(newFile), ',');

    for (int row = 1; row < listSize; row++) {
        String timeKeyOrTransactionDate = year + "-" + month + "-" + day + " 00:00:00";
        csvBody.get(row)[0] = timeKeyOrTransactionDate;
    }

    // Write to the CSV file which is open
    writer.writeAll(csvBody);
    writer.flush();
    writer.close();
}
reader.close();
  • `reader.readAll()` with a "_huge_" file? I think not... Given you are literally _copying_ a file, why are you using OpenCSV at all? Furthermore, your resource management is appalling; this is 2016, use `try-with-resources`. – Boris the Spider Oct 06 '16 at 10:14
  • Right now it's too late to change from OpenCSV to any other CSV support library. I did a remote debug up to `writer.writeAll(csvBody)` and noticed that the `csvBody` list contains all 9000 records and the code executes successfully. But when I check the file on the UNIX machine it's empty. WHY? :( – Amit Thakur Oct 06 '16 at 10:25
  • Why do you need a CSV library at all? You are just copying a file. What's your comment got to do with anything I have said? – Boris the Spider Oct 06 '16 at 10:26
  • It seems the issue is in my code, but I couldn't identify it, because the same code works fine on Windows (in Eclipse) for the same file; the issue is only observed on the Unix machine. – Amit Thakur Oct 06 '16 at 10:31
  • You are reading a large file into memory; unless you have set `Xmx` to some large value, your application is crashing. Due to your appalling resource management, this crash causes the application to lose the write buffers; result: empty file. – Boris the Spider Oct 06 '16 at 10:33
  • But I am not getting any exception or OutOfMemory issue in the logs on the Unix box. :( It does not show any error or exception. – Amit Thakur Oct 06 '16 at 10:40
  • Heap size is already 1024m: JAVA_OPTS="$JAVA_OPTS -Dfile.encoding=utf-8" JAVA_OPTS="$JAVA_OPTS -XX:MaxPermSize=1024m" – Amit Thakur Oct 06 '16 at 11:51
  • `MaxPermSize` has absolutely nothing to do with heap size. In fact, in Java 8, `MaxPermSize` does nothing and is ignored. – Boris the Spider Oct 06 '16 at 12:42
  • `Xmx` is already set to 2048: JAVA_OPTS="-Djava.awt.headless=true -Xmx2048m -XX:+UseConcMarkSweepGC" But I am still getting the same issue. :( – Amit Thakur Oct 06 '16 at 14:05

3 Answers


The readAll and writeAll methods should only be used with small datasets; otherwise, avoid them like the plague. Use the readNext and writeNext methods instead, so you don't have to read the entire file into memory.

  • Note that `readNext` will return null once there is no more data (end of stream or end of file). I will have to update the javadocs to mention that.

  • Disclaimer: I am the maintainer of the opencsv project, so please take the "avoid like the plague" seriously. Really, that was only put there because most files are usually small and can fit in memory, but when in doubt about how big your dataset will be, avoid putting it all in memory.
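As a minimal sketch of that streaming idea, here is a plain `java.io` version (the `StreamingCsvCopy` class and the naive split on the first comma are illustrative assumptions of mine — OpenCSV's `readNext`/`writeNext` would additionally handle quoted fields correctly):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingCsvCopy {
    // Copies a CSV file row by row, replacing the first column of every
    // data row with a fixed timestamp, without loading the file into memory.
    public static void copyWithTimestamp(Path in, Path out, String timestamp) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in);
             BufferedWriter writer = Files.newBufferedWriter(out)) {
            String header = reader.readLine();
            if (header != null) {           // keep the header row unchanged
                writer.write(header);
                writer.newLine();
            }
            String line;
            while ((line = reader.readLine()) != null) {
                int comma = line.indexOf(',');
                // Replace everything before the first comma with the timestamp.
                writer.write(timestamp + (comma >= 0 ? line.substring(comma) : ""));
                writer.newLine();
            }
        } // try-with-resources flushes and closes both streams, even on error
    }
}
```

The try-with-resources block guarantees the writer is flushed and closed even if an exception is thrown mid-file, so a crash can no longer leave a silently empty output file.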

Scott Conway
  • I have already tried the `writeNext()` method instead of `writeAll()`, but the issue persists and an empty file is created on the Unix box for 9000 records. :( – Amit Thakur Oct 06 '16 at 16:13
  • Yes, but did you replace readAll with readNext? So have one loop where you readNext then writeNext until readNext returns null. – Scott Conway Oct 06 '16 at 16:17
  • In debug I can see all 9000 records in the csvBody list. I then updated the code as below, but the issue is still not resolved: `Iterator it = csvBody.iterator(); while (it.hasNext()) { String[] obj = (String[]) it.next(); writer.writeNext(obj); } writer.flush(); writer.close();` – Amit Thakur Oct 06 '16 at 16:29
  • Okay, a couple of things to check here. First off, make sure you are using at least version 3.5 of openCSV, as there was a streaming issue in prior versions. In the method, create an int counter variable to keep track of what line you have read/processed. Put the entire while loop in a try block, and in the finally (don't catch any exceptions) print out the counter and your writer.checkError(), which will return true if there was an issue writing the line. From the counter you will know the record that caused the issue. – Scott Conway Oct 06 '16 at 18:11
  • continued from previous comment: find the line(s) with that record and paste that as a comment. You may have corrupt data. If you are using 3.8, you should be able to see the exception if there is an error, because prior to 3.8 CSVWriter wrapped whatever writer was passed in into a PrintWriter, which fails quietly; we changed that in 3.8, so you should now get an exception when it happens. – Scott Conway Oct 06 '16 at 18:16

A data error. The Linux machine probably uses the UTF-8 Unicode encoding, which throws an error on the first malformed UTF-8 byte sequence it encounters, while a single-byte Windows encoding simply accepts it.

You are using the old utility class FileReader (there is also the equally flawed FileWriter), which uses the default platform encoding and therefore makes the software platform dependent.

You need to do:

Charset charset = Charset.forName("Windows-1252"); // Windows Latin-1

For reading

BufferedReader br = Files.newBufferedReader(inputFile.toPath(), charset);

For writing

Path newFile = Paths.get(renameFileNamePath);
BufferedWriter bw = Files.newBufferedWriter(newFile, charset);
CSVWriter writer = new CSVWriter(bw, ',');

The above assumes a single-byte encoding, but it will probably work for most other single-byte encodings too.

A pity that the file is not in UTF-8, which would allow any script.
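To make this concrete, here is a small sketch (the `CharsetDemo` class name is an assumption of mine) showing the difference an explicit charset makes: `Files.newBufferedReader` reports a malformed byte with a `MalformedInputException` instead of silently substituting characters the way a default-encoding `FileReader` might:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetDemo {
    // Reads a whole file with an explicit charset. Files.newBufferedReader
    // throws MalformedInputException on bytes invalid for that charset,
    // unlike FileReader, which silently uses the platform default encoding.
    public static String readAll(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = Files.newBufferedReader(file, charset)) {
            int c;
            while ((c = br.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }
}
```

For example, the single byte 0xE9 is 'é' in Windows-1252 but an invalid sequence in UTF-8, so reading such a file with the wrong charset fails loudly rather than producing garbage.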

Joop Eggen
  • I have tried the above suggestion but am still getting the same issue. :( The file does not contain any special characters; it is in English. – Amit Thakur Oct 06 '16 at 11:55
  • In [the comments from the OP](https://stackoverflow.com/questions/39893285/csvwriter-behave-differently-on-unix-machine-tomcat-sever-for-huge-file-size#comment67076097_39893285), the `-Dfile.encoding=utf-8` option is set. That should avoid encoding based problems due to different default encodings. – Boris the Spider Oct 06 '16 at 12:44
  • @BoristheSpider, yes, I did not read that. If that option is set for Windows too, the file evidently is valid UTF-8. – Joop Eggen Oct 06 '16 at 18:42

The issue has been resolved. The output directory was also shared with a loader application, and the loader checks for files every minute; that's why, before the CSV file was fully written, the loader picked it up and loaded it into the DB as zero KB. Hence I used a BufferedWriter instead of a FileWriter, and also wrote the data to a tmp file first and then renamed it to file2, and it worked.
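A sketch of that write-to-tmp-then-rename approach (the `SafePublish` class and `publish` method names are illustrative; `ATOMIC_MOVE` assumes the temp file and target are on the same filesystem, which holds here because the temp file is created in the target directory):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

public class SafePublish {
    // Writes rows to a temporary file in the target directory, then renames
    // it into place. A process polling the directory never sees a half-written
    // file, because the rename is a single atomic filesystem operation.
    public static void publish(Path target, List<String> rows) throws IOException {
        Path tmp = Files.createTempFile(target.getParent(), "csv-", ".tmp");
        try (BufferedWriter bw = Files.newBufferedWriter(tmp)) {
            for (String row : rows) {
                bw.write(row);
                bw.newLine();
            }
        } // flushed and closed before the rename below
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The polling loader then either sees no file at all or the complete, flushed file — never a zero-KB intermediate state.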

Thanks to all of you for your help and valuable suggestions.