
I am using au.com.bytecode.opencsv.CSVReader to read a CSV file and print all the records one by one. The code is behaving strangely: it prints a group of lines together as a single record, and then prints the following lines correctly.

Link to the CSV File

Please download the CSV file from the link above. My code treats everything from the first non-header line up to the line just above the one with the following content as a single record:

12/4/13: Changed AO to Chief Financial officer.","07/18/2016",

Also, my first data line contains \" in one of the fields (search for \" with Ctrl+F to find it). If I remove the \ from that field, it works fine. My question is: what logic does CSVReader use to end the first record as described above? Why does it end the record just before the line with the following content:

12/4/13: Changed AO to Chief Financial officer.","07/18/2016",

A new record starts at '12/4/13.........', and the individual lines below it are read as separate records correctly.

Code for your reference:

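// au.com.bytecode.opencsv: separator ',', quote '"', skip 1 header line.
// This constructor uses the default escape character, a backslash.
CSVReader reader = new CSVReader(new FileReader(fileNameWithLocation), ',', '"', 1);

ColumnPositionMappingStrategy<DomainObj> mappingStrategy =
        new ColumnPositionMappingStrategy<DomainObj>();
mappingStrategy.setType(DomainObj.class);

String[] nextLine;
// readNext() returns one logical record, which can span several physical
// lines when a quoted field contains a line break
while ((nextLine = reader.readNext()) != null) {
    log.debug("Next line : " + Arrays.toString(nextLine));
}
reader.close();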
Satej Koli
  • The CSV file appears to be invalid. How was it produced? And don't post links here. Post the relevant part of the data. – user207421 Sep 18 '16 at 18:07
  • I think posting the contents of the file would make the post unnecessarily long. Also, the file is valid. I have verified it. – Satej Koli Sep 19 '16 at 07:13
  • I didn't ask for the content of the file. I asked for the relevant part of the data. That's only one line, not an entire file. – user207421 Sep 19 '16 at 16:39

2 Answers


As also posted in the opencsv support request.

The reason it reads multiple lines is that we need to allow for data that contains new lines inside fields. So in quoted data, when opencsv reaches the end of a line and the field has not been closed (no closing quotation mark), it reads the next line and keeps filling in the same row of data. You can see that this is the case in your file by looking at the line above the one you listed: put together, the two lines really do make one row of data.

,,"440063","DSH440063B","39066","DSH","True","01/01/2014","10/01/2016","12",,,"JOHNSON CITY MEDICAL CENTER","Regional Cancer Center @ Johnson City Medical Center","2205 Pavilion Drive","Suite 101","Kingsport","TN","37660","4641",,,,,,,,,,,,,,,,,,"Shane E. Hilton","Chief Financial Officer","4234311038",,"Trish Tanner","Corp. Director, Consumer Health Svcs","4233023532",,"TRISH TANNER","SYSTEM SERVICES DIRECTOR, PHARMACY SERVICES","10/10/2013","4233023532",,,,,,,,,,,,,,"08/07/2015","False",,"12/3/13 I'm not sure that AO/SBO is at high enough level, pls chk 12/4/13: Changed AO to Chief Financial officer.","07/18/2016",

Notice that the line above ended with "pls chk" but no closing quote, so opencsv reads the next line and appends it to the data it has already collected.
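To make the mechanism concrete, here is a minimal sketch in plain Java. It is not opencsv's parser: `MultiLineRecords` and `groupRecords` are hypothetical names, and the model is deliberately simplified (each `"` toggles the "inside a quoted field" state and there is no escape character). It joins the two physical lines from the question into one record because the quote opened before `12/3/13` is still open when the "pls chk" line ends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiLineRecords {

    // Hypothetical helper: groups physical lines into logical CSV records.
    // Simplified model: '"' toggles "inside a quoted field"; there is no escape
    // character, and a doubled quote ("") toggles twice, leaving the state as-is.
    static List<String> groupRecords(List<String> physicalLines) {
        List<String> records = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        boolean inQuotes = false;
        for (String line : physicalLines) {
            if (inQuotes) {
                pending.append('\n'); // the line break belongs to the open field
            }
            pending.append(line);
            for (char c : line.toCharArray()) {
                if (c == '"') {
                    inQuotes = !inQuotes;
                }
            }
            if (!inQuotes) {          // no field left open: the record is complete
                records.add(pending.toString());
                pending.setLength(0);
            }
        }
        if (inQuotes) {               // quote never closed: rest of input is one record
            records.add(pending.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "\"08/07/2015\",\"False\",,\"12/3/13 I'm not sure that AO/SBO is at high enough level, pls chk",
                "12/4/13: Changed AO to Chief Financial officer.\",\"07/18/2016\",");
        List<String> records = groupRecords(lines);
        System.out.println(records.size()); // 1
        System.out.println(records.get(0));
    }
}
```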

Quotes that are part of the data must be escaped, hence the \".
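A sketch of what the escape character changes, again under a simplified single-line model rather than opencsv's actual CSVParser (`EscapeDemo`, `splitLine` and `NO_ESCAPE` are hypothetical). With a backslash escape, `\"` is data and never opens or closes a field. That is what you want for `"say \"hi\""`, but a field whose value simply ends in a backslash, such as `"C:\temp\"`, loses its closing quote and stays open.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeDemo {

    static final char NO_ESCAPE = '\0'; // hypothetical marker: no escape character

    // Simplified single-line parser (not opencsv's CSVParser): ',' separates
    // fields, '"' quotes them, and `escape` followed by '"' or by itself yields
    // that character literally. Returns null when a quoted field is still open
    // at the end of the line, i.e. when a reader would have to read another line.
    static List<String> splitLine(String line, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean escaped = escape != NO_ESCAPE && c == escape
                    && i + 1 < line.length()
                    && (line.charAt(i + 1) == '"' || line.charAt(i + 1) == escape);
            if (escaped) {
                field.append(line.charAt(++i)); // literal char: does NOT toggle quoting
            } else if (c == '"') {
                inQuotes = !inQuotes;           // opening or closing quote
            } else if (c == ',' && !inQuotes) {
                fields.add(field.toString());   // separator outside quotes ends a field
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        if (inQuotes) {
            return null;
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        String escapedQuote = "\"say \\\"hi\\\"\",x";          // "say \"hi\"",x
        String trailingBackslash = "\"C:\\temp\\\",\"next\"";  // "C:\temp\","next"
        System.out.println(splitLine(escapedQuote, '\\'));          // [say "hi", x]
        System.out.println(splitLine(trailingBackslash, '\\'));     // null (field still open)
        System.out.println(splitLine(trailingBackslash, NO_ESCAPE)); // [C:\temp\, next]
    }
}
```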

Hope that helps.

Scott Conway :)

Scott Conway
  • I understand that because of the \" it keeps adding lines, but this should only happen until it finds a closing \". Since the file does not have any closing \", it should have gone to the end of the file and made everything a single line. It would be great if you could elaborate on why it ended the line at 'pls chk' (also, I haven't fully understood the last part of the answer :) ). Also, what would be the ideal code change to fix this without changing the CSV file? Is the following code fine? CSVReader reader = new CSVReader(new FileReader(fileNameWithLocation), ',', '"', '\0', 1); – Satej Koli Sep 19 '16 at 07:13
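One way to see why the record ends at 'pls chk' rather than at the end of the file: once the stray \" has been consumed as data, every later quote in that record is interpreted one step out of phase, so opening quotes act as closing quotes and vice versa. The record then ends at the first line ending where the parser happens to be outside quotes, which is the line where a genuine multi-line field opens. The sketch below models this with hypothetical names (`RecordBoundaries`, `records`, `NO_ESCAPE`) and a simplified toggle rule; opencsv has extra rules (for example for doubled quotes), so exact boundaries in the real file may differ. Passing no escape character corresponds to the '\0' argument in the proposed call, which, assuming the opencsv 2.x constructor CSVReader(Reader, char, char, char, int), disables backslash escaping while still skipping one header line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordBoundaries {

    static final char NO_ESCAPE = '\0'; // hypothetical: plays the role of '\0' in the proposed call

    // Simplified model (not opencsv's code): returns true if a quoted field is
    // still open at the end of `line`, given whether one was open at its start.
    static boolean openAtEnd(String line, boolean inQuotes, char escape) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (escape != NO_ESCAPE && c == escape && i + 1 < line.length()
                    && line.charAt(i + 1) == '"') {
                i++;                  // escaped quote is data: skip it, no toggle
            } else if (c == '"') {
                inQuotes = !inQuotes;
            }
        }
        return inQuotes;
    }

    // Groups physical lines into records: a record ends at the first line
    // ending where no quoted field is open.
    static List<String> records(List<String> lines, char escape) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (String line : lines) {
            if (inQuotes) {
                current.append('\n');
            }
            current.append(line);
            inQuotes = openAtEnd(line, inQuotes, escape);
            if (!inQuotes) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (inQuotes) {
            out.add(current.toString()); // unterminated: runs to end of input
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "\"1\",\"C:\\dir\\\",\"x\"",  // "1","C:\dir\","x"
                "\"2\",\"note starts here",   // a genuine multi-line field opens
                "and ends here\",\"3\"");     // ... and closes here
        for (String r : records(lines, '\\')) {
            System.out.println("[" + r + "]");
        }
        System.out.println("---");
        for (String r : records(lines, NO_ESCAPE)) {
            System.out.println("[" + r + "]");
        }
    }
}
```

With the backslash escape, the first record swallows the second line and stops right where "note starts here" opens a real multi-line field, and the continuation line starts a record of its own, which mirrors the 'pls chk' / '12/4/13' split. With no escape character, both boundaries fall where the quoting intends.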

The backslash escapes the quote, so that the quote character is considered content and not a delimiter. The reader puts " into its buffer and keeps on reading until it hits the next quotation mark.

chrylis -cautiouslyoptimistic-
  • Thank you @chrylis. I agree with your statement. But there is no matching \" after the first line in my CSV, so it should have gone to the end of the file to find one. Surprisingly, though, the line ends after a few lines. I want to know why the first line ends just before the line with the following contents: 12/4/13: Changed AO to Chief Financial officer.","07/18/2016", If you could download my CSV and have a look, I would be grateful :) – Satej Koli Sep 18 '16 at 09:14
  • This answer doesn't make sense. If the quote is part of the content and not a delimiter, why would it read until it hits a second quote? – user207421 Sep 18 '16 at 09:33
  • @EJP The content in question was not included. The most likely explanation is that the backslash is stray. – chrylis -cautiouslyoptimistic- Sep 18 '16 at 09:35
  • @EJP, can you please check my CSV file and see what could be the cause? – Satej Koli Sep 18 '16 at 10:03
  • @chrylis I'm not talking about the content in question. I'm talking about your answer. It contains an obvious self-contradiction. If the quote is part of the content it doesn't need to be matched. It only needs to be matched if it *isn't* part of the content. – user207421 Sep 18 '16 at 18:09