I need to parse a one-column CSV file in which some names contain extra commas and some also contain extra quotes. I have read the previous questions on this topic, and one of the best answers was Achintya Jha's. However, that solution does not seem to work in my case. For example, the name

ADAMS COUNTY SHERIFF "ADAMS COUNTY SHERIFF'S OFFICE, CO"

is being printed out as:

ADAMS COUNTY SHERIFF 
"ADAMS COUNTY SHERIFF'S OFFICE, CO"

It splits at the correct spots and handles the extra commas, but now it also hits the extra quotes and splits there too, so `String csvSplitBy = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";` will not work. Does anyone know of another way to handle this issue in Java? Others have asked this question for other languages, but I could not find any about Java other than the one I linked to. Thanks!
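
For reference, here is a minimal sketch of what that pattern does on its own: it splits on a comma only when an even number of double quotes follows it, so commas inside a quoted field are left alone. The class and method names are hypothetical, and the sample lines are assumptions modelled on the names in this question. Note that the pattern never removes the quotes themselves; a quoted field comes back with its quotes still attached.

```java
import java.util.Arrays;

public class QuoteAwareSplitDemo {
    // Same pattern as in the question: a comma followed by an even number of quotes.
    static final String CSV_SPLIT_BY = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: split one line, keeping trailing empty fields.
    static String[] splitLine(String line) {
        return line.split(CSV_SPLIT_BY, -1);
    }

    public static void main(String[] args) {
        // Two fields: the comma inside the quoted field is not a separator.
        System.out.println(Arrays.toString(splitLine("ADS INC.,\"ADS, INC\"")));
        // One field: nothing to split, and the surrounding quotes are kept.
        System.out.println(Arrays.toString(splitLine("\"ADS, INC\"")));
    }
}
```

The pattern assumes quotes are balanced on each line; a line with an odd number of quotes will not split the way you expect.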

This is my Java code:

package csvdatacompareapplication;
import java.io.*;

public class CSVDataCompareApplication {
    public static void main(String[] args) {

        BufferedReader br = null;
        BufferedReader br2 = null;
        String customerListAllCustomers = "C:\\Users\\Desktop\\customerListAllCustomers.csv";
        String customerListToRemove = "C:\\Users\\Desktop\\customerListToRemove.csv";
        String line = "";
        String csvSplitBy = ",";

        try {
            br = new BufferedReader(new FileReader(customerListAllCustomers));
            while ((line = br.readLine()) != null) {
                // use comma as separator
                //String [] customersAll = line.split(csvSplitBy);
                System.out.println(line);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

}

First few lines of my .CSV file:

ADAMS COUNTY SHERIFF'S OFFICE, CO
ADAMSON POLICE PRODUCTS
ADAN DAVILA
ADAPT SECURE
ADDISON PD - MIKE VINCENT
ADDISON POLICE - IL
ADDISON PORTER
ADIN MCGARVIE
ADMIRAL FIRE & SAFETY
ADMON IRAMIYA
ADRIAN DANG
ADRIAN HUMPHRIES
ADRIAN KEPKA
ADRIAN SALDANA
ADRIAN SOLER
ADRIAN YORK
ADRIENNE BAKER
ADRIENNE MOOS
ADS INC.
ADS, INC

I updated my Java code, and this is what prints out now:

"ADAMS COUNTY SHERIFF'S OFFICE, CO"
ADAMSON POLICE PRODUCTS
ADAN DAVILA
ADAPT SECURE
ADDISON PD - MIKE VINCENT
ADDISON POLICE - IL
ADDISON PORTER
ADIN MCGARVIE
ADMIRAL FIRE & SAFETY
ADMON IRAMIYA
ADRIAN DANG
ADRIAN HUMPHRIES
ADRIAN KEPKA
ADRIAN SALDANA
ADRIAN SOLER
ADRIAN YORK
ADRIENNE BAKER
ADRIENNE MOOS
ADS INC.
"ADS, INC"

Why did the quotes get placed in?

Ashton
  • The way I would handle this is to iterate over each character. I do not think that `split` would be able to handle all double-quote characteristics. (quoted fields can contain commas, and non-quoted fields can contain quotes mid-way) Also it is possible to have line returns within a double-quoted string, but that may be beyond the scope of your program. – 700 Software May 11 '16 at 17:37
  • Instead of using regular expression try a library which reads CSV like Apache CSV Parser https://commons.apache.org/proper/commons-csv/user-guide.html – 11thdimension May 11 '16 at 17:38
  • how does your input file look exactly? you sure it is csv? – Tamas Hegedus May 11 '16 at 17:39
  • A *"1 column CSV file"* that doesn't correctly quote special characters like `"` and `,`, is **not a CSV file**. It is a text file, with one name per line. Just use a `BufferedReader` and the `readLine()` method. – Andreas May 11 '16 at 17:42
  • @Andreas it is in a .CSV format because it is an export from a report I built to display all of our customer names. – Ashton May 11 '16 at 17:43
  • Then the report export feature is flawed, or it is not a CSV file, but maybe a tab-separated text file. With only 1 column, you will of course not see any tab characters. – Andreas May 11 '16 at 17:45
  • @TamasHegedus yes, I am sure it is a CSV. My initial problem was that I had extra commas within the fields that were messing me up. So I checked out the solution that I linked to handle the issue, which it did, but it also caused another issue because it is now splitting at the quotes in the fields. I guess my major issue is dealing with extra commas in the .CSV file – Ashton May 11 '16 at 17:46
  • @Andreas I understand why you think that. It is a .CSV file though. I think there is a slight misunderstanding and I apologize for that. The file is a .CSV file; my issue is that it has extra commas that are throwing off my .split Java code. – Ashton May 11 '16 at 17:47
  • If the file has a line like `abc "def"`, then it is not a *valid* CSV file. – Andreas May 11 '16 at 17:48
  • Please show us the first few lines of your input file. Open it with a text editor (not with excel), and paste it please so we can see the actual format. – Tamas Hegedus May 11 '16 at 17:49
  • @Andreas Oh now I understand! You are correct. Let me show you lines in my CSV file. Thank you for your patience and for teaching. I will update my question with my CSV file screenshot. – Ashton May 11 '16 at 17:51
  • No screenshot please. Just paste in the first few lines. – Andreas May 11 '16 at 17:51
  • I have posted the first few lines of the .CSV file. It has extra commas. I believe it was the answer I linked to that added the quotes in my Java code. – Ashton May 11 '16 at 18:00
  • Do you have to split at all? It looks like the customer name is the only thing on each line. Why not just `while ((customer = br.readLine()) != null) { System.out.println(customer); }` ? I agree with [Andreas](http://stackoverflow.com/questions/37169603/splitting-a-csv-file-in-java-that-has-extra-commas-and-extra-quotes-in-them#comment61874401_37169603) --- _even if the extension is CSV_, you don't need to use a CSV parser if there's only one column of data in the file. – cxw May 11 '16 at 18:05
  • @cxw You are correct. And your while loop worked. I was under the assumption that though it was 1 column of data it still had to be "split" because it was a .CSV file (which it is). When I do `while ((line = br.readLine()) != null) { System.out.println(line); }` it does print out the cells now, but for whatever reason it is now printing them out with quotes – Ashton May 11 '16 at 18:14
  • First CSV example is **not** a *valid* CSV file, because it contains unquoted commas. Second CSV example is good, and any CSV parser should now be able to read it for you. See [suggestion](http://stackoverflow.com/questions/37169603/splitting-a-csv-file-in-java-that-has-extra-commas-and-extra-quotes-in-them#comment61874246_37169603) by @11thdimension above. – Andreas May 11 '16 at 18:17
  • @Andreas interesting! That's very odd because it is a direct export from my DBMS. It should save them/convert them to proper .CSV. Although I did not know about the unquoted commas. Thank you for all of your help and teachings! Maybe it is because I tried to export my report as "formatted for exporting". Let me try and export it without hitting that. – Ashton May 11 '16 at 18:22
  • It seems to be saving it as a comma-delimited file. Would that be causing the issue? – Ashton May 11 '16 at 18:24
  • @Ashton if your file doesn't contain any escaped quotes you could try `line.replaceAll("^(")?((?(1)[^"]+|[^,\\n]+))", "$2");` – James Buck May 11 '16 at 18:40
  • @JamesBuck how would that look in regards to my code? That may work! – Ashton May 11 '16 at 18:44
  • @Ashton see what happens when you do `System.out.println(line.replaceAll(...));` – James Buck May 11 '16 at 18:52
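
The character-by-character approach suggested in the first comment can be sketched roughly as follows. This is an illustration under stated assumptions, not a full CSV parser: the class and method names are hypothetical, a quote inside a quoted field is assumed to be escaped by doubling it (`""`), and line breaks inside quoted fields are not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class CharByCharCsv {
    // Hypothetical helper: split one line into fields by walking its characters.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);            // commas are plain text here
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"');          // doubled quote (assumed escape style)
                    i++;
                } else {
                    inQuotes = false;             // closing quote
                }
            } else if (c == '"' && current.length() == 0) {
                inQuotes = true;                  // opening quote at the start of a field
            } else if (c == ',') {
                fields.add(current.toString());   // separator outside quotes
                current.setLength(0);
            } else {
                current.append(c);                // includes quotes mid-way through a field
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("\"ADAMS COUNTY SHERIFF'S OFFICE, CO\""));
        System.out.println(parseLine("ADS INC.,\"ADS, INC\""));
    }
}
```

Unlike a regex split, this removes the surrounding quotes while it reads, and it keeps a quote that appears part-way through an unquoted field as ordinary text.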

1 Answer


Thanks to Andreas and Tamas Hegedus for helping you clarify the question! Try:

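        br = new BufferedReader(new FileReader(customerListAllCustomers));
        while ((line = br.readLine()) != null) {
            // one column, so no need to split on commas
            String line2 = line.replaceAll("^\"", "")   // strip a leading quote
                               .replaceAll("\"$", "")   // strip a trailing quote
                               .replace("\\\"", "\"");  // literal replace: \" becomes "
            System.out.println(line2);
        }

The two replaceAll calls strip the leading quote (^\") and the trailing quote (\"$). The final call uses replace rather than replaceAll so that \\\" is taken literally as a backslash followed by a quote and is unescaped to a plain quote; passed to replaceAll, the same string is the regex \", which matches a bare quote and would leave the line unchanged.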
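
Many CSV exporters escape an embedded quote by doubling it (`""`) rather than with a backslash. Nothing in the question confirms which style this export uses, so the following is only a sketch under that assumption, with a hypothetical class and method name. It also strips the outer quotes only when the field has both an opening and a closing quote.

```java
public class CsvFieldUnquoter {
    // Hypothetical helper: remove one pair of surrounding quotes and
    // collapse doubled quotes, assuming the "" escape convention.
    static String unquote(String field) {
        boolean quoted = field.length() >= 2
                && field.startsWith("\"")
                && field.endsWith("\"");
        if (!quoted) {
            return field; // leave unquoted names untouched
        }
        String inner = field.substring(1, field.length() - 1);
        return inner.replace("\"\"", "\""); // "" becomes "
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"ADS, INC\""));
        System.out.println(unquote("ADS INC."));
    }
}
```

In the loop above, this would be called as `unquote(line)` in place of the chained replace calls.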

cxw
  • Wonderful! This got rid of the extra quotes! Now is there a way to save the output to a new .csv file? Thank you all! – Ashton May 11 '16 at 19:42
  • Also, I just realized I may not be able to do this because it is not an ArrayList right? – Ashton May 11 '16 at 19:43
  • @Ashton, I'm not sure I understand your question regarding `ArrayList`. Just like you did for this question, give it a try, and look for help in existing answers or other sites. If you get stuck, post another question with the code you're having trouble with. :) Good luck! – cxw May 12 '16 at 12:11
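
On the follow-up about saving the output: one possible sketch, not taken from the answer, is to write each cleaned line out as it is read, so no `ArrayList` is needed. The class name, method name, and UTF-8 encoding are assumptions; the quote stripping is the same leading/trailing replaceAll idea as in the answer.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CleanedCsvWriter {
    // Hypothetical helper: copy a one-column file, dropping outer quotes per line.
    static void copyWithoutOuterQuotes(Path in, Path out) throws IOException {
        // try-with-resources closes both files even if an exception is thrown
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String cleaned = line.replaceAll("^\"", "").replaceAll("\"$", "");
                writer.write(cleaned);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: CleanedCsvWriter <input> <output>");
            return;
        }
        copyWithoutOuterQuotes(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The result is a plain text file with one name per line. Names that contain commas are no longer quoted, so a strict CSV reader would see them as several columns.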