0

I want to split a CSV file whose data contains commas and other special characters, using Java. I tried regex-based splitting such as line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1); and several similar variants, but some rows are still split incorrectly.

The CSV has around 3000 rows, and some of them are not split properly. Please suggest a standard way to split the data in a CSV file.

Abhishek kumar
Gayathri
  • There are already a large number of CSV parsing libraries out there, any one of which you can use. – Joe C Feb 12 '18 at 06:20
  • how should your regular expression know if the comma is a separator or not? – tomas Feb 12 '18 at 06:22
  • I've had success with Commons CSV. User guide here: https://commons.apache.org/proper/commons-csv/user-guide.html – hoipolloi Feb 12 '18 at 06:32
  • Possible duplicate of [CSV API for Java](https://stackoverflow.com/questions/101100/csv-api-for-java) – TobiSH Feb 12 '18 at 06:46

3 Answers

1

If you have a standard desktop or web application, Apache Commons CSV or OpenCSV might help you. If you are dealing with some kind of "Big Data" technology, have a look at Spark.
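To illustrate why a dedicated parser tends to be more reliable than a regex split, here is a minimal sketch of the quote-aware scanning that such libraries perform in a far more complete form. It is not the API of Apache Commons CSV or OpenCSV; the class name CsvLineSplitter and the method parseLine are hypothetical names chosen for this example. It assumes fields are optionally wrapped in double quotes, that a doubled quote ("") inside a quoted field stands for one literal quote, and that each record fits on a single line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of any CSV library.
final class CsvLineSplitter {

    /** Splits one CSV record into fields, treating commas inside double quotes as data. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // commas and other characters are data here
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a quoted field
            } else if (c == ',') {
                fields.add(current.toString()); // unquoted comma ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String row = "1,\"Smith, John\",\"He said \"\"hi\"\"\",,#$%";
        for (String field : parseLine(row)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

A quoted field that contains a line break spans several physical lines, which this sketch does not handle. That case is one of the main reasons to prefer one of the libraries above, since they read records from a stream instead of splitting line by line.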

TobiSH
  • 3000 rows != bigData – tomas Feb 12 '18 at 06:22
  • @thomas Agreed. But nobody says whether this is one of a million CSV files or whether a single line contains 100 MB of data. That sounds unlikely from the question, but I wanted to point out that the number of rows alone doesn't tell you much. – TobiSH Feb 12 '18 at 06:45
0

Instead of separating values with a comma, you can use a tab (\t). The file can still be saved with a .csv extension. This has worked for me.
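As a rough sketch of this approach, and assuming you control how the file is written and that no field ever contains a tab character, each line can be split on \t with String.split. The class name TabSeparatedSplitter and the method splitTabbed are hypothetical names used only for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
final class TabSeparatedSplitter {

    /** Splits one tab-separated record; the -1 limit keeps trailing empty fields. */
    static List<String> splitTabbed(String line) {
        return Arrays.asList(line.split("\t", -1));
    }

    public static void main(String[] args) {
        // Commas and other special characters are ordinary data here.
        String row = "1\tSmith, John\t#$%\t";
        for (String field : splitTabbed(row)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

If a field can itself contain a tab, the same ambiguity as with commas returns, and a quote-aware parser is needed again.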

poojabh
0

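Strip the special characters, replace each run of whitespace with +, and then split on +. Note that this discards the special characters (commas included) from the data: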

String result = str.replaceAll("[^\\dA-Za-z ]", "").replaceAll("\\s+", "+");
Abhishek kumar