
I have a large csv file as below:

DATE        status       code                       value     value2
2014-12-13  Shipped 105732491-20091002165230    0.000803398 0.702892835
2014-12-14  Shipped 105732491-20091002165231    0.012925206 1.93748834
2014-12-15  Shipped 105732491-20091002165232    0.000191278 0.004772389
2014-12-16  Shipped 105732491-20091002165233    0.007493046 0.44883348
2014-12-17  Shipped 105732491-20091002165234    0.022015049 3.081006137
2014-12-18  Shipped 105732491-20091002165235    0.001894693 0.227268466
2014-12-19  Shipped 105732491-20091002165236    0.000312871 0.003113062
2014-12-20  Shipped 105732491-20091002165237    0.001754068 0.105016053
2014-12-21  Shipped 105732491-20091002165238    0.009773315 0.585910214
:
:

What I need to do is remove the header and change the date format to an integer yyyymmdd (e.g. 20141217).

I am using opencsv to read and write the file.

Is there a way to change all the dates at once without parsing them one by one? Below is my code to remove the header and create a new file:

void formatCsvFile(String fileToChange) throws Exception {
    CSVReader reader = new CSVReader(new FileReader(new File(fileToChange)), CSVParser.DEFAULT_SEPARATOR, CSVParser.NULL_CHARACTER, CSVParser.NULL_CHARACTER, 1)
    info "Read all rows at once"
    List<String[]> allRows = reader.readAll();

    CSVWriter writer = new CSVWriter(new FileWriter(fileToChange), CSVWriter.DEFAULT_SEPARATOR, CSVWriter.NO_QUOTE_CHARACTER)
    info "Write all rows at once"
    writer.writeAll(allRows)
    writer.close()
}

Can someone please help?

Thanks

SSteve
user175084

2 Answers


You don't need to parse the dates, but you do need to process every line in the file and convert the date on each one. Java/Groovy has nothing like awk, where you can treat file data as columns (for example, the first 10 characters of every line). Java/Groovy only deals with "rows" of data in a file, not "columns".

You could try something like this: (in Groovy)

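reader.eachLine { String theLine ->
    // assumes reader is a File or Reader and writer is a BufferedWriter
    int idx = theLine.indexOf(' ')                  // end of the date column
    String oldDate = theLine.substring(0, idx)
    String newDate = oldDate.replaceAll('-', '')    // 2014-12-17 -> 20141217
    String newLine = newDate + theLine.substring(idx)
    writer.writeLine(newLine)
}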

Edit: If your CSVReader class is not a File or Reader, you can't use Groovy's eachLine method on it. And if CSVReader's readAll() method really returns a List of String arrays, the code above could change to this:

allRows.each { String[] theLine ->
    theLine[0] = theLine[0].replaceAll('-', '')   // rewrite the date field in place
    writer.writeNext(theLine)                     // opencsv's CSVWriter takes the whole String[] row
}
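
The per-row approach can also be sketched in plain Java. This is only an illustration under some assumptions: the class name DateColumnRewriter and the method compactDates are made up for the example, the rows are built by hand rather than read with opencsv, and the date is assumed to be the first field of every row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of opencsv.
public class DateColumnRewriter {

    // Returns copies of the rows with the dashes removed from column 0,
    // so "2014-12-17" becomes "20141217". The input rows are not modified.
    public static List<String[]> compactDates(List<String[]> rows) {
        List<String[]> result = new ArrayList<>();
        for (String[] row : rows) {
            String[] copy = Arrays.copyOf(row, row.length);
            if (copy.length > 0) {
                // Only the date field is touched; the dash in the code field stays.
                copy[0] = copy[0].replace("-", "");
            }
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"2014-12-13", "Shipped", "105732491-20091002165230",
                               "0.000803398", "0.702892835"});
        for (String[] row : compactDates(rows)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

A List of String arrays in this shape is what readAll() returns according to the question, so the result could be handed to writer.writeAll(...) as in the original code.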
geneSummons
  • Hi, I am getting brackets in the changed line. Why is that? Error :null ; Message : No signature of method: au.com.bytecode.opencsv.CSVWriter.writeLine() is applicable for argument types: (java.lang.String) values: [20141108,[Shipped, 105732491-20091002165230, 0.000803398, 0.702892835]] – user175084 Jan 20 '16 at 16:43
  • The error is saying two things. First you are passing a String array to a method that expects a String arg, and second your String array that represents each changed line in the CSV only has 2 items, the new date in slot 0 and "the rest of the original array" in slot 1. Looks like you need to unwind "the rest of the original array" and write each field to the new line individually. Try `String newline = newDate + theLine.each {return it}` and then pass `newline` to the writer. – geneSummons Jan 20 '16 at 17:07
  • Edit to above after 5 min: Try `String newline = theLine.eachWithIndex { String s, int i -> return i > 0 ? s : newDate}` – geneSummons Jan 20 '16 at 17:15
  • I changed the code to: `allRows.each { String[] theLine -> String newDate = theLine[0].replaceAll('-', ''); String newline = theLine.eachWithIndex { String s, int i -> return i > 0 ? s : newDate }; writer.writeLine(newline) }` and now I get a blank file. Any help? Thanks – user175084 Feb 01 '16 at 20:45
  • In Groovy generally, the return value of a closure is either the value of the last expression evaluated in the closure, or the explicit return value if there is one. I'm guessing Groovy's "collection methods" that take a closure argument, like "eachWithIndex", have different "return value" semantics. If your "blank file" is "completely empty", corrective action would follow one path, and if your "blank file" is "hundreds of empty lines", corrective action would follow a different path. Basically try this: move the "newline" assignment into the "eachWithIndex" closure and "build up" the line one field at a time with `+=` – geneSummons Feb 01 '16 at 22:54
1

Ignore the first line (the header):

List<String[]> allRows = reader.readAll()[1..-1];

and remove the '-' from the dates by editing the first field of each row:

allRows = allRows.collect { String[] row ->
    row[0] = row[0].replace('-', '')   // the date: 2014-12-17 -> 20141217
    row                                // the rest of the fields are unchanged
}

I don't know what you mean by "all dates at once". As far as I can tell, the rows can only be processed by iterating over them.
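
Since the question asks for an integer yyyymmdd, a variant that does parse each date may be worth considering when malformed dates should be rejected instead of silently rewritten. The following is a minimal sketch using only java.time from the standard library; the class name DateToInt and the method toYyyymmdd are hypothetical names chosen for the example.

```java
import java.time.LocalDate;

// Hypothetical helper for illustration only.
public class DateToInt {

    // Parses an ISO date such as "2014-12-17" and returns it as the int 20141217.
    // LocalDate.parse throws DateTimeParseException for input that is not a valid date.
    public static int toYyyymmdd(String isoDate) {
        LocalDate date = LocalDate.parse(isoDate);
        return date.getYear() * 10000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }

    public static void main(String[] args) {
        System.out.println(toYyyymmdd("2014-12-17")); // prints 20141217
    }
}
```

Either way, every row is still converted individually; the difference is only whether the date text is validated along the way.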

  • Hi, I get this error: Error :null ; Message : No signature of method: [Ljava.lang.String;.split() is applicable for argument types: (java.lang.String) values: [,] Possible solutions: split(groovy.lang.Closure), wait(), sort(), tail(), toList(), wait(long). Any reason why? – user175084 Jan 20 '16 at 16:41
  • What is the format of the row? Can you execute this where you run your code: `["a,c","b,d"].collect{ row -> row.split(',') }`? If so, the row.split should work, since it is a Ljava.lang.String. – Walter Sobral Andrade Jan 20 '16 at 17:10