I have a large CSV file, almost 60 MB. Each row contains Object_Name, timestamp, and value.

The file contains records for 10 objects, listed in time sequence, so records for different objects are interleaved, such as:

A1,2013-08-24 15:36:47,24.83
A2,2013-08-24 15:36:47,26.56
A3,2013-08-24 15:36:47,25.83
A6,2013-08-24 15:36:47,-40
A8,2013-08-24 15:36:47,-40
A9,2013-08-24 15:36:47,-40
B2,2013-08-24 15:36:47,6
C1,2013-08-24 15:37:18,6

I want to classify those records by Object_Name. If the file were small, I could do it. As it is, I spend 10 minutes just reading the CSV file; I can't imagine grouping the data on top of that — it would probably crash my laptop. The expected result is 10 lists, each containing only one object with its timestamps and values, such as:

Object_Name,timestamp,val
A1,2013-08-24 15:00:00,26.7
   .....
   .....

Could someone help me? Basically, I just want an efficient way to sort this data by object name and split it into separate lists.

BTW, I use opencsv to read the CSV file.
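For what it's worth, one common way to avoid a 10-minute `readAll()` is to stream the file one record at a time and group rows as you go. Below is a minimal sketch using plain `java.io` (opencsv's `readNext()` would fit the same loop); the class and method names are made up for illustration:

```java
import java.io.*;
import java.util.*;

public class CsvGrouper {

    // Streams the CSV line by line and groups raw rows by the first
    // column (Object_Name). A TreeMap keeps the object names sorted,
    // so the result is grouped and ordered in a single pass.
    public static Map<String, List<String>> groupByObject(BufferedReader in)
            throws IOException {
        Map<String, List<String>> byObject = new TreeMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            int comma = line.indexOf(',');
            if (comma < 0) continue;               // skip malformed rows
            String name = line.substring(0, comma);
            byObject.computeIfAbsent(name, k -> new ArrayList<>()).add(line);
        }
        return byObject;
    }

    public static void main(String[] args) throws IOException {
        String sample = "A1,2013-08-24 15:36:47,24.83\n"
                      + "B2,2013-08-24 15:36:47,6\n"
                      + "A1,2013-08-24 15:37:18,25.10\n";
        Map<String, List<String>> grouped =
                groupByObject(new BufferedReader(new StringReader(sample)));
        System.out.println(grouped.keySet());         // [A1, B2]
        System.out.println(grouped.get("A1").size()); // 2
    }
}
```

With only 10 objects, the map stays tiny; the per-object lists hold the same 60 MB of data you already read, but you never pay for a separate sort or a second pass.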

Thank you.

Eric
  • 2
    It's not at all clear what you are asking. – Robert Harvey Aug 28 '13 at 19:55
  • 1
    What do you mean by "classify"? – Jim Garrison Aug 28 '13 at 20:05
  • Hi Jim, I want to know how to sort by object name and separate each object out of the huge CSV file. – Eric Aug 28 '13 at 20:22
  • I use CSVReader, so all the data from the csv is read into `List<String[]> listString = reader.readAll()`. The next step is to separate the objects by sorting on object_name. My solution is to use 10 Lists to store the 10 objects, each entry containing name, timestamp, val. – Eric Aug 28 '13 at 20:28
  • You could dump the data into a database, and then query it. With such a large dataset this could be the most efficient way. – Martin Wickham Aug 28 '13 at 21:12
  • Have you tried using the `sort` command on Linux? There may be a sort in Linux-like environments for Windows, like Cygwin. `sort --field-separator="," --key=1 > outfile` Once you have a sorted file, it is no longer necessary to slurp the entire file into memory; you could use a streaming model to split the file into files based on the first key. – Paul Aug 29 '13 at 10:02
  • Done it. First, read the csv file into memory, then use a for loop to separate the different objects, and after that insert the data into different tables in the database. – Eric Sep 04 '13 at 13:33
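Paul's streaming suggestion in the comments above can be sketched as follows: read the file line by line and append each record to a per-object output file, so nothing large ever sits in memory. This is a hedged sketch in plain `java.io`; the class name and the `<name>.csv` output convention are assumptions for illustration:

```java
import java.io.*;
import java.util.*;

public class CsvFileSplitter {

    // Streams the input and appends each row to a writer keyed by the
    // first column, producing one output file per object (e.g. A1.csv)
    // in outDir. Memory use stays flat regardless of input size.
    public static void splitToFiles(BufferedReader in, File outDir)
            throws IOException {
        Map<String, PrintWriter> writers = new HashMap<>();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                int comma = line.indexOf(',');
                if (comma < 0) continue;           // skip malformed rows
                String name = line.substring(0, comma);
                PrintWriter w = writers.get(name);
                if (w == null) {
                    w = new PrintWriter(
                            new FileWriter(new File(outDir, name + ".csv")));
                    writers.put(name, w);
                }
                w.println(line);
            }
        } finally {
            for (PrintWriter w : writers.values()) w.close();
        }
    }
}
```

With only 10 distinct objects, keeping 10 writers open at once is cheap, and the input does not even need to be sorted first.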

0 Answers