-3

I need to aggregate CSV data with group by in Java.

My csv file looks like this:

Numero, NumeroWsn, NoeudAdress, PacketRece, NoeudsRece, Hello
1436136640477044,wsn430-8,NA:b27b,Packet recevied from,RX: b0b4, Hello #33
1436136640477257,wsn430-8,NA:b27b,Packet recevied from,RX: b986, Hello #33
1436136640477415,wsn430-8,NA:b27b,Packet recevied from,RX: bc2d, Hello #33
1436136640477566,wsn430-8,NA:b27b,Packet recevied from,RX: b36b, Hello #34
1436136640477716,wsn430-8,NA:b27b,Packet recevied from,RX: bcb6, Hello #35
1436136640477995,wsn430-9,NA:bc2d,Packet recevied from,RX: 1f9e, Hello #33
1436136640478162,wsn430-9,NA:bc2d,Packet recevied from,RX: be29, Hello #33
1436136640478313,wsn430-9,NA:bc2d,Packet recevied from,RX: b61a, Hello #32
1436136640478462,wsn430-9,NA:bc2d,Packet recevied from,RX: c735, Hello #32
1436136640478612,wsn430-9,NA:bc2d,Packet recevied from,RX: bb0a, Hello #32
1436136640478760,wsn430-9,NA:bc2d,Packet recevied from,RX: b6bc, Hello #33
1436136640477044,wsn430-8,NA:b27b,Packet recevied from,RX: b0b1, Hello #42
1436136640477257,wsn430-8,NA:b27b,Packet recevied from,RX: b984, Hello #44

Is there a way to aggregate these data by grouping by NoeudAdress and show NoeudsRece counters as column like shown below using Java?

NoeudsAdresse,NumberOfNoeudsRece

b27b ,7
bc2d ,6

I thought of loading CSV file to a List using OpenCSV, but is it efficient for a csv file with millions of rows?

Shreyas Chavan
  • 1,079
  • 1
  • 7
  • 17
  • 1
    If non-java solutions are ok, this can be done on command line like this: cat FileWithData | awk -F, '{print $3}' | awk -F: '{print $2}' | sort | uniq -c – ramana_k Jul 16 '15 at 23:05

2 Answers2

0

You can read data from the file line by line, split on comma and save the individual values in a String array and then create a Hashmap with the key as the index corresponding to NoeudAdress and the value as an Arraylist of the column you need a counter for. The counter will be size of the corresponding Arraylist in this case.

Edit: This is a conditional variation of importing the entire csv by OpenCSV. Instead of loading entire csv in memory, we are only importing the data we need and in the specific format that we need. This will perform better then the crude approach.

Shreyas Chavan
  • 1,079
  • 1
  • 7
  • 17
0

Use H2 instead of OpenCSV.

Strip out your header line and put it in a file called DATA.CSV

1436136640477044,wsn430-8,NA:b27b,Packet recevied from,RX: b0b4, Hello #33
1436136640477257,wsn430-8,NA:b27b,Packet recevied from,RX: b986, Hello #33
1436136640477415,wsn430-8,NA:b27b,Packet recevied from,RX: bc2d, Hello #33
1436136640477566,wsn430-8,NA:b27b,Packet recevied from,RX: b36b, Hello #34
1436136640477716,wsn430-8,NA:b27b,Packet recevied from,RX: bcb6, Hello #35
1436136640477995,wsn430-9,NA:bc2d,Packet recevied from,RX: 1f9e, Hello #33
1436136640478162,wsn430-9,NA:bc2d,Packet recevied from,RX: be29, Hello #33
1436136640478313,wsn430-9,NA:bc2d,Packet recevied from,RX: b61a, Hello #32
1436136640478462,wsn430-9,NA:bc2d,Packet recevied from,RX: c735, Hello #32
1436136640478612,wsn430-9,NA:bc2d,Packet recevied from,RX: bb0a, Hello #32
1436136640478760,wsn430-9,NA:bc2d,Packet recevied from,RX: b6bc, Hello #33
1436136640477044,wsn430-8,NA:b27b,Packet recevied from,RX: b0b1, Hello #42
1436136640477257,wsn430-8,NA:b27b,Packet recevied from,RX: b984, Hello #44

Go download h2 jar file from here: http://www.h2database.com/html/download.html

And then run this code ...

import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CSVLoader {

  public static final String getApplicationPath(Class<?> mainClass) throws URISyntaxException {
    return  getApplicationDirectory(mainClass).getAbsolutePath();
  } 

  public CSVLoader() {
    executeStatement(getDropTableStatement(), false);
    executeStatement(getCreateTableStatement(), false);
    executeStatement(getInsertStatement(), false);
    executeStatement(getSelectStatement(), true);
  }

  public static final String getDropTableStatement() {
        String SQLString = "DROP TABLE DATA IF EXISTS;\n";
        return SQLString;
  }

  public static final String getSelectStatement() {
        String SQLString = "SELECT NOEUDADRESS, COUNT(NOEUDSRECE) FROM DATA GROUP BY NOEUDADRESS;\n";
        return SQLString;
  }

  public static final String getCreateTableStatement() { 
        String SQLString = "CREATE TABLE DATA(\n";
      SQLString += "   NUMERO         VARCHAR(100),\n";
      SQLString += "   NUMEROWSN      VARCHAR(100),\n";    
      SQLString += "   NOEUDADRESS    VARCHAR(100),\n";  
      SQLString += "   PACKETRECE     VARCHAR(100),\n";  
      SQLString += "   NOEUDSRECE     VARCHAR(100),\n";   
      SQLString += "   HELLO          VARCHAR(100))";

        return SQLString;
  }

  public static final String getInsertStatement() {
    return "INSERT INTO DATA SELECT * FROM CSVREAD('DATA.CSV')";
  } 

  public void executeStatement(String sql, boolean withResultSet) {
    Connection connection = null;
    Statement statement = null;
    ResultSet resultSet = null;

    try {
      File file = getApplicationDirectory(CSVLoader.class);
      Class.forName("org.h2.Driver");
          connection = DriverManager.getConnection("jdbc:h2:" + file.getAbsolutePath() + File.separator + "storage", "sa", "secret");
          statement = connection.createStatement();

          if(withResultSet) {
              resultSet = statement.executeQuery(sql);
              while(resultSet.next()) {
                System.out.println("-->" + resultSet.getString(1) + "\t" + resultSet.getString(2));
              }
          }
          else {
              statement.execute(sql);
          }
      }
    catch (URISyntaxException e) {
      e.printStackTrace();
    } 
    catch (ClassNotFoundException e) {
      e.printStackTrace();
    } 
    catch (SQLException e) {
      e.printStackTrace();
    }
    finally {
      try {
        if(resultSet != null) {
          resultSet.close();
        }       
          if(statement != null) {
            statement.close();
        }
          if(connection != null) {
          connection.close();
        }
      }
      catch (SQLException e) {
        e.printStackTrace();
        statement = null;
        connection = null;        
      }
    }
  }

  private static final File getApplicationDirectory(Class<?> mainClass) throws URISyntaxException {
    URL url = mainClass.getProtectionDomain().getCodeSource().getLocation();
    File file = new File(url.toURI());
    return file.getParentFile();
  }

  public static void main(String[] args) {
    new CSVLoader();
  }
}

Output is

-->NA:b27b  7
-->NA:bc2d  6

You can also run it from console: The example above also works when you enter this in console:

CREATE TABLE TEST( 
    NUMERO         VARCHAR(100),
    NUMEROWSN      VARCHAR(100),
    NOEUDADRESS    VARCHAR(100),
    PACKETRECE     VARCHAR(100),  
    NOEUDSRECE     VARCHAR(100),   
    HELLO          VARCHAR(100)
) AS SELECT * FROM CSVREAD('C:\\ECLIPSE\\WORKSPACE\\H2\\DATA.CSV')

enter image description here

Make sure to use full path to your data file

Constantin
  • 1,506
  • 10
  • 16
  • i have same question i use Importing Data from a CSV File like that: – oualid walid Jul 19 '15 at 23:30
  • CREATE TABLE TEST AS SELECT * FROM CSVREAD('test.csv'); CREATE TABLE TEST(ID INT PRIMARY KEY, NAME VARCHAR(255)) AS SELECT * FROM CSVREAD('test.csv'); – oualid walid Jul 19 '15 at 23:30
  • Is it working for you, then? I tested it with your data and it works – Constantin Jul 19 '15 at 23:32
  • first step: i downloading h2 jar file (h2 console) second step:importing csv file in generic ..in h2 console it is Generic H2 (Embedded) – oualid walid Jul 19 '15 at 23:42
  • I'm sorry, but how I can import my csv file into h2 console I tried several times but the same problems , again thank you infinitely – oualid walid Jul 20 '15 at 00:07
  • CREATE TABLE TEST AS SELECT * FROM CSVREAD('C:\Users\walid-430\Desktop\fout.csv'); Update count: 0 (16 ms) CREATE TABLE TEST(ID INT PRIMARY KEY, NAME VARCHAR(255)) AS SELECT * FROM CSVREAD('C:\Users\walid-430\Desktop\fout.csv'');; Syntax error in SQL statement "CREATE TABLE TEST(ID INT PRIMARY KEY, NAME VARCHAR(255)) AS SELECT * FROM CSVREAD('C:\Users\walid-430\Desktop\fout.csv'[*]');"; SQL statement: CREATE TABLE TEST(ID INT PRIMARY KEY, NAME VARCHAR(255)) AS SELECT * FROM CSVREAD('C:\Users\walid-430\Desktop\fout.csv''); [42000-187] 42000/42000 (Help) – oualid walid Jul 20 '15 at 00:17
  • Thank you for your patience with me,thank you so much, Constantin I have questions about errors in h2 console – oualid walid Jul 20 '15 at 07:40
  • IO Exception: "IOException reading C:\USERS\\walid-430\\WORKSPACE\\H2\\DATA.CSV"; SQL statement: CREATE TABLE TEST( NUMERO VARCHAR(100), NUMEROWSN VARCHAR(100), NOEUDADRESS VARCHAR(100), PACKETRECE VARCHAR(100), NOEUDSRECE VARCHAR(100), HELLO VARCHAR(100) ) AS SELECT * FROM CSVREAD('C:\USERS\\walid-430\\WORKSPACE\\H2\\DATA.CSV') Update count:0 [90028-187] 90028/90028 (Help) org.h2.jdbc.JdbcSQLException: IO Exception: "IOException reading C:\USERS\\walid-430\\WORKSPACE\\H2\\DATA.CSV"; SQL statement: ] – oualid walid Jul 20 '15 at 07:44
  • and for eclipse console i have same think like this: – oualid walid Jul 20 '15 at 07:45
  • org.h2.jdbc.JdbcSQLException: IO Exception: "IOException reading DATA.CSV"; SQL statement: INSERT INTO DATA SELECT * FROM CSVREAD('DATA.CSV') [90028-187] at org.h2.message.DbException.getJdbcSQLException(DbException.java:345) at org.h2.message.DbException.get(DbException.java:168) at org.h2.tools.Csv.convertException(Csv.java:598) at org.h2.tools.Csv.read(Csv.java:197) – oualid walid Jul 20 '15 at 07:46
  • double check your path, its wrong ... you're missing a slash ... Again, look at my example – Constantin Jul 20 '15 at 10:23