I need to insert records into Cassandra, so I wrote a function whose input is a CSV file. Say the file is named test.csv and Cassandra has a table named test. I need to store each row of the CSV file in the test table. Since I am using the Spark Java API, I am also creating a POJO (DTO) class to map its fields to the Cassandra columns.

The problem is that test.csv has about 50 comma-separated values, which have to be stored in 50 columns of the test table, which has 400 columns in total. So in my TestPojo class I created a constructor that takes those 50 fields.

JavaRDD<String> fileRdd = ctx.textFile("home/user/test.csv");
JavaRDD fileObjectRdd = fileRdd.map(
        new Function<String, Object>() {
            @Override
            public Object call(String line) throws Exception {
                // do some transformation with the data
                switch (fileName) {
                    case "test":
                        return new TestPojo(1,3,4,--50); // calling the constructor with 50 fields
                }
            }
        });

switch (fileName) {
    case "test":
        javaFunctions(fileObjectRdd).writerBuilder("testKeyspace", "test", mapToRow(TestPojo.class)).saveToCassandra();
        break;
}

So for each row of test.csv I return a TestPojo object into an RDD of objects. Once that is done, I save that RDD to the Cassandra table test using the TestPojo mapping.

My problem is that if test.csv has, say, 60 columns in the future, my code will not work, because I am invoking the constructor with only 50 fields.

My question is: how do I create a constructor with all 400 fields in TestPojo, so that my code can handle test.csv no matter how many fields it has?

I tried to create a general constructor with all 400 fields, but ended up with a compilation error saying that a constructor is limited to 255 parameters.

Or is there a better way to handle this use case?
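
To make the "better way" part concrete, here is a minimal sketch of one possible alternative: a bean with a no-argument constructor and one setter per column, filled by column name instead of by constructor position. Everything in it is an assumption for illustration: the column names `col1` to `col3`, the `SETTERS` lookup table, the `fromCsv` helper, and the idea that the column names of the file are known (for example from a header line). It is plain Java with no Spark or Cassandra calls, and it assumes that `mapToRow(TestPojo.class)` maps JavaBean properties through getters, which should be checked against the connector version in use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

class TestPojoSketch {

    /** Hypothetical bean with three columns; a real one would declare every table column. */
    public static class TestPojo implements java.io.Serializable {
        private Integer col1;
        private Integer col2;
        private Integer col3;

        public TestPojo() { }  // no-arg constructor, so the 255-parameter limit never applies

        public Integer getCol1() { return col1; }
        public void setCol1(Integer v) { col1 = v; }
        public Integer getCol2() { return col2; }
        public void setCol2(Integer v) { col2 = v; }
        public Integer getCol3() { return col3; }
        public void setCol3(Integer v) { col3 = v; }
    }

    // Maps a CSV column name to the setter that stores its value (names are hypothetical).
    private static final Map<String, BiConsumer<TestPojo, String>> SETTERS = new HashMap<>();
    static {
        SETTERS.put("col1", (p, s) -> p.setCol1(Integer.valueOf(s)));
        SETTERS.put("col2", (p, s) -> p.setCol2(Integer.valueOf(s)));
        SETTERS.put("col3", (p, s) -> p.setCol3(Integer.valueOf(s)));
    }

    /** Builds a bean from one CSV line; columns missing from the file stay null. */
    static TestPojo fromCsv(String[] header, String line) {
        String[] values = line.split(",", -1);  // -1 keeps trailing empty fields
        TestPojo pojo = new TestPojo();
        for (int i = 0; i < header.length && i < values.length; i++) {
            BiConsumer<TestPojo, String> setter = SETTERS.get(header[i].trim());
            String value = values[i].trim();
            if (setter != null && !value.isEmpty()) {
                setter.accept(pojo, value);
            }
        }
        return pojo;
    }
}
```

With this shape, a file that grows from 50 to 60 columns only changes the header array passed to `fromCsv`; the bean and the mapping code stay the same. How the connector writes the fields that remain null is a separate point to verify.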

Question 2: What if the data from test.csv goes to multiple tables in Cassandra, say 5 columns of test.csv go to the test table and 5 other columns go to the test2 table?

The problem here is that when I do

JavaRDD fileObjectRdd = fileRdd.map(
        new Function<String, Object>() {
            @Override
            public Object call(String line) throws Exception {
                // do some transformation with the data
                switch (fileName) {
                    case "test":
                        return new TestPojo(1,3,4,--50); // calling the constructor with 50 fields
                }
            }
        });

I return only one TestPojo object. If the data from test.csv goes to both the test table and the test2 table, I will need to return two objects: one TestPojo and one Test2Pojo.
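
One way to picture the two-table case is to avoid returning two objects from a single function and instead project each line once per target table. The sketch below is plain Java under stated assumptions: the field names `a`, `b`, `c`, `d`, the column positions, and the helpers `toTest` and `toTest2` are all hypothetical. In Spark, each helper would presumably be passed to its own `fileRdd.map(...)` call, giving one RDD per table that is saved separately; that wiring is not shown here.

```java
class SplitRowSketch {

    /** Hypothetical row for the test table: first two CSV columns. */
    static class TestPojo {
        final int a;
        final int b;
        TestPojo(int a, int b) { this.a = a; this.b = b; }
    }

    /** Hypothetical row for the test2 table: next two CSV columns. */
    static class Test2Pojo {
        final int c;
        final int d;
        Test2Pojo(int c, int d) { this.c = c; this.d = d; }
    }

    // Projection for the test table; would be the body of one map function.
    static TestPojo toTest(String line) {
        String[] f = line.split(",", -1);
        return new TestPojo(Integer.parseInt(f[0].trim()), Integer.parseInt(f[1].trim()));
    }

    // Projection for the test2 table; would be the body of a second map function.
    static Test2Pojo toTest2(String line) {
        String[] f = line.split(",", -1);
        return new Test2Pojo(Integer.parseInt(f[2].trim()), Integer.parseInt(f[3].trim()));
    }
}
```

Each projection keeps a single return type, so neither function has to return `Object`. The cost is that every line is split once per table.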

Erick Ramirez
Syed Ammar Mustafa