I had a single-node (DataStax) Cassandra cluster into which I had to insert about 10 GB of data from a file. I wrote a Java program to read the file and store the data as follows:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Date;
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
public class Xb {
//cluster and session for cassandra connection
private static Cluster cluster;
private static Session session;
//variables for storing file elements
private static String taxid;
private static String geneid;
private static String status;
private static String rna_version;
private static String rna_gi;
private static String protein_version;
private static String protein_gi;
private static String gen_nuc_ver;
private static String gen_nuc_gi;
private static String start_gen_acc;
private static String end_gen_acc;
private static String orientation;
private static String assembly;
private static String mature_ver;
private static String mature_gi;
private static String symbol;
//Connecting the cassandra node(local host)
public static Cluster connect(String node){
return Cluster.builder().addContactPoint(node).build();
}
public static void main(String[] args) {
long lStartTime = new Date().getTime();
// TODO Auto-generated method stub
//call connect by passing localhost
cluster =connect("localhost");
session = cluster.connect();
//session.execute("CREATE KEYSPACE test1 WITH REPLICATION =" +"{'class':'SimpleStrategy','replication_factor':3}");
//session.createtable('genomics');
//use test1 : triggers the use of test1 keyspace
session.execute("USE test1");
//for counting the lines in the file
int lineCount=0;
try
{
//Reading the file
FileReader fr = new FileReader("/home/syedammar/gene2refseq/gene2refseq");
BufferedReader bf = new BufferedReader(fr);
String line;
//iterating over each line in file
while((line= bf.readLine())!=null){
lineCount++;
//splitting the line on runs of whitespace (tabs or spaces)
String[] a =line.split("\\s+");
System.out.println("Line Count now is ->"+lineCount);
//System.out.println("This is content"+line+" OVER HERE");
/*for(int i =0;i<a.length;i++){
System.out.println(i+"->"+a[i]);
}*/
//assigning the values to the corresponding variables
taxid =a[0];
geneid=a[1];
status=a[2];
rna_version=a[3];
rna_gi=a[4];
protein_version=a[5];
protein_gi=a[6];
gen_nuc_ver=a[7];
gen_nuc_gi=a[8];
start_gen_acc=a[9];
end_gen_acc=a[10];
orientation=a[11];
assembly=a[12];
mature_ver=a[13];
mature_gi=a[14];
symbol=a[15];
//Writing the insert query
PreparedStatement statement = session.prepare(
"INSERT INTO test.genomics " +
"(taxid, " +
"geneid, " +
"status, " +
"rna_version, " +
"rna_gi, " +
"protein_version, " +
"protein_gi, " +
"gen_nuc_ver, " +
"gen_nuc_gi, " +
"start_gen_acc, " +
"end_gen_acc, " +
"orientation, " +
"assembly, " +
"mature_ver, " +
"mature_gi," +
"symbol" +
") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);");
//create the bound statement and initialise it with your prepared statement
BoundStatement boundStatement = new BoundStatement(statement);
session.execute( // this is where the query is executed
boundStatement.bind( // here you are binding the 'boundStatement'
taxid,geneid,status,rna_version,rna_gi,protein_version,protein_gi,gen_nuc_ver,gen_nuc_gi,start_gen_acc,end_gen_acc,orientation,assembly,mature_ver,mature_gi,symbol));
}//end of while
} //end of try
catch(IOException e){
e.printStackTrace();
}
long lEndTime = new Date().getTime();
long difference = lEndTime - lStartTime;
long seconds = difference / 1000; //converting milliseconds to whole seconds (no % 60, which would wrap the total every minute)
System.out.println("Elapsed seconds: " + seconds);
System.out.println("No of lines read are :"+ lineCount);
System.out.println("Records entered into Cassandra successfully");
session.close();
cluster.close();
}//end of main
}// end of class
This worked fine; the records were stored in Cassandra.
Now I have set up a 4-node Cassandra cluster, and I want to do the same task: read the same file and store its contents in the 4-node cluster.
My question is how to approach this: which node do I need to run this program against?
Also, how do I establish a connection with the 4-node cluster, and what changes do I have to make in the above code? I assume there would be some change in this part:
public static Cluster connect(String node){
return Cluster.builder().addContactPoint(node).build();
}
What would the changes be, and to which node do I feed this program? I am not clear on how this would work. Also, please let me know whether inserting the entire data set into the 4-node cluster will take the same amount of time as it did on the single node, or whether it will be faster.
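To compare the single-node run against the 4-node run fairly, I assume I would need a timer that reports the full elapsed time rather than only the seconds part. Below is a minimal sketch of what I have in mind. The class `ElapsedTimer` and its methods `toWholeSeconds` and `format` are names I made up for illustration (they are not part of the DataStax driver), and the `Thread.sleep` call is only a stand-in for the actual load loop.

```java
import java.util.concurrent.TimeUnit;

public class ElapsedTimer {

    // Converts a nanosecond interval to whole seconds.
    // No modulo is applied, so the value does not wrap every minute.
    public static long toWholeSeconds(long elapsedNanos) {
        return TimeUnit.NANOSECONDS.toSeconds(elapsedNanos);
    }

    // Formats a total number of seconds as "<minutes>m <seconds>s".
    public static String format(long totalSeconds) {
        return (totalSeconds / 60) + "m " + (totalSeconds % 60) + "s";
    }

    public static void main(String[] args) throws InterruptedException {
        // System.nanoTime() is monotonic, so it is not affected by
        // wall-clock adjustments during a long-running load.
        long start = System.nanoTime();

        Thread.sleep(50); // hypothetical stand-in for the file-loading loop

        long elapsedNanos = System.nanoTime() - start;
        System.out.println("Elapsed: " + format(toWholeSeconds(elapsedNanos)));
    }
}
```

Running the same measurement around the load loop on both clusters should give two totals that can be compared directly.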
Thanks