I'm trying to find the most effective way to multithread a bulk load of data into multiple tables within a keyspace in Cassandra from a Java program. Here are my keyspace and table declarations:
CREATE KEYSPACE IF NOT EXISTS articles WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : '3'};
CREATE TABLE IF NOT EXISTS articles.bigrams (docid text, bigram text, primary key (docid, bigram));
CREATE TABLE IF NOT EXISTS articles.unigrams (docid text, unigram text, primary key (docid, unigram));
Here is the portion of the Java program that is giving me trouble. I'm trying to create two instances of CQLSSTableWriter, one per table, and write to each of them:
package cassandrabulktest.cassandra;

import java.io.IOException;
import java.util.ArrayList;
import org.apache.cassandra.exceptions.InvalidRequestException;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class UnigramLoader {
    private static final String UNIGRAM_SCHEMA = "CREATE TABLE articles.unigrams (" +
            "docid text, " +
            "unigram text, " +
            "PRIMARY KEY (unigram, docid))";

    private static CQLSSTableWriter unigram_writer = CQLSSTableWriter.builder()
            .inDirectory("/tables/articles/unigrams")
            .forTable(UNIGRAM_SCHEMA)
            .using("INSERT INTO articles.unigrams (docid, unigram) VALUES (?, ?)")
            .build();

    private static final String BIGRAM_SCHEMA = "CREATE TABLE articles.bigrams (" +
            "docid text, " +
            "bigram text, " +
            "PRIMARY KEY (bigram, docid))";

    private static CQLSSTableWriter bigram_writer = CQLSSTableWriter.builder()
            .inDirectory("/tables/articles/bigrams")
            .forTable(BIGRAM_SCHEMA)
            .using("INSERT INTO articles.bigrams (docid, bigram) VALUES (?, ?)")
            .build();

    public static void load(String articleId, ArrayList<String> unigrams, ArrayList<String> bigrams) throws IOException, InvalidRequestException {
        // addRow binds values in the order of the INSERT's markers: (docid, unigram)
        for (String unigram : unigrams) {
            unigram_writer.addRow(articleId, unigram);
        }
        // Same ordering for the bigram table: (docid, bigram)
        for (String bigram : bigrams) {
            bigram_writer.addRow(articleId, bigram);
        }
    }

    public static void closeWriter() throws IOException {
        unigram_writer.close();
        bigram_writer.close();
    }
}
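The runnable that calls load() (UnigramRunnable in the trace below) isn't shown. As a point of reference, here is a minimal sketch of one way per-article rows could be fanned out to two tables from multiple threads. It assumes a hypothetical RowSink interface and an in-memory ListSink standing in for CQLSSTableWriter so it runs without Cassandra, and it gives each table a single-threaded executor so no writer is ever called concurrently. It illustrates the threading structure only and does not address the ID-mismatch error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLoadSketch {

    // Hypothetical stand-in for CQLSSTableWriter.addRow(...): values are passed
    // in the same order as the INSERT's bind markers, i.e. (docid, ngram).
    public interface RowSink {
        void addRow(String docid, String ngram) throws Exception;
    }

    // In-memory sink (hypothetical) so the sketch runs without Cassandra.
    public static class ListSink implements RowSink {
        public final List<String> rows = new ArrayList<>();

        @Override
        public void addRow(String docid, String ngram) {
            rows.add(docid + "|" + ngram);
        }
    }

    // Submits each document's rows to one single-threaded executor per table,
    // so each sink is only ever called from its own executor thread.
    public static void loadAll(List<String> docIds,
                               List<List<String>> unigramsPerDoc,
                               List<List<String>> bigramsPerDoc,
                               RowSink unigramSink,
                               RowSink bigramSink) throws Exception {
        ExecutorService unigramExec = Executors.newSingleThreadExecutor();
        ExecutorService bigramExec = Executors.newSingleThreadExecutor();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < docIds.size(); i++) {
                final String docid = docIds.get(i);
                final List<String> unigrams = unigramsPerDoc.get(i);
                final List<String> bigrams = bigramsPerDoc.get(i);
                pending.add(unigramExec.submit(() -> {
                    for (String u : unigrams) {
                        unigramSink.addRow(docid, u);
                    }
                    return null;
                }));
                pending.add(bigramExec.submit(() -> {
                    for (String b : bigrams) {
                        bigramSink.addRow(docid, b);
                    }
                    return null;
                }));
            }
            // get() waits for completion and rethrows any addRow failure
            // wrapped in an ExecutionException.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            unigramExec.shutdown();
            bigramExec.shutdown();
        }
    }
}
```

Parallelism here comes from whatever threads produce the n-grams and call loadAll, not from concurrent addRow calls on one writer. Serializing writes per table is a conservative assumption chosen so the sketch does not rely on CQLSSTableWriter being thread-safe.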
If it worked, UnigramLoader would start creating SSTable files in the two directories. However, I get this error when I run it:
Exception in thread "Thread-1" java.lang.ExceptionInInitializerError
    at edu.georgetown.cassandrabulktest.runnables.UnigramRunnable.run(UnigramRunnable.java:69)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 662e2edf-c864-34a4-bca6-f83b25af6f6a; expected 7247b490-b141-11e4-a8f9-8b65543eda40)
    at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:1125)
    at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:337)
    at org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.forTable(CQLSSTableWriter.java:360)
    at edu.georgetown.cassandrabulktest.cassandra.UnigramLoader.<clinit>(UnigramLoader.java:29)
    ... 2 more
Caused by: org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 662e2edf-c864-34a4-bca6-f83b25af6f6a; expected 7247b490-b141-11e4-a8f9-8b65543eda40)
    at org.apache.cassandra.config.CFMetaData.validateCompatility(CFMetaData.java:1208)
    at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:1140)
    at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:1121)
    ... 5 more
Is it simply not possible to use two CQLSSTableWriter instances this way, or is there a different way to accomplish what I want? Thanks in advance!