I have set up a single Cassandra node on a VM. I have to create a table with 70,000 columns. For this I have written Java code that reads a JSON file and creates the table. Here is my Java code snippet. When I run it, it throws an exception after creating some of the columns. The exception stack trace is shown after the code.

public void createTable(String keyspaceName, String tableName) throws FileNotFoundException{
    JSONParser jsonParser = new JSONParser();
    FileReader fileReader;
    String filePath = "";
    String columnHeader = "";
    //String completeColumnHeader = "";
    try{
        System.out.println("Inside Create Table");
        // Run the DROP synchronously so it has finished before the CREATE below is sent
        session.execute("DROP TABLE IF EXISTS "+keyspaceName+"."+tableName+";");
        String createQuery = "CREATE TABLE "+keyspaceName+"."+tableName +"(\"P:LanguageID\" text, "
                + "\"P:PdmarticleID\" text, PRIMARY KEY(\"P:PdmarticleID\",\"P:LanguageID\"));";
        session.execute(createQuery);
        System.out.println("Table created");
        filePath = "CassandraTableColumnHeader/FixColumnHeader.json";
        fileReader = new FileReader(filePath);
        JSONObject jsonObject = (JSONObject) jsonParser.parse(fileReader);
        JSONArray jsonArray = (JSONArray) jsonObject.get("columnHeaderName");

        int columnHeaderSize = jsonArray.size();

        int columnHeaderBatchSize = 1000;
        int fromIndex = 0;
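        // Cap the first batch at the list size so fewer than 1000 headers cannot overrun the array
        int toIndex = Math.min(columnHeaderBatchSize, columnHeaderSize);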

        while(columnHeaderSize > 0){
            columnHeaderSize -=columnHeaderBatchSize;
            for(int i = fromIndex; i < toIndex; i++) {
                columnHeader = (String) jsonArray.get(i);
                if(columnHeader.equals("P:PdmarticleID")||columnHeader.equals("P:LanguageID")){
                    continue;
                }
                session.execute("ALTER TABLE "+keyspaceName+"."+tableName +" ADD "+"\""+columnHeader+"\""+" text;");
            }
            fromIndex = toIndex;
            if(columnHeaderSize < columnHeaderBatchSize){
                toIndex += columnHeaderSize;
            }else{
                toIndex = toIndex + columnHeaderBatchSize;  
            }
        }
    }catch(FileNotFoundException fnfe){
        throw fnfe; 
    }catch (ParseException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
} 

Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Host replied with server error: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: C:\apache-cassandra-new\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-tmplink-ka-4839-Data.db (The process cannot access the file because it is being used by another process)))
    at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
    at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:265)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:179)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:36)
    at com.exportstagging.SparkTest.DataLoaderInCassandra.createTable(DataLoaderInCassandra.java:89)
    at com.exportstagging.SparkTest.DataLoaderInCassandra.main(DataLoaderInCassandra.java:216)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Host replied with server error: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: C:\apache-cassandra-new\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-tmplink-ka-4839-Data.db (The process cannot access the file because it is being used by another process)))
    at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:216)
    at com.datastax.driver.core.RequestHandler.access$900(RequestHandler.java:45)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.sendRequest(RequestHandler.java:276)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution$1.run(RequestHandler.java:374)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

I am stuck here. Please help me. Thanks in advance.

Erick Ramirez
New User

1 Answer

If I were you, I might reevaluate creating a table with 70k columns. Your partition key P:PdmarticleID and full primary key (P:PdmarticleID, P:LanguageID) are the only two pieces of information you will be able to query by anyway, so storing the other pieces of information in explicitly declared columns is not buying you anything.

A collection (e.g. a map) can hold up to 64k items, with certain other limitations (see http://wiki.apache.org/cassandra/CassandraLimitations). Is there a way you can split the columns so that you can create multiple tables, with some pieces of information stored in one table and some in another?
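As a rough illustration of the splitting idea, here is a minimal sketch that divides the column names into groups and builds one CREATE TABLE statement per group, so each table is declared in a single schema change instead of thousands of ALTER TABLE calls. Everything in it is an assumption for illustration: the class name SplitTableSketch, the helper methods, the group size, and the "_partN" table-name suffix are hypothetical and do not come from your code. It only builds the CQL strings; you would pass each one to session.execute yourself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: names and the "_partN" suffix are illustrative assumptions.
class SplitTableSketch {

    // Wrap an identifier in double quotes so CQL keeps its case and the ':' character.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    // Split the column names into consecutive groups of at most groupSize entries.
    static List<List<String>> split(List<String> columns, int groupSize) {
        List<List<String>> groups = new ArrayList<>();
        for (int from = 0; from < columns.size(); from += groupSize) {
            int to = Math.min(from + groupSize, columns.size());
            groups.add(new ArrayList<>(columns.subList(from, to)));
        }
        return groups;
    }

    // Build one CREATE TABLE statement: the two key columns plus one group of text columns.
    static String createTableCql(String keyspace, String table, List<String> columns) {
        StringBuilder cql = new StringBuilder();
        cql.append("CREATE TABLE ").append(keyspace).append('.').append(table).append(" (");
        cql.append(quote("P:PdmarticleID")).append(" text, ");
        cql.append(quote("P:LanguageID")).append(" text, ");
        for (String column : columns) {
            // Skip the key columns if they also appear in the header list.
            if (column.equals("P:PdmarticleID") || column.equals("P:LanguageID")) {
                continue;
            }
            cql.append(quote(column)).append(" text, ");
        }
        cql.append("PRIMARY KEY (").append(quote("P:PdmarticleID")).append(", ")
           .append(quote("P:LanguageID")).append("));");
        return cql.toString();
    }

    public static void main(String[] args) {
        // Made-up column names standing in for the headers read from the JSON file.
        List<String> columns = Arrays.asList("A:Name", "A:Color", "A:Size");
        List<List<String>> groups = split(columns, 2);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println(createTableCql("ks", "articles_part" + i, groups.get(i)));
        }
    }
}
```

Every resulting table shares the same primary key, so a row for one article and language can be looked up in each table with the same key values. How to group the columns (by prefix, by access pattern, or simply by position as here) depends on your data.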

Keith Nordstrom