17

I am looking for a code example to retrieve all rows and all columns of a column family. Something like:

SELECT * FROM MyTable

I see that this can be done using a RangeSlicesQuery, but you still have to provide a certain range. And I think you have to specify the column names too. Is there a clean and safe way to do this?

Using Hector 1.0 and Cassandra 1.0.

J. Volkya

2 Answers

15

Try something like this:

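import java.util.Iterator;
import java.util.UUID;

import me.prettyprint.cassandra.model.QuorumAllConsistencyLevelPolicy;
import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.serializers.UUIDSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.beans.Row;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.RangeSlicesQuery;

public class Dumper {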
    private final Cluster cluster;
    private final Keyspace keyspace;

    public Dumper() {
        this.cluster = HFactory.getOrCreateCluster("Name", "hostname");
        this.keyspace = HFactory.createKeyspace("Keyspace", cluster, new QuorumAllConsistencyLevelPolicy());
    }

    public void run() {
        int row_count = 100;

        RangeSlicesQuery<UUID, String, Long> rangeSlicesQuery = HFactory
            .createRangeSlicesQuery(keyspace, UUIDSerializer.get(), StringSerializer.get(), LongSerializer.get())
            .setColumnFamily("Column Family")
            .setRange(null, null, false, 10)
            .setRowCount(row_count);

        UUID last_key = null;

        while (true) {
            rangeSlicesQuery.setKeys(last_key, null);
            System.out.println(" > " + last_key);

            QueryResult<OrderedRows<UUID, String, Long>> result = rangeSlicesQuery.execute();
            OrderedRows<UUID, String, Long> rows = result.get();
            Iterator<Row<UUID, String, Long>> rowsIterator = rows.iterator();

            // Skip the first row on every page after the first: it repeats the last row of the previous page.
            // (The iterator is never null, so check hasNext() before advancing.)
            if (last_key != null && rowsIterator.hasNext()) rowsIterator.next();

            while (rowsIterator.hasNext()) {
              Row<UUID, String, Long> row = rowsIterator.next();
              last_key = row.getKey();

              if (row.getColumnSlice().getColumns().isEmpty()) {
                continue;
              }


              System.out.println(row);
            }

            if (rows.getCount() < row_count)
                break;
        }
    }

    public static void main(String[] args) {
        new Dumper().run();
    }
}

This will page through the column family in pages of 100 rows. It will only fetch 10 columns for each row (you will want to page very long rows too).

This is for a column family with UUIDs for row keys, strings for column names and longs for values. To adapt it to other types, change the three type parameters and the matching serializers.
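The paging logic above (start each page at the last key seen, then drop the overlapping first row) can be tried without a cluster. The following is only a minimal in-memory sketch, not Hector code: the `TreeMap` stands in for the column family, and `PagingSketch`, `fetchPage` and `readAllKeys` are hypothetical names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory sketch of the paging loop; the sorted map plays the role of the column family.
class PagingSketch {

    // Hypothetical stand-in for a range query: returns up to pageSize keys,
    // starting at startKey (inclusive), or from the beginning when startKey is null.
    static List<String> fetchPage(NavigableMap<String, Long> table, String startKey, int pageSize) {
        NavigableMap<String, Long> tail = (startKey == null) ? table : table.tailMap(startKey, true);
        List<String> page = new ArrayList<>();
        for (String key : tail.keySet()) {
            if (page.size() == pageSize) break;
            page.add(key);
        }
        return page;
    }

    // Visits every key exactly once by paging with an overlapping start key.
    static List<String> readAllKeys(NavigableMap<String, Long> table, int pageSize) {
        // With a page size of 1, every page after the first would contain only the
        // overlapping key, so the loop would never make progress.
        if (pageSize < 2) throw new IllegalArgumentException("pageSize must be at least 2");

        List<String> seen = new ArrayList<>();
        String lastKey = null;
        while (true) {
            List<String> page = fetchPage(table, lastKey, pageSize);
            // On every page after the first, the first entry repeats the previous page's last key.
            int from = (lastKey != null && !page.isEmpty()) ? 1 : 0;
            for (int i = from; i < page.size(); i++) {
                lastKey = page.get(i);
                seen.add(lastKey);
            }
            // A short page means the end of the data has been reached.
            if (page.size() < pageSize) break;
        }
        return seen;
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> table = new TreeMap<>();
        for (String key : new String[] {"a", "b", "c", "d", "e", "f", "g"}) {
            table.put(key, 1L);
        }
        System.out.println(readAllKeys(table, 3));
    }
}
```

Note the guard on the page size: because each page after the first spends one slot on the repeated key, a page size below 2 would loop forever. The same reasoning suggests keeping `row_count` at 2 or more in the Hector version.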

Crowie
tom.wilkie
  • Thanks for your answer. But this is what I have done. I simply set rangeSlicesQuery.setKeys("", "") and I do not set any row count. This returned all the rows in the column family. It seems there is no need to page through the columns. – J. Volkya Dec 07 '11 at 17:46
  • To continue with my previous comment, to do it like that, I needed to specify the column names. – J. Volkya Dec 07 '11 at 17:54
  • 3
    I'm pretty sure Hector does not implement paging for you. Your code will likely fail with a timeout (or worse, cause Cassandra to OOM) when your dataset gets larger, because what you suggest causes Cassandra to load the entire dataset into RAM. – tom.wilkie Dec 07 '11 at 18:23
  • 1
    This might only work with an order-preserving partitioner. So how can you do it with RandomPartitioner? – piotrga Jun 08 '12 at 10:26
  • 1
    We tried it with 100k rows and it eventually started to time out. – Jake Pearson Jun 25 '12 at 17:58
2

Try this out:

    // MAX is not defined in this snippet; it is assumed to be a caller-chosen int limit.
    int rowCount = MAX;
    RangeSlicesQuery<String, String, String> rangeSlicesQuery = HFactory
            .createRangeSlicesQuery(keyspace2, STRINGSERIALIZER,
                    STRINGSERIALIZER, STRINGSERIALIZER)
            .setColumnFamily(columnFamily)
            // rowCount caps both the columns fetched per row and the number of rows returned
            .setRange(null, null, false, rowCount)
            .setRowCount(rowCount);
    String lastKey = null;
    // A null start and end key selects from the beginning of the column family.
    // This single query returns at most rowCount rows; it does not page.
    rangeSlicesQuery.setKeys(lastKey, null);
    QueryResult<OrderedRows<String, String, String>> result = rangeSlicesQuery
            .execute();
    OrderedRows<String, String, String> rows = result.get();
    for (Row<String, String, String> row : rows) {
        String cassandra_key = row.getKey();
    }
Manuel
Jimmy