I have a problem with Hector's handling of control characters in key and column names. I am writing a program that uses Hector to talk to a Cassandra instance, and there are pre-existing keys and column names such as hexadecimal "594d69e0b8e611e10000242d50cf1ff7".

I have inputted that hexadecimal into a Java String and plugged it through some simple conversion-to-text code:

StringBuilder sb = new StringBuilder();
for (int i = 0; i < s1.length() - 1; i += 2) {
    /* Grab the hex in pairs */
    String output = s1.substring(i, i + 2);
    /* Convert hex to decimal */
    int decimal = Integer.parseInt(output, 16);
    sb.append((char) decimal);
}
return sb.toString();

(Converting the returned Java String back to hexadecimal by calling hexString.append(Integer.toHexString(textString.charAt(i))); for every character, returns the original hexadecimal, so Java should be capable of handling this data.) Printing said Java String yields the top line in the below image:

[Image not posted because new users aren't allowed to post images.] Image here: https://i.stack.imgur.com/yUJxs.png

Unfortunately, the bottom line (corrupted) is what Hector is returning to me when I call the following code (lots of checks and setup omitted, for simplicity of the question):

OrderedRows<String, String, String> orderedRows;
orderedRows = rangeSlicesQuery.execute().get();
Row<String,String,String> lastRow = orderedRows.peekLast();
for (Row<String, String, String> r : orderedRows) {
    String key = r.getKey();
    System.out.println(key);
...

So, Hector is not handling control characters properly when returning the Java String. How can I get Hector to return the keys and columns to me in hexadecimal instead of as a (corrupted) text-based Java String? I tried to look it up, but the documentation on how to do so is essentially missing (http://hector-client.github.com/hector//source/content/API/core/1.0-1/me/prettyprint/hector/api/beans/OrderedRows.html - what are K, V, and N?). I imagine it should be simple, as the Cassandra CLI assumes hexadecimal if you do not wrap the input with ascii(''), but I cannot figure out how to do it.

Duke

1 Answer

In Cassandra, everything is stored as raw bytes; the hexadecimal you see in the CLI is only how those bytes are displayed. The Cassandra Thrift API likewise accepts binary. In practice, however, people prefer to work with familiar types such as String and Integer. Hector makes the Thrift API easier to use by abstracting away the serializing/deserializing logic.

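K, N and V are the types of the row key, column name and column value respectively. When you use String, String, String, you are telling Hector that all three types for your column family are Strings, so Hector decodes the stored bytes as text. That decoding is where keys containing arbitrary binary data get corrupted.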
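
To illustrate why decoding such a key as text is lossy, here is a minimal standalone sketch that does not use Hector at all. It assumes the String serializer decodes bytes as UTF-8 (I believe that is what Hector's StringSerializer does, but treat it as an assumption); the class and method names are made up for the example. The key from the question contains byte sequences that are not valid UTF-8, so they cannot survive a decode/encode round trip:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    /** Parses an even-length hex string into raw bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Returns true if the bytes are unchanged after decoding to a String and re-encoding as UTF-8. */
    static boolean survivesUtf8(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return Arrays.equals(raw, decoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The key from the question: contains bytes such as e0 b8 e6 that are malformed UTF-8.
        byte[] key = fromHex("594d69e0b8e611e10000242d50cf1ff7");
        System.out.println(survivesUtf8(key));               // false
        // A pure-ASCII prefix of the same key round-trips fine.
        System.out.println(survivesUtf8(fromHex("594d69"))); // true
    }
}
```

The cast-to-char conversion in the question maps each byte to one char, which is a different (and reversible) mapping, so it is not surprising that the two printed lines differ.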

If the row keys and column names are stored as raw bytes, use byte[] as the key and column-name types when retrieving them, and BytesArraySerializer as the serializer.
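
Once the key comes back as a byte[] (for example from r.getKey() on a Row<byte[], byte[], byte[]>, assuming the query was created with the bytes serializer), you still need to turn it into hexadecimal yourself for printing. A small helper along these lines should do; the class and method names are hypothetical, and it uses only the standard library:

```java
public class KeyHex {
    /** Formats raw bytes as lowercase hex, two digits per byte. */
    static String toHex(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length * 2);
        for (byte b : raw) {
            // Mask to 0-255 so negative bytes are not sign-extended; %02x keeps leading zeros.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] key = {0x59, 0x4d, 0x00, (byte) 0xf7};
        System.out.println(toHex(key)); // 594d00f7
    }
}
```

Padding each byte to two digits matters here: Integer.toHexString on its own would print a 0x00 byte as "0", and the resulting string would no longer match the original key.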

Mohit