
I'm storing photos in a CQL3 list column. I can query the list easily from CQL3, but I also need to understand how the Cassandra storage model handles lists so that I can use the JMX bulkLoad service to get my data into Cassandra. If I insert some test data into a list like this:

insert into dat.lgr (id, photos) values (0, [0xaa, 0xbb]);

When queried with the cli, the resulting data looks like this:

=> (column=photos:2fce75c0fe9811e2ab248b7126053a99, value=aa, timestamp=1375794036508000)
=> (column=photos:2fce75c1fe9811e2ab248b7126053a99, value=bb, timestamp=1375794036508000)

So it looks like Cassandra is actually storing a column for each element in the list, identified by a composite column name consisting of the collection name and an unknown hex number. The number is 32 hex digits (128 bits) long, so it could be a hash, or two 64-bit hashes appended together. But what has been hashed? I've looked through the source code but found nothing. Any help appreciated.
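The composite name the cli prints as `photos:2fce...` can be made concrete at the byte level. This is a minimal sketch, assuming Cassandra's CompositeType serialization (for each component: a 2-byte big-endian length, the component bytes, then a one-byte end-of-component marker, 0x00 for an ordinary name) and a table with no clustering columns, so the cell name has only the two components shown. `CompositeCellName`, `encode` and `hexToBytes` are hypothetical names, and the 16-byte suffix is treated as opaque bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the raw bytes of a two-component composite
// cell name such as the cli's "photos:2fce75c0fe9811e2ab248b7126053a99".
public class CompositeCellName {

    // Assumed CompositeType layout per component:
    // [2-byte length][component bytes][end-of-component byte = 0x00]
    static byte[] encode(byte[]... components) {
        int size = 0;
        for (byte[] c : components) {
            size += 2 + c.length + 1;
        }
        ByteBuffer buf = ByteBuffer.allocate(size); // big-endian by default
        for (byte[] c : components) {
            buf.putShort((short) c.length); // 16-bit length prefix
            buf.put(c);
            buf.put((byte) 0);              // end-of-component marker
        }
        return buf.array();
    }

    // Parses the hex digits printed by the cli into raw bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] name = encode(
            "photos".getBytes(StandardCharsets.UTF_8),
            hexToBytes("2fce75c0fe9811e2ab248b7126053a99"));
        // (2 + 6 + 1) for "photos" plus (2 + 16 + 1) for the 16-byte suffix
        System.out.println(name.length); // prints 28
    }
}
```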

axle_h

1 Answer


I suspect the column names for list items are UUIDs. At least, both values decode to a valid date, "Tuesday, August 6, 2013 1:00:36 PM GMT" (try "2fce75c0-fe98-11e2-ab24-8b7126053a99" in http://www.famkruithof.net/uuid/uuidgen, for example).

It's easy to verify: just truncate the table and repeat the same statement. If my guess is correct, you will get completely different column names for the same data.
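The date can also be checked locally rather than through a web tool. This is a minimal sketch using only `java.util.UUID` and `java.time.Instant` from the JDK: it treats the cli hex as a version-1 (time-based) UUID once dashes are inserted, reads its 60-bit timestamp (100-nanosecond intervals since 1582-10-15) and shifts it to the Unix epoch. `TimeUuidDecode` and `unixMillis` are hypothetical names.

```java
import java.time.Instant;
import java.util.UUID;

// Decodes the timestamp embedded in a version-1 (time-based) UUID.
public class TimeUuidDecode {

    // Offset between the UUID epoch (1582-10-15) and the Unix epoch
    // (1970-01-01), in 100-nanosecond units.
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    static long unixMillis(UUID uuid) {
        if (uuid.version() != 1) {
            throw new IllegalArgumentException("not a time-based UUID: " + uuid);
        }
        // UUID.timestamp() returns 100-ns intervals since 1582-10-15
        return (uuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000;
    }

    public static void main(String[] args) {
        // The two list cell names from the cli output, with dashes added
        String[] names = {
            "2fce75c0-fe98-11e2-ab24-8b7126053a99",
            "2fce75c1-fe98-11e2-ab24-8b7126053a99"
        };
        for (String s : names) {
            UUID u = UUID.fromString(s);
            System.out.println(u.version() + " " + Instant.ofEpochMilli(unixMillis(u)));
        }
    }
}
```

Both names decode to 1375794036508 ms since the Unix epoch (2013-08-06T13:00:36.508Z), which matches the cells' write timestamp of 1375794036508000 microseconds. The two UUIDs differ only by one 100-nanosecond tick, which appears to be how the two elements stay distinct and ordered within the same millisecond.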

Wildfire
  • Yep, they are time UUIDs. Thanks. I found the code that generates them in https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/cql3/Lists.java, so I've got something decent to work from. However, since the comparator for my CF is just UTF8, I'm getting errors from AbstractSSTableSimpleWriter when I try adding a column with comparator UTF8:TimeUUID. Any idea? – axle_h Aug 07 '13 at 08:18
  • @axle_h I'd say that only a binary comparator would work. BTW, this question might be of interest to you: http://stackoverflow.com/questions/18071334/selecting-index-from-cassandra-list-collection. Basically, if you use a list, you won't be able to select only a single item from it. I think it would be better to use an explicit photo ID as part of the primary key (it could still be a TimeUUID). That would allow you to SELECT, UPDATE or DELETE any item by its ID. – Wildfire Aug 07 '13 at 08:45
  • It's alright, I've sorted it: I used a binary comparator. Thanks for your help. I also had a completely unrelated problem caused by trying to use a CompositeType for a non-composite row key. – axle_h Aug 07 '13 at 09:34
  • Had a look at the question too. My use case is to pull all photos for a given ID but sometimes would need to delete a photo by id, so I'm covered with a list. – axle_h Aug 07 '13 at 09:40
  • Or not... The protocol only supports lists whose elements are [short bytes], i.e. a 2-byte length prefix, so at most 65535 bytes per element: https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=doc/native_protocol.spec;hb=refs/heads/cassandra-1.2 Not suitable for JPEG blobs. – axle_h Aug 07 '13 at 11:33
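The limit raised in the last comment follows directly from the size of the length prefix. This is a minimal sketch, assuming the protocol v1 layout for a list value: a [short] element count followed by each element as [short bytes], where [short] is a 2-byte unsigned integer, so no element can exceed 65535 bytes. `ShortBytesList` and `encodeList` are hypothetical names, not driver APIs.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical encoder for a list value whose elements are [short bytes].
public class ShortBytesList {

    // Largest value an unsigned 16-bit length prefix can hold.
    static final int MAX_SHORT = 0xFFFF;

    // Assumed layout: [short n] followed by n x ([short len][len bytes]).
    static byte[] encodeList(List<byte[]> elements) {
        if (elements.size() > MAX_SHORT) {
            throw new IllegalArgumentException("too many elements: " + elements.size());
        }
        int size = 2; // element count
        for (byte[] e : elements) {
            if (e.length > MAX_SHORT) {
                // A typical JPEG easily exceeds 64 KiB, so it cannot be encoded here.
                throw new IllegalArgumentException(
                    "element of " + e.length + " bytes exceeds " + MAX_SHORT);
            }
            size += 2 + e.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size); // big-endian by default
        buf.putShort((short) elements.size());
        for (byte[] e : elements) {
            buf.putShort((short) e.length); // low 16 bits, read back as unsigned
            buf.put(e);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        // The two elements from the question's INSERT: [0xaa, 0xbb]
        byte[] small = encodeList(Arrays.asList(new byte[] {(byte) 0xaa}, new byte[] {(byte) 0xbb}));
        System.out.println(small.length); // prints 8

        try {
            encodeList(Collections.singletonList(new byte[70_000]));
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

With this layout, the [0xaa, 0xbb] list from the question encodes to 8 bytes, while a 70,000-byte image is rejected before anything is written.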