The official documentation tells us to not use UDTs for primary keys. Is there a particular reason for this? What would the potential downsides be in doing this?
-
The bottom of this doc seems to contradict what you're saying: https://www.datastax.com/documentation/cql/3.1/cql/cql_reference/cqlRefcreateType.html. Sylvain Lebresne says, "It is possible to use a UDT as type of any CQL column, including clustering ones. In that latter case, the ordering induced by the UDT is the one of it's fields in the order they have been declared. Please note however that there is relatively little advantages to be gain in using a UDT on a PRIMARY KEY column, avoid abusing such possibility just because it's available." http://www.datastax.com/dev/blog/cql-in-2-1 – catpaws Oct 21 '14 at 23:52
-
"Please note however that there is relatively little advantages to be gain in using a UDT on a PRIMARY KEY column, avoid abusing such possibility just because it's available." was what I was referring to. I guess having keys ordered by first fieldname can cause unexpected issues. – ashic Oct 22 '14 at 08:39
1 Answer
That sentence was intended to discourage users from using UDTs for PK columns indiscriminately. The main motivation for UDTs in their current incarnation (that is, given that Cassandra only supports "frozen" UDTs) is storing more complex values inside collections. Outside collections, a UDT can have its uses, but it's worth asking yourself twice whether you need it. For example:
CREATE TYPE myType (a text, b int);
CREATE TABLE myTable (id uuid PRIMARY KEY, v frozen<myType>);
is often not very judicious, because you lose the ability to update v.a without also updating v.b. It's actually more flexible to do this directly:
CREATE TABLE myTable (id uuid PRIMARY KEY, a text, b int);
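To make the difference concrete, here is a minimal sketch of what the updates look like in each case. The table names frozen_table and flat_table are hypothetical (both definitions above reuse the name myTable, so they can't coexist), and the uuid and values are made up for illustration:

```cql
-- Assumed setup: same schemas as above, under hypothetical names.
CREATE TYPE myType (a text, b int);
CREATE TABLE frozen_table (id uuid PRIMARY KEY, v frozen<myType>);
CREATE TABLE flat_table (id uuid PRIMARY KEY, a text, b int);

-- Frozen UDT: the whole value must be rewritten, so b has to be
-- supplied again even though only a is changing.
UPDATE frozen_table SET v = {a: 'new value', b: 42}
 WHERE id = 123e4567-e89b-12d3-a456-426655440000;

-- Flat columns: a can be updated on its own, leaving b untouched.
UPDATE flat_table SET a = 'new value'
 WHERE id = 123e4567-e89b-12d3-a456-426655440000;
```

With the frozen version, the client has to know the current value of b (or read it first) just to change a.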
This trivial example points out that UDT outside of collections is not necessarily a good thing, and this also extends to primary key columns. It's not necessarily better to do:
CREATE TYPE myType (a text, b int);
CREATE TABLE myTable (id frozen<myType> PRIMARY KEY);
than more simply:
CREATE TABLE myTable (a text, b int, PRIMARY KEY ((a, b)));
Furthermore, regarding the primary key, any complex UDT probably doesn't make sense. Consider even a moderately complex type like:
CREATE TYPE address (
    number int,
    street text,
    city text,
    phones set<text>
);
Using such a type inside a primary key almost surely isn't very useful: the PK identifies rows, so two addresses that are the same except for their set of phones would not identify the same row. There are not many situations where that would be desirable. More generally, a PK tends to be relatively simple, and you might want fine-grained control over the clustering columns, so UDTs are rarely good candidates.
In summary, UDTs in PK columns are not always a bad thing, just not often useful in that context, so users should not go looking hard for ways to use UDTs in PK columns just because it's allowed.
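As a rough illustration of the phones problem, assuming the address type above and a hypothetical residents table (the table, the name column, and all values are invented for this sketch):

```cql
-- Hypothetical table keyed by the whole address UDT.
CREATE TABLE residents (addr frozen<address> PRIMARY KEY, name text);

-- Same number, street and city; only the phone set differs.
INSERT INTO residents (addr, name)
VALUES ({number: 1, street: 'Main St', city: 'Springfield', phones: {'555-0100'}}, 'Alice');

INSERT INTO residents (addr, name)
VALUES ({number: 1, street: 'Main St', city: 'Springfield', phones: {'555-0100', '555-0199'}}, 'Alice');

-- The two key values are not equal, so the second INSERT does not
-- overwrite the first: the table now holds two separate rows.
SELECT * FROM residents;
```

Looking up a row also requires supplying the exact phone set, which is rarely what you want from a key.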
