5

I need to store records about user locations based on IP addresses but Im not sure how best to model it.

For each address, we need to record the machine's details (ipaddress, agentId) and the machine's location (isocode, city). This information will never be updated - only inserted and read.

Queries on this data will need to pull location info for a specific user in a given time period.

Traditionally I would model this using a wide-row CF with JSON blobs, along the lines of:

CREATE TABLE user_location (
userid text,
timestamp timeuuid,
data text, -- json blob {agentid, isocode, city, ipaddress}
PRIMARY KEY (userid, timestamp)
);

Im now questioning whether this is the best approach, and whether I should instead be replacing the JSON with a user defined type (UDT), such as:

CREATE TYPE machinelocation (
isocode text,
city text,
ipaddress inet
);

CREATE TABLE user_location (
userid text,
timestamp timeuuid,
machinelocations map<text, machinelocation>
PRIMARY KEY (userid, timestamp)
);

or should I just do away with the blob entirely and separate out the json into dedicated columns, ala:

CREATE TABLE user_location (
userid text,
timestamp timeuuid,
agentid text,
isocode text,
city text,
ipaddress text,
PRIMARY KEY (userid, timestamp)
);

What is the recommended approach for modelling data of this type?

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
beterthanlife
  • 1,668
  • 2
  • 18
  • 30

1 Answers1

2

I would go with the separate columns, unless you really will always be pulling the full blob. Even then I would probably still go with separate columns. The best use case I see for UDT's is to be able to put them into collections, such that you can have a collection with multiple fields per item.

Zanson
  • 3,991
  • 25
  • 31
  • Thanks for the response. Can you think of any downside to separating them out into dedicated columns, other than the eventual column limit (which I wont hit anyway)? – beterthanlife May 15 '14 at 10:33
  • Cassandra will have to do a little more work server side in the dedicated column case, so in a toy scenario that overhead may show as it using more cpu. But it shouldn't make a difference with any real workload, and gives you the benefits of being able to do more than just retrieve the whole blob. – Zanson May 15 '14 at 15:11