0

We are modeling account table in cassandra with social logins, we chose email as primary key and skinny row implementation. Our cassandra is on version 2.1.6. Here is table definition:

CREATE TABLE account_by_email (
    email_address text,
    account_password text,
    first_name text,
    last_name text,
    registered_at timestamp,
    roles set<text>,
    facebook_id text,
    twitter_id text,
    linkedin_id text,
    password_reset_token blob,
    password_reset_token_valid_until timestamp,
    profile_image_url text,
    PRIMARY KEY (email_address) ) WITH COMMENT='Accounts in system by email.';

This workks fine for email access since we can access fast each account when we know email address which is situation after login.

User has, in addition to email login option to login / signup with social accounts. When social account login is used flow is go to social network, receive social id (facebook, twitter, linkedin) and maybe email and query account table by social id to get full account or just email and continue on using email on each API request.

We currently added indexes on facebook_id, twitter_id, linkedin_id to support this since we are in MVP phase with one node and we chose fats implementation over performance.

What is proper way to model this? Here are couple of suggestions we are thinking of:

  • leave index implementation since fetch by social id is happening only on login once (after that email is used)
  • have one table for each social id which will hold social id email pair
  • have one table for each social id which will hold full account (account can be edited so this will add complexity to update)
  • something else?

And another question, is index implementation with high cardinality field (as social id) really that bad when you model access path which happens rarely?

Nenad Bozic
  • 3,724
  • 19
  • 45

1 Answers1

3

My take on this would be the following :

Create an account table that holds all information about the user, and uses a uuid as partition key :

CREATE TABLE account (
    userid uuid,
    first_name text,
    last_name text,
    registered_at timestamp,
    roles set<text>,
    password_reset_token blob,
    password_reset_token_valid_until timestamp,
    profile_image_url text,
    PRIMARY KEY (userid) );

Create a single table that links any of your login source to the user account :

CREATE TABLE account_by_login_source (
        user_external_id text, // Can be an email address or a social network id       
        login_source text,   // Can be any of "email", "facebook", "twitter",... 
        userid uuid,
        account_password text,  // only useful for email login, since you handle auth
        PRIMARY KEY ((user_social_id, login_source)));

When you create your user, generate a uuid, insert a row in the account table and a corresponding row in the account_login_source table.

This way, your users you can use multiple login sources and link them to a single account. You'll just have to run 2 very efficient queries in order to log the user in.

Using secondary indexes without specifying a partition key will definitely be a problem since requests will eventually timeout as your cluster grows. If you run queries like the following :

SELECT * FROM account_by_email where facebook_id = 'userid';

Cassandra will have to scan every node in the cluster, in order to get a single row. From experience, I advise not to use this technique which leads to a lot of despair once in production...

  • Only thing is that we use email 90% of the time (our access token translates to email so each API will read from it to verify user) so might be good addition to your suggestion to optimize email table or leave email as PK instead of userid since from social networks if we do not receive it we ask user to provide it so we always have it. – Nenad Bozic Aug 24 '15 at 09:14
  • And second table should probably have key ((user_social_id, login_source)) since it cannot be clustered for single social id – Nenad Bozic Aug 24 '15 at 09:15
  • Indeed the partition_key of the second table is (user_social_id, login_source), it was a typo. – Alexander DEJANOVSKI Aug 24 '15 at 09:26
  • It was unclear indeed that the email was always present. If so it can replace the uuid and serve as a unique id for your accounts. The second table is still needed as an inverted index to find which account (email address) corresponds to social account, since secondary indexes will eventually fail to answer fast enough. – Alexander DEJANOVSKI Aug 24 '15 at 09:28
  • Thanks for answer and conversation, needed to hear one more opinion. I had similar chain of thought... – Nenad Bozic Aug 26 '15 at 07:22