0

We're exploring options for reliably segregating customer data in Spanner. The most obvious solution is a customer per database, but the 100 database/instance limitation renders that impractical. Past experience leads me to be very suspicious of any plan to add a customer-id field to the primary key of each table, because it's far too easy to screw that up in SQL queries, leading to dangerous data cross-talk.

I'm considering weird solutions like using all 2k tables/instance, and taking the ~32 tables we need per customer and prefixing those. E.g., [cust-id]-Table1, [cust-id]-Table2, etc. At least then the customer segregation logic that needs to be iron-clad can be put in one place that's hard to screw up in queries. But is anyone aware of a less weird approach? E.g., "100" is a suspiciously-non-round number in a technical limitation -- is that adjustable somehow?

Maxim
  • 4,075
  • 1
  • 14
  • 23
Joel Webber
  • 222
  • 1
  • 8

1 Answers1

0

Unfortunately, 100 databases/instance is not an adjustable value.

Though, I don't seem to fully understand " very suspicious of any plan to add a customer-id field to the primary key of each table, because it's far too easy to screw that up in SQL queries, leading to dangerous data cross-talk." Are you concerned about query performance, data correctness, code correctness or schema ?

With this schema, ~32 tables per customer will only allow you to store ~6000 customers. Though I would suggest benchmarking with other schema choices Spanner exposes.

Would you be able to provide a high-level schema of these customer tables as well as your query patterns ?

Also, suggest to read into for more ideas that fit your usecase better:

snehashah
  • 161
  • 3
  • This is a pretty common issue in multi-tenant databases. It's maximally important to avoid mixing customer data, but consider a schema like `create table User ( custId int64, email string, ... ) primary key (custId, email)` (where `custId` is the customer tenant, and `email` is _their_ end user). Any query like `select User.* from User where email [...]` will inadvertently mix customer data. It's exceedingly difficult to guarantee that this never gets screwed up in arbitrary queries. Hence our desire to have stronger segregation. – Joel Webber Aug 13 '17 at 14:51
  • Sorry that was a little dense. SO comments are pretty limiting. To be clear, I've seen this kind of structure blow up in the face of solid engineering teams, so it's not a theoretical concern at all. I'm open to other suggestions (I've read all the Spanner docs, and seen nothing addressing this question so far). Regarding the 6k customers, yes, we'd have to create multiple instance/clusters to deal with overflow. Which I don't love, but I'm not coming up with much else. – Joel Webber Aug 13 '17 at 14:53