1

This is more of a discussion.

We have a system which is multitenanted and will have tables that can have millions of rows. Our UI allows users to perform searches against these tables with many different search criterias -- so they can have any combination of these criteria.

It is not practical to index all these search columns in the database or to load the full tables in memory and then filter.

Can anybody point me in the correct direction for patterns/designs that tackles this issue?

Matt Fenwick
  • 48,199
  • 22
  • 128
  • 192
Amitesh
  • 677
  • 1
  • 10
  • 24
  • This is rather an opened ended question. In practice, without understanding your domain, I would ask if you could redefine the scope. Can the UI be made to guide the user to perform searches that you do have indexes for. Rather than let the user do any kind of `adhoc` query, you could get the user to engage with you to discuss their requirements and tailor the data and indexes accordingly. – Stephen Quan Feb 12 '12 at 22:20
  • Here is an example: Users can search for invoices. They have the ability to search on Invoice number, Invoice date, Invoice Job Number, Customer of tghe Invoice, Supplier of the Invoice, Status of the Invoice(Paid, Void, etc), Invoice Payment date. There is no guide to the search. They can select/fill any of the critera and click search – Amitesh Feb 12 '12 at 22:35

1 Answers1

0

I'm not aware of any pattern to solve the situation you describe. An unlimited number of rows, completely ad hoc queries, many simultaneous users? That's not a requirement; that's "anything goes".

I'll assume that this is a reporting database, not transactional. The data is read-only. Is that correct?

A data warehouse with a star schema would prescribe queries along thoughtfully designed dimensions. Users can roll up dimensions (e.g. time dimension allows users to roll up to day, week, month, quarter, year, etc.). But the logic to do so is executed on the database and coded in stored procedures.

I'd challenge the assertion that users need millions of rows on the middle tier. No user can take in millions of rows at once. Google returns the millions of pages returned by a single query twenty-five at a time.

Maybe you can stream down data sets that are used in a detached manner, analyzed using Excel or some other tool. But that's the best scenario I can envision.

duffymo
  • 305,152
  • 44
  • 369
  • 561
  • Hi Thanks for the reply. This is not a reporting system only. So it has live transactions going on it as well. Also you are correct in saying that the user will never work with a mill rows but the problem I was describing was querying a table with more than a million rows. This query can be a combination of multiple different columns which are not necessarly to be indexed – Amitesh Feb 12 '12 at 22:21
  • Querying a million rows without indexes will never perform. And those people doing transactions will suffer as well. You're screwed. – duffymo Feb 12 '12 at 22:32
  • Is the answer that users should not have the flexability to search with so many criterias – Amitesh Feb 12 '12 at 22:45
  • Maybe. Your question is ill-posed. It's so open-ended that it's impossible to answer. Maybe that's why you're having such trouble designing it. You have no real requirements or constraints. – duffymo Feb 12 '12 at 22:50
  • If you read my comment I made to Matt Fenwick (above), I have given an example of how users use the system – Amitesh Feb 12 '12 at 23:12
  • I read it. That does not seem like an insurmountable problem. I can easily see appropriate indexes. And the searches are structured: I can envision a drop down in a UI that would spell out the column on which to search. The fact that you insist that it's impossible suggests to me that either you're overstating the difficulties or oversimplifying the example you gave. Which is it? – duffymo Feb 13 '12 at 00:48