I've just started devising an ElasticSearch mapping for a multitenant web app. In this app, there are site IDs and page IDs. Page IDs are unique per site, and randomly generated. Pages can have child pages.
Which is better:
1) Use a compound key combining the site ID and the page IDs? Like so:
"sitePageIdPath": "(siteID):(grandparent-page-ID).(parent-page-ID).(page-ID)"
or:
2) Use separate fields for site ID and page IDs? Like so:
"siteId": "(siteID)",
"pageIdPath": "(grandparent-page-ID).(parent-page-ID).(page-ID)"
?
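To make the two options concrete, here is a minimal Python sketch of the two document shapes. The helper names and the example IDs ("site9", "gp1", and so on) are made up for illustration; only the field names come from the question.

```python
# Hypothetical helpers; field names follow the question, the IDs are made up.

def compound_doc(site_id, page_id_path):
    """Option 1: one field holding the site ID and the page path together."""
    return {"sitePageIdPath": site_id + ":" + ".".join(page_id_path)}


def separate_doc(site_id, page_id_path):
    """Option 2: site ID and page path stored in separate fields."""
    return {"siteId": site_id, "pageIdPath": ".".join(page_id_path)}


path = ["gp1", "par2", "pg3"]  # grandparent, parent, page
option_1 = compound_doc("site9", path)
option_2 = separate_doc("site9", path)
```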
I'm thinking that if I merge the site ID and page IDs into one single field, then ElasticSearch will need to handle only that field, and this should be somewhat more performant than using two fields, both when indexing and when searching. It should also require less storage space.
However, perhaps there's some drawback that I'm not aware of? Hence this question.
Some details: 1) I'm using a single index, and I'm over allocating shards (100 shards), as suggested when one uses the "users" data flow pattern. 2) I'm specifying routing parameters explicitly in the URL (i.e. &routing=site-ID), not via any siteId field in the documents that are indexed.
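Roughly, the setup looks like the sketch below. The host, the index name "sites", the type name "page" and the document ID are placeholders, not my real names; the code only builds the settings body and the request URL and does not talk to a cluster.

```python
from urllib.parse import quote, urlencode

# Settings body used when creating the single, over-allocated index.
index_settings = {"settings": {"number_of_shards": 100}}


def index_url(base, index, doc_type, doc_id, site_id):
    """Build the URL for indexing one document, routed by site ID."""
    path = "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))
    return base + "/" + path + "?" + urlencode({"routing": site_id})


# Placeholder host, index, type and IDs, for illustration only.
url = index_url("http://localhost:9200", "sites", "page", "doc1", "site9")
```

Because the routing value is the site ID, all documents of one site should end up on the same shard, which is the point of the over-allocation approach.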
Update 7 hours later:
1) All queries should be filtered by site ID (that is, tenant ID). If I do combine the site ID with the page ID, I suppose/hope that I can use a prefix filter to filter on site ID. I wonder if this will be as fast as filtering on a single dedicated siteId field (e.g. can the results be cached?).
2) Example queries: Full text search. List all users. List all pages. List all child/successor pages of a certain page. Load a single page (via _source).
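Here is a sketch of how I imagine the filters would look under each option, written as Python dicts that serialize to the JSON request body. It assumes the ElasticSearch 1.x-era query DSL (the "filtered" query with term, prefix and bool filters); the "text" field, the helper names and the ID values are hypothetical.

```python
import json

# Assumption: ElasticSearch 1.x-era query DSL ("filtered" query plus filters).
# The "text" field and all ID values are hypothetical.


def site_filter_compound(site_id):
    """Option 1: select a tenant via a prefix on the compound field."""
    return {"prefix": {"sitePageIdPath": site_id + ":"}}


def site_filter_separate(site_id):
    """Option 2: select a tenant via an exact term on siteId."""
    return {"term": {"siteId": site_id}}


def descendants_filter_compound(site_id, ancestor_path):
    """Option 1: all pages below the ancestor path, using one prefix."""
    prefix = site_id + ":" + ".".join(ancestor_path) + "."
    return {"prefix": {"sitePageIdPath": prefix}}


def descendants_filter_separate(site_id, ancestor_path):
    """Option 2: term on siteId combined with a prefix on pageIdPath."""
    prefix = ".".join(ancestor_path) + "."
    return {"bool": {"must": [
        {"term": {"siteId": site_id}},
        {"prefix": {"pageIdPath": prefix}},
    ]}}


def filtered_search(query, filter_clause):
    """Wrap a query and a filter in a 'filtered' query body."""
    return {"query": {"filtered": {"query": query, "filter": filter_clause}}}


full_text = filtered_search({"match": {"text": "hello"}},
                            site_filter_compound("site9"))
descendants = filtered_search({"match_all": {}},
                              descendants_filter_separate("site9", ["gp1", "par2"]))
print(json.dumps(full_text, indent=2))
```

The trailing ":" and "." in the prefixes matter: without them, a prefix for site "site9" would also match "site90", and a prefix for page "par2" would also match a sibling whose ID merely starts with "par2".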
Update 22 hours later:
3) I am able to search by page ID because, as ElasticSearch's _id, I store: (site-ID):(page-ID). So it's not a problem that the page ID is otherwise "hidden" as the last element of pageIdPath. I probably should have mentioned earlier that I had a separate page ID field, but I wanted to keep the question short.
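In code, the ID handling amounts to something like the following sketch. The function names are hypothetical, and it assumes that neither site IDs nor page IDs contain ":" or ".".

```python
# Hypothetical helpers; assumes IDs never contain ":" or ".".

def make_doc_id(site_id, page_id):
    """Compose the value stored as the document's _id."""
    return site_id + ":" + page_id


def page_id_from_path(page_id_path):
    """Recover the page's own ID, which is the last element of pageIdPath."""
    return page_id_path.rsplit(".", 1)[-1]


doc_id = make_doc_id("site9", "pg3")
own_id = page_id_from_path("gp1.par2.pg3")
```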
4) I use index: not_analyzed for these ID fields.
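For reference, the mappings for the two options would look roughly like this, again as Python dicts that serialize to the JSON mapping body. This assumes the 1.x-era "string" field type, and the type name "page" is a placeholder.

```python
import json

# Assumption: 1.x-era mapping syntax; the type name "page" is a placeholder.
NOT_ANALYZED = {"type": "string", "index": "not_analyzed"}

# Option 1: a single compound field.
mapping_compound = {"page": {"properties": {
    "sitePageIdPath": dict(NOT_ANALYZED),
}}}

# Option 2: separate fields for the site ID and the page ID path.
mapping_separate = {"page": {"properties": {
    "siteId": dict(NOT_ANALYZED),
    "pageIdPath": dict(NOT_ANALYZED),
}}}

print(json.dumps(mapping_separate, indent=2))
```

With not_analyzed, each ID value is indexed as a single exact term, which is what term and prefix filters operate on.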