I'm posting to see if anyone has a solution or can offer some guidance on modelling data so that it can be used in Azure Search.
The problem domain
I am currently using DocumentDB to model some data which I would like to search. My document, which I shall call "Entity A", currently looks something like:
{
_id, //key - Guid
name, //searchable - String
description, //searchable - String
tags: [ "T1", "T2", ...], //facet - Collection(String)
locations: [
{
coordinate, //filter - GeoLocation (lat & long)
startDateTime, //filter - DateTimeOffset
endDateTime //filter - DateTimeOffset
},
...
]
...
},
...
Relationships: tags 0...n Entity A & locations 0...n Entity A
Flattening Entity A and setting up a simple index that searches name and description and facets on tags is fine and working great.
The problem lies in adding locations to the index. Effectively, what I want to search (in natural language) is: for a given term, find all Entity A documents near a coordinate that have a location overlapping a start date x and an end date y.
From what I can find online, flattening the locations will only work if they become strings:
https://blogs.msdn.microsoft.com/kaevans/2015/03/09/indexing-documentdb-with-azure-seach/
https://learn.microsoft.com/en-us/azure/search/search-howto-index-json-blobs
This seems to lose the ability to perform geo-distance and date-range queries.
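To pin down the query described above, here is a minimal plain-Python sketch of the predicate a single location would have to satisfy. It is independent of Azure Search; the `Location` class and the function names are made up for illustration, and the distance uses the haversine formula with a mean Earth radius of 6371 km.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class Location:
    # Hypothetical stand-in for one entry of the "locations" array.
    lat: float
    lon: float
    start: datetime
    end: datetime


def overlaps(loc: Location, x: datetime, y: datetime) -> bool:
    """True if [loc.start, loc.end] intersects the query window [x, y]."""
    return loc.start <= y and loc.end >= x


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres (haversine formula)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def matches(loc: Location, lat: float, lon: float, radius_km: float,
            x: datetime, y: datetime) -> bool:
    """A location matches when it is both near enough and overlapping in time."""
    return distance_km(loc.lat, loc.lon, lat, lon) <= radius_km and overlaps(loc, x, y)
```

The key point is that both conditions must hold for the same location entry, which is why treating the locations as independent flattened collections would not be enough.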
Current Thoughts
Split the Entity A document into two collections
The new Entity A document:
{
_id, //key - Guid
name, //searchable - String
description, //searchable - String
tags: [ "T1", "T2", ...], //facet - Collection(String)
...
},
and multiple location entities:
{
_id,
documentId, //relates to Document._id
coordinate,
startDate,
endDate
}
Questions:
Is it better to have two indices, one for the new Entity A and one for the locations, and then join the results?
I think this is what the multitenant search article describes: https://learn.microsoft.com/en-us/azure/search/search-modeling-multitenant-saas-applications
Does anyone know of any examples that implement this?
Pros
- Think it will work
Cons
- Would require two search requests for each query and then merging the results (this may or may not be ideal).
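For reference, the client-side merge in the two-index option could look roughly like the sketch below. It assumes each search request has already returned a list of plain dictionaries shaped like the documents above; `merge_results` is a hypothetical helper, not part of any SDK.

```python
def merge_results(entity_hits, location_hits):
    """Keep entities that have at least one matching location.

    entity_hits:   results from the Entity A index (each has "_id").
    location_hits: results from the locations index (each has "documentId").
    The order of entity_hits (e.g. relevance order) is preserved.
    """
    # Group the matching locations by the entity they belong to.
    by_entity = {}
    for loc in location_hits:
        by_entity.setdefault(loc["documentId"], []).append(loc)

    merged = []
    for entity in entity_hits:
        locs = by_entity.get(entity["_id"])
        if locs:
            # Attach the matching locations without mutating the input.
            merged.append({**entity, "locations": locs})
    return merged


entities = [{"_id": "a", "name": "Alpha"}, {"_id": "b", "name": "Beta"}]
locations = [{"_id": "l1", "documentId": "b"}, {"_id": "l2", "documentId": "b"}]
result = merge_results(entities, locations)
```

One thing this makes visible: paging and result counts have to be handled in the merge step, because neither index on its own knows the final result set.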
OR
Is it better to fully "invert" the Entity A and location entities, i.e. something like:
{
_id,
documentDBId, //relates to Document._id
coordinate,
startDate,
endDate,
name,
description,
tags: []
...
}
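A minimal sketch of how such inverted documents could be produced from one Entity A document is shown here. The synthetic `_id` format (`<entity id>_<position>`) is purely an assumption for illustration, and the location fields are copied through unchanged.

```python
def invert(entity):
    """Produce one flat document per location, repeating the shared fields."""
    # Everything except the key and the nested array is shared by all rows.
    shared = {k: v for k, v in entity.items() if k not in ("_id", "locations")}
    return [
        {
            "_id": f'{entity["_id"]}_{i}',   # assumed synthetic key
            "documentDBId": entity["_id"],   # points back at the source document
            **loc,
            **shared,
        }
        for i, loc in enumerate(entity.get("locations", []))
    ]


entity_a = {
    "_id": "e1",
    "name": "Example",
    "description": "An example entity",
    "tags": ["T1", "T2"],
    "locations": [
        {"coordinate": [0.0, 0.0], "startDate": "2017-01-01T00:00:00Z", "endDate": "2017-01-02T00:00:00Z"},
        {"coordinate": [1.0, 1.0], "startDate": "2017-02-01T00:00:00Z", "endDate": "2017-02-02T00:00:00Z"},
    ],
}
flat_docs = invert(entity_a)
```

Note that an Entity A with no locations produces no documents at all under this scheme, so it would not be findable through this index.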
Pros
- Pretty flat already so should be easy to index and query
- One search hit and no merging
Cons
- Changes to name, description, tags, etc. would require multiple updates, one per location document.
- Would get multiple results for the same "Entity A" if the queried date range overlapped more than one of its locations.
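With the inverted layout, the geo and date conditions could be expressed as a single OData filter, and the duplicate-results con could be handled on the client. The sketch below assumes the index fields are named `coordinate` (Edm.GeographyPoint), `startDate` and `endDate` (Edm.DateTimeOffset) as in the document above; `build_filter` and `dedupe` are hypothetical helpers. It relies on `geo.distance` returning kilometres and on the point literal being written longitude first.

```python
def build_filter(lat, lon, radius_km, x, y):
    """Build an OData $filter string for 'near a point and overlapping [x, y]'.

    x and y are ISO 8601 strings such as "2017-01-01T00:00:00Z".
    """
    return (
        f"geo.distance(coordinate, geography'POINT({lon} {lat})') le {radius_km}"
        f" and startDate le {y} and endDate ge {x}"
    )


def dedupe(hits):
    """Keep only the first (best-ranked) hit for each source Entity A."""
    seen = set()
    unique = []
    for hit in hits:
        if hit["documentDBId"] not in seen:
            seen.add(hit["documentDBId"])
            unique.append(hit)
    return unique


odata = build_filter(51.5072, -0.1276, 10, "2017-01-01T00:00:00Z", "2017-01-31T00:00:00Z")
hits = [
    {"_id": "e1_0", "documentDBId": "e1"},
    {"_id": "e2_0", "documentDBId": "e2"},
    {"_id": "e1_1", "documentDBId": "e1"},
]
unique_hits = dedupe(hits)
```

Deduplicating after the fact means a page of results can come back shorter than requested, so the page size would probably need some headroom.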
OR
Is there another option?
Thanks and I'm happy to clarify if needed