
I'm posting to see if anyone has a solution, or can offer some guidance, on modelling some data for use in Azure Search.

The problem domain

I am currently using DocumentDB to model some data which I would like to search. My document, which I shall call "Entity A", currently looks something like:

{
 _id,                          //key - Guid
 name,                         //searchable - String
 description,                  //searchable - String
 tags: [ "T1", "T2", ...]      //facet - Collection(String)
 locations: [
   {
      coordinate,              //filter - GeoLocation (lat & long)
      startDateTime,           //filter - DateTimeOffset
      endDateTime              //filter - DateTimeOffset
   },
   ...
  ]
 ...
},
...

Relationships: tags 0...n Entity A & locations 0...n Entity A

Flattening Entity A and setting up a simple index with search on name and description and a facet on tags is working well.

The problem lies in trying to add locations to the index. Effectively, what I want to search (in natural language) is: for a given term, find all the Entity As near a coordinate whose location dates overlap a start date x and end date y.

From what I can find online, flattening the locations will only work if they become strings:

https://blogs.msdn.microsoft.com/kaevans/2015/03/09/indexing-documentdb-with-azure-seach/
https://learn.microsoft.com/en-us/azure/search/search-howto-index-json-blobs

This seems to lose the ability to perform geo-distance and date-range queries.
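For reference, this is the kind of query that structured geo and date fields enable (and that flattening to strings loses). A minimal sketch of building the OData $filter, assuming the field names from the schema above and Azure Search's geo.distance function (which measures in kilometres, with WKT longitude-before-latitude point order):

```python
def build_location_filter(lat, lon, radius_km, start, end):
    """Sketch: combine a geo-distance test with a date-range overlap.
    Field names (coordinate, startDateTime, endDateTime) are taken
    from the schema above for illustration."""
    # WKT points are written POINT(lon lat), not (lat lon)
    point = f"geography'POINT({lon} {lat})'"
    return (
        f"geo.distance(coordinate, {point}) le {radius_km}"
        # a location overlaps [start, end] iff it starts before the
        # range ends and ends after the range starts
        f" and startDateTime le {end}"
        f" and endDateTime ge {start}"
    )
```

The result would be passed as the $filter parameter of a search request; datetime literals in Azure Search filters are written unquoted.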

Current Thoughts

Split the Entity A document into two collections

The new Entity A document:

   {
     _id,                          //key - Guid
     name,                         //searchable - String
     description,                  //searchable - String
     tags: [ "T1", "T2", ...]      //facet - Collection(String)
     ...
    },

and multiple location entities

{
  _id,
  documentId,                     //relates to Document._id
  coordinate,
  startDate,
  endDate
}

Questions:

Is it better to have two indices - one for the new Entity A and one for the locations and then join the results?

I think this is similar to the multitenant search pattern: https://learn.microsoft.com/en-us/azure/search/search-modeling-multitenant-saas-applications

Does anyone know of any examples that implement this?

Pros

  • I think it will work

Cons

  • Would require two search hits for each query, and then merging the results (which may or may not be acceptable).
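The merge step for the two-index approach could be sketched as follows: take the documentId values returned by the location index and turn them into a key filter for the Entity A index using Azure Search's search.in function. Field names mirror the schemas above for illustration (a real index would need a key field name that starts with a letter):

```python
def document_id_filter(document_ids):
    """Sketch: build a $filter for the Entity A index from the
    documentIds returned by a query against the location index.
    search.in is Azure Search's efficient set-membership test."""
    ids = ",".join(document_ids)
    # third argument is the delimiter used in the id list
    return f"search.in(_id, '{ids}', ',')"
```

This works as long as the first query returns a manageable number of distinct documentIds.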

OR

Is it better to fully "invert" the Entity A and location entities, i.e. something like:

{
  _id,
  documentDBId,                     //relates to Document._id
  coordinate,
  startDate,
  endDate,
  name,
  description,
  tags: []
  ...
}

Pros

  • Already pretty flat, so it should be easy to index and query
  • One search hit and no merging

Cons

  • For name, description, tags, etc., a change would require updating multiple documents.
  • Would return multiple results for the same Entity A if the query's date range overlapped several of its locations' start/end dates.
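With the inverted shape, one request could cover everything: full-text search on name and description, a geo/date filter, and tag facets. A sketch of the query parameters, assuming the inverted schema's field names:

```python
def build_inverted_query(term, lat, lon, radius_km, start, end):
    """Sketch: single-request query parameters against the inverted
    index. Field names (coordinate, startDate, endDate, name,
    description, tags) mirror the inverted schema above."""
    return {
        "search": term,
        "searchFields": "name,description",
        "$filter": (
            f"geo.distance(coordinate, geography'POINT({lon} {lat})')"
            f" le {radius_km}"
            f" and startDate le {end} and endDate ge {start}"
        ),
        "facet": "tags",
    }
```

The duplicate-results con would still need handling client-side, e.g. by de-duplicating on documentDBId.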

OR

Is there another option?

Thanks and I'm happy to clarify if needed

1 Answer


I'd lean towards your second, fully flattened ("inverted") option:

{
  _id,
  documentDBId,                     //relates to Document._id
  coordinate,
  startDate,
  endDate,
  name,
  description,
  tags: []
  ...
}

My main argument for this is paging. If you have two searches and you want to return 10 results per page, how many results do you ask each search for, and more importantly, where do you start your search for page 2?

There are also issues with ranking your results, but those are more manageable than paging.
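With a single inverted index, paging reduces to the standard $top/$skip parameters; with two indexes there is no well-defined offset to resume from. A minimal sketch:

```python
def page_params(page, page_size=10):
    """Sketch: zero-based page number -> $top/$skip parameters for a
    single-index Azure Search query. No equivalent exists when
    results must be merged from two separate searches."""
    return {"$top": page_size, "$skip": page * page_size}
```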

Jeremy Hutchinson
    I'm not sure that paging would be an issue in practice, as long as the number of location/date combinations that match a given query is relatively small. You'd fetch all the matches, put together a list of the corresponding document IDs, and use that to filter the Entity A index. Paging for the purposes of displaying to the user would happen on the Entity A index. – Bruce Johnston Feb 14 '17 at 23:08
  • Ahh yes, doing it that way would allow handling paging as long as the number of results from the first search is reasonable. – Jeremy Hutchinson Feb 15 '17 at 16:34
  • Thanks @BruceJohnston. I'm going to give it shot and see how far I get/problems I run into. – Chris Wallace Feb 16 '17 at 19:25
  • Thanks @JeremyHutchinson as well for your insights – Chris Wallace Feb 16 '17 at 19:25