
I have the following data, returned by the query SELECT c.addresses[0] address, [ c.name ] filenames FROM c:

[
  {
    "address": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "filenames": [
      "File 01.docx"
    ]
  },
  {
    "address": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "filenames": [
      "File 02.docx"
    ]
  },
  {
    "address": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "filenames": [
      "File 03.docx"
    ]
  }, ....

The address field is the key. I have an index with a field defined as follows:

new Field()
{
    Name = "filenames",
    Type = DataType.Collection(DataType.String),
    IsSearchable = true,
    IsFilterable = true,
    IsSortable = false,
    IsFacetable = false
},

As you can see, I create an array for the filenames with [ c.name ] filenames.

When I index the data above, the filenames collection ends up with only one entry: the file name from the last row that was indexed. Can I make the indexer add to the collection (merge) rather than replace it?

I am also looking at solving this in the query, but Cosmos DB does not support subselects (yet), and a UDF can only see the data that is passed into it.

Steve Drake

1 Answer


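Fundamentally, the way you have structured your Cosmos DB collection makes this scenario unworkable, because Azure Search does not support merging into a collection field: each document that shares a key replaces the whole filenames value, which is why only the last file name survives.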

Consider changing your design so that address is a key (that is, unique) in the collection, and all filenames for an address are gathered in a single document:

  {
    "address": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "filenames": [ "File 01.docx", "File 02.docx", "File 03.docx", ... ]
  }
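If the raw per-file records have to stay as they are, the regrouping can also be done before upload. Below is a minimal C# sketch, assuming each row carries one address and one file name (the single-element [ c.name ] array flattened out). FileRow, AddressDocument, and GroupByAddress are hypothetical names for illustration, not part of any Azure SDK. Each output object matches the index shape above, with address as the key and filenames as a string collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one query row, flattened to a single file name.
public sealed record FileRow(string Address, string FileName);

// Hypothetical shape of one search document: one per address,
// carrying every file name that shares that address.
public sealed record AddressDocument(string Address, IReadOnlyList<string> Filenames);

public static class AddressGrouping
{
    // Collapses all rows that share an address into a single document, so the
    // index receives the full collection in one upload instead of each row
    // overwriting the previous one.
    public static List<AddressDocument> GroupByAddress(IEnumerable<FileRow> rows)
    {
        return rows
            .GroupBy(r => r.Address, StringComparer.Ordinal)
            .Select(g => new AddressDocument(
                g.Key,
                g.Select(r => r.FileName)
                 .Distinct(StringComparer.Ordinal)              // drop repeated names
                 .OrderBy(name => name, StringComparer.Ordinal) // stable ordering
                 .ToList()))
            .ToList();
    }
}
```

For the three rows shown in the question, this yields one document whose filenames are File 01.docx, File 02.docx, and File 03.docx. De-duplicating and sorting are choices made here for stable output, not requirements of Azure Search.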

Also, please add a suggestion on the Azure Search UserVoice site to add support for merging collections, which would make your scenario easier to achieve.

Eugene Shvets
Arvind - MSFT
I thought as much. Our Cosmos DB represents a file system, and we reduce storage cost by using a single-instance storage approach. We could not really change the structure to store files as an array, but we could create another collection of documents to facilitate the indexing requirements, and a trigger could write out the document with all the files. Or we could drop single-instance storage, as the extra effort required to crawl may not be worth it. – Steve Drake Jan 04 '18 at 19:13
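The "second collection plus trigger" option amounts to an incremental fold: each time a file record is written, its name is added to the aggregate document for its address, and the indexer crawls those aggregates instead of the raw file records. The C# sketch below uses a Dictionary as a hypothetical in-memory stand-in for that extra collection. AddressAggregateStore, OnFileWritten, and FilenamesFor are illustrative names, not Cosmos DB or Azure Search APIs. Deletes and concurrent writers are not handled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the extra collection of per-address
// documents; a real trigger would read and replace the aggregate document
// in the database instead.
public sealed class AddressAggregateStore
{
    // address -> file names seen for that address (ordinal, sorted, no duplicates)
    private readonly Dictionary<string, SortedSet<string>> _byAddress =
        new Dictionary<string, SortedSet<string>>(StringComparer.Ordinal);

    // Called once per newly written file record: folds its name into the
    // aggregate for its address. Writing the same name twice is harmless.
    public void OnFileWritten(string address, string fileName)
    {
        if (!_byAddress.TryGetValue(address, out var names))
        {
            names = new SortedSet<string>(StringComparer.Ordinal);
            _byAddress[address] = names;
        }
        names.Add(fileName);
    }

    // The complete filenames collection to upload for one address, so each
    // upload carries every name rather than only the latest one.
    public IReadOnlyCollection<string> FilenamesFor(string address)
    {
        if (_byAddress.TryGetValue(address, out var names))
        {
            return names;
        }
        return Array.Empty<string>();
    }
}
```

Because every upload sends the whole set, the replace-on-merge behaviour described in the answer no longer loses names.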