
We have a set of blobs, all sorts of content.

We need to index both the metadata and the content, but we are happy to skip the content for unsupported file types and very large files. For example, we have:

File One.docx - supported type - Indexes metadata and content (good)

File Two.dat - unsupported type - Indexes metadata, skips content (good)

File Three.txt - supported type, fails due to the size of the blob. (bad)

Our search config is based on the docs; we just added failOnUnsupportedContentType to the configuration and set it to false.

We would like to index the metadata for File Three.txt but skip the large content, something like failOnOversizedContent which we would set to false.

Right now we get an error saying that the blob is too large.

Eugene Shvets
Steve Drake

1 Answer


UPDATE Jan 3, 2018

I realized that my original suggestion to use AzureSearch_SkipContent blob metadata does not resolve the issue, since the blob still needs to be downloaded to process content type metadata.

To make this scenario work gracefully, we are adding indexStorageMetadataOnlyForOversizedDocuments indexer configuration setting. It takes a bool value and is false by default, so set it to true in the indexer configuration to enable it. This is fresh off the presses and will be deployed in production worldwide by January 19.
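As a rough sketch of where these settings would sit, the snippet below builds an indexer definition as a plain Python dict and prints it as JSON. It assumes the usual indexer shape with a `parameters.configuration` object; the indexer, data source, and index names are hypothetical placeholders, and sending the payload to the service is left out.

```python
import json


def build_indexer_definition(name, data_source_name, target_index_name):
    """Return an indexer definition as a plain dict, ready to serialize as JSON."""
    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": target_index_name,
        "parameters": {
            "configuration": {
                # From the question: do not fail on unsupported file types.
                "failOnUnsupportedContentType": False,
                # Setting described above: for oversized blobs, index
                # only the storage metadata instead of failing.
                "indexStorageMetadataOnlyForOversizedDocuments": True,
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    definition = build_indexer_definition(
        "blob-indexer", "blob-datasource", "blob-index"
    )
    print(json.dumps(definition, indent=2))
```

Keeping both flags in the same configuration object means unsupported types and oversized blobs are handled the same way: metadata is indexed and content is skipped.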

Original response

You can add AzureSearch_SkipContent: true metadata to the large blobs, as described in Controlling which parts of the blob are indexed. I realize it may be inconvenient, but that's something that can unblock you.

> We would like to index the metadata for File Three.txt but skip the large content, something like failOnOversizedContent which we would set to false.

This looks like a useful feature request - please add a suggestion at our UserVoice site and we'll consider this, especially if we see other customers asking for this.

Eugene Shvets
  • Also, if you let me know your service name, we can explore some other options. You can reach me at eugenesh at the usual Microsoft domain. – Eugene Shvets Dec 22 '17 at 01:17
  • would it make sense to have two indexes one for metadata and one for content? both having the same key. – Steve Drake Dec 22 '17 at 10:28
  • I need to patch up the blobs anyways, so added `AzureSearch_SkipContent` is no big deal. Thanks – Steve Drake Dec 22 '17 at 13:47
  • Probably doesn't make sense to have separate indices unless you *want* to search the content and metadata separately. Separate indices mean you'll need to issue multiple search requests (and merge the results), indices may not be in sync, and it will be less efficient on Azure Search side. – Eugene Shvets Dec 22 '17 at 15:47
  • sorry, I meant two Indexers :) but, I have added the SkipContent flag.. Thanks – Steve Drake Dec 22 '17 at 19:43