
I am currently working with Custom Analyzers in Azure Search. I have previously had a lot of success with the preview version of the Azure Search API, "2015-02-28-Preview", which introduced the feature. I'm currently trying to migrate my custom analyzers to API version "2016-09-01", which according to this article (https://learn.microsoft.com/en-us/azure/search/search-api-migration) includes Custom Analyzer support. My analyzers are configured as follows:

 "analyzers": [
    {
      "name": "phonetic_area_analyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "area_standard",
      "tokenFilters": [ "lowercase", "asciifolding", "areas_phonetc" ]
    },
    {
      "name": "partial_area_analyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "area_standard",
      "tokenFilters": [ "lowercase", "area_token_edge" ]
    },
    {
      "name": "startsWith_area_analyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "area_keyword",
      "tokenFilters": [ "lowercase", "asciifolding", "area_edge" ]
    }
  ],
  "charFilters": [],
  "tokenizers": [
    {
        "name":"area_standard",
        "@odata.type":"#Microsoft.Azure.Search.StandardTokenizer"
    },
    {
        "name":"area_keyword",
        "@odata.type":"#Microsoft.Azure.Search.KeywordTokenizer"
    }
  ],
  "tokenFilters": [
    {
      "name": "area_edge",
      "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilter",
      "minGram": 2,
      "maxGram": 50
    },
    {
      "name": "area_token_edge",
      "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilter",
      "minGram": 2,
      "maxGram": 20
    },
    {
      "name": "areas_phonetc",
      "@odata.type": "#Microsoft.Azure.Search.PhoneticTokenFilter",
      "encoder": "doubleMetaphone"
    }
  ]

This configuration works when using version "2015-02-28-Preview" but when I try version "2016-09-01" I get the following error as a response:

{
  "error": {
    "code": "",
    "message": "The request is invalid. Details: index : The tokenizer of type 'standard' is not supported in the API version '2016-09-01'.\r\n"
  }
}

Is there a problem with my configuration, or does version "2016-09-01" only allow for a limited subset of custom analyzer features? If this is the case, could someone please point me in the direction of some documentation detailing which features are supported?

pantryfight

1 Answer


Sorry, there was a delay in the process that updates the documentation. Here is my pull request that has the changes we introduced in 2016-09-01: https://github.com/Azure/azure-docs-rest-apis/pull/218 (request access here https://azure.github.io/)

In your example, change KeywordTokenizer to KeywordTokenizerV2, same for the StandardTokenizer and the EdgeNGramTokenFilter.
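Applied to the configuration in the question, the tokenizer and token filter definitions would then look something like the sketch below (the analyzer definitions themselves should not need to change, since they reference tokenizers and filters by name; only the "@odata.type" values are swapped for their V2 counterparts):

```json
"tokenizers": [
  {
    "name": "area_standard",
    "@odata.type": "#Microsoft.Azure.Search.StandardTokenizerV2"
  },
  {
    "name": "area_keyword",
    "@odata.type": "#Microsoft.Azure.Search.KeywordTokenizerV2"
  }
],
"tokenFilters": [
  {
    "name": "area_edge",
    "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
    "minGram": 2,
    "maxGram": 50
  },
  {
    "name": "area_token_edge",
    "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
    "minGram": 2,
    "maxGram": 20
  },
  {
    "name": "areas_phonetc",
    "@odata.type": "#Microsoft.Azure.Search.PhoneticTokenFilter",
    "encoder": "doubleMetaphone"
  }
]
```

Note that PhoneticTokenFilter is left as-is here, since the error message and the rename guidance only mention the standard/keyword tokenizers and the edge n-gram filter.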

Update:

The new version of the documentation is online: https://learn.microsoft.com/en-us/rest/api/searchservice/custom-analyzers-in-azure-search

Yahnoosh
  • Ah, that's great, thanks a lot. That seems to work nicely. I was worried that the functionality had stayed in preview. By the way, I'm wondering: is there support for custom analyzers in the .NET SDK yet, or do we need to use the REST API? – pantryfight Dec 05 '16 at 04:42
  • Based on this blog post: https://azure.microsoft.com/en-us/blog/announcing-general-availability-of-preview-features-and-new-apis-in-azure-search/?cdn=disable, I believe custom analyzers are supported in the latest version of .Net SDK. – Gaurav Mantri Dec 05 '16 at 05:52
  • Hmm.. I've just been looking at the latest version 3.0.1 of the .NET SDK and the implementation of custom analyzers seems to be incomplete. I am trying to create my Index with SearchServiceClient.Indexes.CreateOrUpdateAsync() which takes a Microsoft.Azure.Search.Models.Index which defines the index configuration. The Index type contains fields for Analyzer, TokenFilter and Tokenizer, which are all types defined in the Microsoft.Azure.Search.Models but each of these types only has a "Name" property, and no properties to define their respective settings. Am I missing something? – pantryfight Dec 05 '16 at 06:13
  • My bad, I see there are also models for each of the Analyzer and Tokenizer types; it's just that I am using Newtonsoft.Json to parse my index config into the .NET models, e.g. JsonConvert.DeserializeObject(System.IO.File.ReadAllText("myjsonconfig.json")), and it's not resolving the correct analyzer/tokenizer types. Is there a preferred way of parsing the config as JSON into the .NET configuration objects without parsing through the JSON manually? – pantryfight Dec 05 '16 at 06:21
  • Do you mind creating a new SO post for your last question? – Yahnoosh Dec 05 '16 at 14:45
  • OK have done http://stackoverflow.com/questions/40984934/how-to-create-index-with-custom-analyzers-from-json-file-in-azure-search-net-sd – pantryfight Dec 06 '16 at 01:15