Thanks in advance for your help.

I am using the Azure Search .NET SDK to build an indexer, and I am currently also using a custom analyzer.

Before using the custom analyzer, I was using the EnLucene analyzer, which let me use the wildcard character *. For example, I used it to give users trailing-wildcard (prefix) search: if a user searches for "app", it returns results such as "apple", "application", and "approach". Please do not suggest autocomplete or suggestions, because a suggester cannot be used together with a custom analyzer, and I do not want to create an additional 20 search fields just because of the suggester (one for the suggester and one for search).

Below is my custom analyzer definition. It does not let me use * for partial matching. I am not looking for an NGram solution for prefix or suffix partial matching; I would actually like to use the wildcard *. What can I do to allow wildcard search?

var definition = new Index()
{
    Name = indexName,
    Fields = mapFields,
    Analyzers = new[]
    {
        new CustomAnalyzer
        {
            Name = "custom_analyzer",
            Tokenizer = TokenizerName.Whitespace,
            TokenFilters = new[]
            {
                TokenFilterName.AsciiFolding,
                TokenFilterName.Lowercase,
                TokenFilterName.Phonetic
            }
        }
    }
};
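
For context, here is roughly how I issue the wildcard query (a minimal sketch: "name" is a placeholder field name, indexClient stands in for a SearchIndexClient pointed at this index, and QueryType.Full is needed because * belongs to the full Lucene query syntax):

// Minimal sketch of the wildcard query. QueryType.Full enables the full
// Lucene query syntax, which is what makes the trailing * wildcard work.
// "name" and indexClient are placeholders, not defined in the snippet above.
var parameters = new SearchParameters
{
    QueryType = QueryType.Full,
    SearchFields = new[] { "name" }
};

// With the EnLucene analyzer, "app*" returned apple, application, approach, ...
DocumentSearchResult<Document> results =
    indexClient.Documents.Search("app*", parameters);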
Kyle Ahn

1 Answer


Here is how you can do that:

  • Add your custom analyzer like below (first as the JSON index definition, then as the equivalent C# snippet):

{
  "name":"names",
  "fields":[
    { "name":"id", "type":"Edm.String", "key":true, "searchable":false },
    { "name":"name", "type":"Edm.String", "analyzer":"my_standard" }
  ],
  "analyzers":[
    {
      "name":"my_standard",
      "@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer":"standard",
      "tokenFilters":[ "lowercase", "asciifolding" ]
    }
  ]
}

// The same analyzer definition when creating the index with the C# SDK:
new CustomAnalyzer
{
    Name = "custom_analyzer",
    Tokenizer = TokenizerName.Standard,
    TokenFilters = new[]
    {
        TokenFilterName.Lowercase,
        TokenFilterName.AsciiFolding,
        TokenFilterName.Phonetic
    }
}
  • Then reference the custom analyzer in your document definition like below (a combined end-to-end sketch follows this list):

    [IsSearchable, IsFilterable, IsSortable, Analyzer("custom_analyzer")]
    public string Property { get; set; }
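
Putting both steps together, a minimal end-to-end sketch could look like the following. The Person class, its property names, and the serviceName/adminApiKey variables are illustrative placeholders, not part of the original post:

using System.ComponentModel.DataAnnotations;
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;

// Illustrative model: [Key] marks the document key, and the Analyzer
// attribute points the searchable field at the custom analyzer below.
public class Person
{
    [Key]
    public string Id { get; set; }

    [IsSearchable, IsFilterable, IsSortable, Analyzer("custom_analyzer")]
    public string Name { get; set; }
}

// serviceName and adminApiKey are placeholders for your service credentials.
var serviceClient = new SearchServiceClient(serviceName, new SearchCredentials(adminApiKey));

var definition = new Index
{
    Name = "names",
    // FieldBuilder reads the attributes on Person and generates the field list.
    Fields = FieldBuilder.BuildForType<Person>(),
    Analyzers = new[]
    {
        new CustomAnalyzer
        {
            Name = "custom_analyzer",
            Tokenizer = TokenizerName.Standard,
            TokenFilters = new[]
            {
                TokenFilterName.Lowercase,
                TokenFilterName.AsciiFolding,
                TokenFilterName.Phonetic
            }
        }
    }
};

serviceClient.Indexes.Create(definition);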

Check this blog for further reference:

https://azure.microsoft.com/en-in/blog/custom-analyzers-in-azure-search/

Here is a sample test method for a custom analyzer:

[Fact]
public void CanSearchWithCustomAnalyzer()
{
    Run(() =>
    {
        const string CustomAnalyzerName = "my_email_analyzer";
        const string CustomCharFilterName = "my_email_filter";

        Index index = new Index()
        {
            Name = SearchTestUtilities.GenerateName(),
            Fields = new[]
            {
                new Field("id", DataType.String) { IsKey = true },
                new Field("message", (AnalyzerName)CustomAnalyzerName) { IsSearchable = true }
            },
            Analyzers = new[]
            {
                new CustomAnalyzer()
                {
                    Name = CustomAnalyzerName,
                    Tokenizer = TokenizerName.Standard,
                    CharFilters = new[] { (CharFilterName)CustomCharFilterName }
                }
            },
            // The char filter rewrites '@' to '_' before tokenization, so a full
            // email address survives the standard tokenizer as a single term.
            CharFilters = new[] { new PatternReplaceCharFilter(CustomCharFilterName, "@", "_") }
        };

        Data.GetSearchServiceClient().Indexes.Create(index);

        SearchIndexClient indexClient = Data.GetSearchIndexClient(index.Name);

        var documents = new[]
        {
            new Document() { { "id", "1" }, { "message", "My email is someone@somewhere.something." } },
            new Document() { { "id", "2" }, { "message", "His email is someone@nowhere.nothing." } },
        };

        indexClient.Documents.Index(IndexBatch.Upload(documents));
        SearchTestUtilities.WaitForIndexing();

        // Searching for the full address matches only the document containing it.
        DocumentSearchResult<Document> result = indexClient.Documents.Search("someone@somewhere.something");

        Assert.Equal("1", result.Results.Single().Document["id"]);
    });
}
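
To try the wildcard case the question asks about against an index like this, the query needs the full Lucene syntax. Here is a hedged sketch (the search text and parameters are illustrative additions, not part of the original test):

// Sketch: the * wildcard only works with the full Lucene query syntax.
// Note that wildcard terms are generally not passed through the field's
// analyzer, so they match against the terms as they were actually indexed.
var parameters = new SearchParameters { QueryType = QueryType.Full };

DocumentSearchResult<Document> wildcardResult =
    indexClient.Documents.Search("someone*", parameters);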

Feel free to tag me in your conversation; I hope it helps.

Mohit Verma
  • Hi Mohit. Thanks a lot for your answer. It seems like you are using the Standard tokenizer. Is it possible for me to use regex search with tokenizers other than the standard Lucene one? If so, how should I configure a non-Standard tokenizer to allow regex search? – Kyle Ahn Nov 22 '19 at 18:00
  • Mohit, it looks like in your unit test you search on a full token, "someone@somewhere.something". If you instead searched on the value "some", would both records be returned? Following your example, I do not get partial matches when using a custom analyzer, as Kyle mentioned in the original post. You can no longer use an asterisk with the custom analyzer, so a partial match is not recognized. In the blog, they solve it with a separate field using the edgeNGram approach. Do you think that is the only solution to this partial-match challenge? – Scott Zetrouer Nov 26 '19 at 13:26