I have the default Analyzer set for my index and the fields in Azure Search.

I have the following values for a field called name.

  • Demo 001
  • Demo Site 001
  • 001 Demo Site

I am trying to get the matching values for the following cases. My sample query is:

$count=true&queryType=full&searchFields=name&searchMode=any&$select=name,id&$skip=0&$top=10&search=name:/"Demo(.*)/

This query returns all three values.

  1. To get only the values that match "Demo S" (that is, Demo Site 001), what should I change in the query, or in the analyzer?
  2. If I want to query for "001", or for "001" followed by a space, how should I modify the query?
  3. Finally, is there any way to tell the search that I only want the values that start with 001?

Is it possible to achieve all the above three conditions with a single setup?

TBA
  • Is it fair to say that you need the names to be indexed as is, without word-breaking and then find names that match a prefix? What about case 2; what documents do you expect to be returned? The following article describes the default behavior of the search engine in Azure Search and explains how to customize it: https://learn.microsoft.com/en-us/azure/search/search-lucene-query-architecture – Yahnoosh May 24 '17 at 18:29
  • Yes, your first sentence is right. I tried changing the analyzer of the field to Keyword, and the field was indexed as a single string, but I could not get any prefix query to work against it. Also, with '001 ' I would expect to get the last option. – TBA May 25 '17 at 10:05

1 Answer

There are two possible ways to achieve this.

A. Custom analyzer with a mapping char filter

At index time, use a custom analyzer with a character filter that maps whitespace to an underscore or to the empty string. For example, if you map whitespace to the empty string, your data is indexed as:

    Demo Site 001 ---> DemoSite001
    001 Demo Site ---> 001DemoSite

    "charFilters": [
      {
        "name": "map_dash",
        "@odata.type": "#Microsoft.Azure.Search.MappingCharFilter",
        "mappings": [" =>"]
      }
    ]

At query time:
  Step 1. Parse the query and replace whitespace with the same substitute that was used at index time, so the search query "Demo S" becomes "DemoS".
  Step 2. Run a wildcard search with the new query string:
      search=DemoS*
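
The two query-time steps for approach A can be sketched on the client side as follows. This is only an illustration: `to_prefix_query` is a hypothetical helper, not part of any Azure SDK, and it assumes the index-time char filter replaces each space with the same `substitute` value passed here. It does not escape reserved query characters.

```python
def to_prefix_query(user_input: str, substitute: str = "") -> str:
    """Rewrite raw user input into a wildcard (prefix) search term.

    `substitute` is assumed to be the same replacement the index-time
    char filter uses (the empty string in the mapping shown above).
    Hypothetical helper for illustration only.
    """
    # Step 1: mirror the index-time mapping, so every space becomes `substitute`.
    normalized = user_input.replace(" ", substitute)
    # Step 2: a trailing * turns the term into a prefix (wildcard) query.
    return normalized + "*"


print(to_prefix_query("Demo S"))     # DemoS*
print(to_prefix_query("001 "))       # 001*
print(to_prefix_query("001 ", "_"))  # 001_*
```

Note that with the empty-string mapping, "001" and "001 " produce the same term; mapping whitespace to an underscore keeps the two apart.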

B. Custom analyzer with an EdgeNGram token filter

Use a custom analyzer with an EdgeNGram token filter to index your documents. For example:
"tokenFilters": [
{
  "name": "edgeNGramFilter",
  "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
  "minGram": 2,
  "maxGram": 20
}
],
"analyzers": [
  {
    "name": "prefixAnalyzer",
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "tokenizer": "keyword",
    "tokenFilters": [ "lowercase", "edgeNGramFilter" ]
  }
]
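
To see why this analyzer supports prefix matching, here is a rough Python simulation of the terms it would produce for the sample names. It is a sketch of the idea, not the search engine's actual code: `edge_ngrams` and `matches` are hypothetical helpers, and it assumes the query text is lowercased and compared as one whole term (that is, the query is not itself split into n-grams).

```python
def edge_ngrams(value: str, min_gram: int = 2, max_gram: int = 20) -> list[str]:
    """Approximate keyword tokenizer + lowercase + edge n-gram filter."""
    # The keyword tokenizer keeps the whole value as one token; lowercase it.
    term = value.lower()
    # Emit every prefix whose length is between min_gram and max_gram.
    longest = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, longest + 1)]


names = ["Demo 001", "Demo Site 001", "001 Demo Site"]


def matches(query: str) -> list[str]:
    """Return the names whose indexed prefixes contain the lowercased query."""
    q = query.lower()
    return [name for name in names if q in edge_ngrams(name)]


print(matches("Demo S"))  # ['Demo Site 001']
print(matches("001 "))    # ['001 Demo Site']
```

Because every prefix is stored as its own term, the query does not need a wildcard; an exact term match on "demo s" is enough.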

With either of these approaches:

  1. "Demo S" will return only Demo Site 001

  2. "001 " will only return 001 Demo Site

More details:

How Search works

Custom Analyzers

UpasanaDixit