In Azure Search, on a field with values like "12-10-3" or "30-843-44", I have set up a custom tokenizer to replace the dashes with an empty string.

I now want to do an "ends with" regex search, but I cannot get it to do quite what I want.

For example, to find codes ending in 3 I have tried:

```
searchMode=any&queryType=full&search=code:/(.*)3/
```

This returns "12-10-3", as expected, but also values like "30-843-44".

I then tried:

```
searchMode=any&queryType=full&search=code:/(.*)3[^<0-9>]*/
```

But this seems to give the same result. I have been working through the regex syntax referenced in the Azure Search docs.

When I test my tokenizer on "123-456-78", it appears to work (the dashes are removed and a single token is produced), so I don't understand why the regex search is not working correctly:

```json
{
    "tokens": [
        {
            "token": "12345678",
            "startOffset": 0,
            "endOffset": 10,
            "position": 0
        }
    ]
}
```
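A minimal sketch of why both patterns can return "30-843-44", assuming Lucene's usual behavior: a regex query has to match an entire indexed term, but a document is returned if any one of its terms matches. The C# below approximates that with .NET's `Regex`, anchoring each pattern with `^(?:pattern)$` (which should behave the same for these simple patterns), and compares two hypothetical term lists: one split on dashes, roughly what the default standard analyzer would index, and one holding the whole value with the dashes removed, like the tokenizer output above. The class and method names are made up for illustration. Since the `codes` field in the index definition shown in the update does not set an analyzer, it may be worth checking whether that field is actually indexed with `code_with_dash_analyzer`.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of any Azure Search SDK.
public static class EndsWithDemo
{
    // Rough stand-in for a standard-style analyzer on these values:
    // split on dashes, one term per digit group (a simplification).
    public static string[] StandardLikeTerms(string value) =>
        value.Split(new[] { '-' }, StringSplitOptions.RemoveEmptyEntries);

    // Keyword tokenizer after a "dash to empty" char filter:
    // a single term with the dashes removed.
    public static string[] KeywordNoDashTerms(string value) =>
        new[] { value.Replace("-", "") };

    // Each term must match the pattern as a whole (hence the anchors);
    // the document counts as a hit if any of its terms matches.
    public static bool IsHit(string[] terms, string pattern) =>
        terms.Any(t => Regex.IsMatch(t, "^(?:" + pattern + ")$"));

    public static void Main()
    {
        string[] codes = { "12-10-3", "30-843-44" };
        string[] patterns = { "(.*)3", "(.*)3[^<0-9>]*" };

        foreach (var pattern in patterns)
        {
            foreach (var code in codes)
            {
                bool splitHit = IsHit(StandardLikeTerms(code), pattern);
                bool singleHit = IsHit(KeywordNoDashTerms(code), pattern);
                Console.WriteLine($"{pattern} | {code} | split on dashes: {splitHit} | single term: {singleHit}");
            }
        }
    }
}
```

With the dash-split terms, both patterns match "30-843-44" through its "843" term, because `[^<0-9>]*` (read here as an ordinary character class) can match zero characters. With the single dash-free term "3084344", neither pattern matches.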

Any ideas?

Update:

The tokenizer is applied in C# as follows:

```csharp
var myIndexDefinition = new Index()
{
    Name = "MyIndex",
    // Custom analyzer: the char filter strips the dashes first, then the
    // keyword tokenizer emits the whole remaining value as a single token.
    Analyzers = new[]
    {
        new CustomAnalyzer
        {
            Name = "code_with_dash_analyzer",
            Tokenizer = TokenizerName.Keyword,
            CharFilters = new CharFilterName[] { "dash_to_empty_mapper" }
        }
    },
    // Mapping char filter that replaces each "-" with an empty string.
    CharFilters = new List<CharFilter>
    {
        new MappingCharFilter("dash_to_empty_mapper", new[] { "- => " })
    },
    Fields = new[]
    {
        // Field with the dash in the values
        new Field("codes", DataType.String) { IsRetrievable = true, IsSearchable = true, IsSortable = true, IsFilterable = true, IsFacetable = true },
        //.... other field definitions....
    }
};
```
Richard

1 Answer

Based on your description and my own experience, I suspect the issue is caused by your custom tokenizer, although I don't know how it is implemented.

However, without a custom tokenizer, you can try the following Lucene regular expression, which should work:

```
/([0-9]+\-?)+[0-9]*3/
```

Hope it helps.

Peter Pan
Thanks, but I tried with and without the tokenizer, and it still returns records that _don't_ end in the number I am searching for. – Richard Dec 02 '18 at 21:46
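One way to see how the answer's pattern can still return records that don't end in the searched digit is to test it as a whole-term match, assuming Lucene requires the regex to match an entire term. The C# sketch below uses .NET's `Regex` as a stand-in for Lucene's regex engine (an approximation that should hold for this simple pattern) and a hypothetical `AnswerPatternDemo` class. It checks `([0-9]+\-?)+[0-9]*3` against each raw value kept as one term and against the value's dash-separated parts, the latter being roughly how a default analyzer would index these codes.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of any Azure Search SDK.
public static class AnswerPatternDemo
{
    // The pattern from the answer, anchored because the whole term must match.
    const string Pattern = @"^(?:([0-9]+\-?)+[0-9]*3)$";

    public static bool Matches(string term) => Regex.IsMatch(term, Pattern);

    public static void Main()
    {
        foreach (var code in new[] { "12-10-3", "30-843-44" })
        {
            // Whole value indexed as a single term, dashes kept.
            bool wholeValue = Matches(code);
            // Value split on dashes; the document is a hit if any part matches.
            bool anySplitTerm = code.Split('-').Any(Matches);
            Console.WriteLine($"{code}: whole value {wholeValue}, any split term {anySplitTerm}");
        }
    }
}
```

Run as written, this should print `12-10-3: whole value True, any split term False` and `30-843-44: whole value False, any split term True`. When the value is split on dashes, "30-843-44" is still returned through its "843" term, while "12-10-3" is missed because the pattern needs at least one digit before the final 3 and the last term is just "3".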