
I have an Edm.String field that I use to store key-value pairs. Each key is separated from its value by '||', and pairs are separated by commas. For example:

{
    "CustomField": "1234||student, 5678||blue, 999||metallica, etc..."
}

I need to perform a query to extract a key-value combination. For example:

search=5678 blue&searchFields=CustomField&searchMode=all&queryType=full

Using Regular Expression, I was expecting that the following should work:

search=/5678.*blue/&queryType=full&searchMode=all

I am using the default analyzer, so it seems to be ignoring the '||'. I've tried regular expressions, but with no success. Is it possible to query by the key-value pair without storing it in an Edm.Collection(Edm.String)? I would like to avoid reindexing again. Thanks in advance.

Update

Using Collections and a new dataset:

{
    "@odata.context": "https://[service].search.windows.net/indexes('[index]')/$metadata#docs",
    "@odata.count": 3,
    "value": [
        {
            "@search.score": 0.45867884,
            "uniqueid": "5",
            "Name": null,
            "Kvp": [
                "1234||sepultura",
                "999||programmer",
                "876||no education"
            ],
            "Kvp2": "1234||sepultura, 999 programmer, 876||no education"
        },
        {
            "@search.score": 0.38223237,
            "uniqueid": "1",
            "Name": null,
            "Kvp": [
                "1234||metallica",
                "999||horse education",
                "876||high school"
            ],
            "Kvp2": "1234||metallica, 999 horse education, 876||high school"
        },
        {
            "@search.score": 0.38223237,
            "uniqueid": "3",
            "Name": null,
            "Kvp": [
                "1234||john mayer",
                "999||kid education",
                "876||university"
            ],
            "Kvp2": "1234||john mayer, 999 kid education, 876||university"
        }
    ]
}

My search query looks like:

Kvp: education&$count=true&queryType=full&searchMode=all

The problem is that I want to exclude uniqueid 5 from the results. Although it has "education" as a value for one of its tags, that value does not belong to the 999 key.

Also tried:

Kvp: 999||education&$count=true&queryType=full&searchMode=all

Kvp: /.*999.*/ AND /.*education.*/&$count=true&queryType=full&searchMode=all

Kvp: /999.*education/&$count=true&queryType=full&searchMode=all
Thiago Custodio
  • I don't see a way to solve your search scenario without updating the index and reindexing the content. I think the key is to create the appropriate tokens that allow for searching a unique key/value pair. This could be accomplished using a custom analyzer. Or just use a Collection... – Mr. Kraus Jun 13 '18 at 14:02
  • After days of trying, I realized I will need to change the index once again. I'm trying with Collections, but many useful functions (e.g. match, indexof) are not allowed, which makes this problem very hard to solve. – Thiago Custodio Jun 13 '18 at 14:13
  • @Mr.Kraus I've updated the question with another sample data. – Thiago Custodio Jun 13 '18 at 14:32

2 Answers


Use a phrase search by surrounding your query with quotes: Kvp:"999||education"

The analyzer removes the | character, so this is effectively equivalent to Kvp:"999 education". The key is to understand how analysis works. When you index "1234||student, 5678||blue, 999||metallica", what actually gets indexed is six terms:

  • 1234
  • student
  • 5678
  • blue
  • 999
  • metallica

The AND query doesn't work because it looks for each term anywhere in the field's list of terms, and so it matches id 5. Unlike a phrase query, it does not consider order or adjacency.
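
To make the difference concrete, here is a minimal Python sketch of the two matching modes. It is only a rough model: the `analyze` function is an assumed stand-in for the default analyzer (lowercase, keep runs of letters and digits), and `matches_all` and `matches_phrase` are hypothetical helpers, not part of any search API.

```python
import re

def analyze(text):
    # Assumed approximation of the default analyzer: lowercase the text,
    # then keep runs of letters and digits. The real analyzer does more.
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_all(terms, query_terms):
    # AND-style check: every query term occurs somewhere in the field.
    return all(q in terms for q in query_terms)

def matches_phrase(terms, query_terms):
    # Phrase-style check: the query terms occur consecutively, in order.
    n = len(query_terms)
    return any(terms[i:i + n] == query_terms for i in range(len(terms) - n + 1))

original = analyze("1234||student, 5678||blue, 999||metallica")
doc5 = analyze("1234||sepultura, 999||programmer, 876||no education")

print(original)  # ['1234', 'student', '5678', 'blue', '999', 'metallica']
print(matches_all(doc5, analyze("999||education")))         # True
print(matches_phrase(doc5, analyze("999||education")))      # False
print(matches_phrase(original, analyze("999||metallica")))  # True
```

In this model adjacency is strict, so a multi-word value such as "999||horse education" would not match the phrase "999 education" either, because "horse" sits between the two terms.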

The regex query doesn't work because a regex must match an entire single term. Kvp:/999.*education/ won't work because "999" and "education" are analyzed into separate terms, so no single term matches that regex.
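
The per-term behaviour can be sketched in Python as follows. Again, `analyze` is only an assumed approximation of the default analyzer and `regex_hits` is a hypothetical helper. Python's `re` syntax also differs from Lucene's regex syntax in places, although these simple patterns behave the same way in both.

```python
import re

def analyze(text):
    # Assumed approximation of the default analyzer (see caveats above).
    return re.findall(r"[a-z0-9]+", text.lower())

def regex_hits(terms, pattern):
    # A regex query is tested against each indexed term as a whole,
    # never against the original field text.
    return [t for t in terms if re.fullmatch(pattern, t)]

terms = analyze("1234||metallica, 999||horse education, 876||high school")

print(regex_hits(terms, r"999.*education"))  # []
print(regex_hits(terms, r"educ.*"))          # ['education']
```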


Another option, by the way, would be to change the analyzer. If you used a whitespace analyzer, for instance, it would change the indexed terms to:

  • 1234||student,
  • 5678||blue,
  • 999||metallica

That could be a solution for you, but it would make it impossible to search efficiently for just "metallica".
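
As a rough illustration, whitespace tokenization can be modelled with Python's `str.split()`. This is only an assumed approximation of a whitespace analyzer, and `whitespace_analyze` is a hypothetical name.

```python
def whitespace_analyze(text):
    # Assumed model of a whitespace analyzer: split on runs of whitespace
    # and keep everything else (pipes, commas, case) untouched.
    return text.split()

print(whitespace_analyze("1234||student, 5678||blue, 999||metallica"))
# ['1234||student,', '5678||blue,', '999||metallica']

print(whitespace_analyze("1234||sepultura, 999||programmer, 876||no education"))
# ['1234||sepultura,', '999||programmer,', '876||no', 'education']
```

The second call shows a limitation of this approach: a value that itself contains a space, such as "no education", is split away from its key.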

femtoRgon
  • Thanks for the explanation, however it does not return any rows using Kvp:"999||education". For the second option you gave, do I need to reindex the content, or can I just add another analyzer to my index? I could not find any sample for this. – Thiago Custodio Jun 13 '18 at 16:24
  • How can I change the analyzer when performing a search? Using the whitespace analyzer, it returns 4 results for the given string "1234||sepultura, 999||programmer, 876||no education", because there is a space between 'no' and 'education'. – Thiago Custodio Jun 13 '18 at 16:53
  • 1 - Yes, you need to reindex if you want to change your analyzer. You can change the query analyzer without reindexing, but that won't change how the docs themselves are indexed. 2 - Didn't notice you had spaces in those fields, so no, whitespace won't work as an analyzer. I think keyword would probably work when adding them as a list. Wouldn't for adding them comma delimited. – femtoRgon Jun 13 '18 at 17:35
  • I've created a custom one as follows: index.Analyzers = new Analyzer[] { new PatternAnalyzer("customPatternTh", lowerCaseTerms: true, pattern: @"(\w+)\|\|([^\s]+)") }; – Thiago Custodio Jun 13 '18 at 17:37
  • I've set the custom analyzer to be the default for the field and also reindexed the content, but it can't find any document using your example Kvp:"999||education". Any tips? – Thiago Custodio Jun 13 '18 at 17:39
  • Pattern is to match your separator, not the token. – femtoRgon Jun 13 '18 at 17:42
  • Testing the regex, it seems to break the text into the right tokens: https://regex101.com/r/am1d6a/1 When you say pattern, do you mean the search text? (Sorry for so many questions, I'm new to the search world) – Thiago Custodio Jun 13 '18 at 17:50
  • The pattern you pass into the PatternAnalyzer, it's to match the separator you want to use, not the content of the tokens you want to index. – femtoRgon Jun 13 '18 at 17:54
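
The separator-versus-token distinction from the comments above can be sketched in Python. The `pattern_tokenize` helper is hypothetical: it only models an analyzer that splits on every match of the pattern and lowercases what is left, and the `,\s*` separator is an assumption chosen for this sample data rather than a recommendation from the thread.

```python
import re

def pattern_tokenize(text, separator):
    # Model of split-on-pattern analysis: whatever the pattern matches is
    # discarded as a separator; the text between matches becomes tokens.
    pieces, pos = [], 0
    for m in re.finditer(separator, text):
        pieces.append(text[pos:m.start()])
        pos = m.end()
    pieces.append(text[pos:])
    return [p.lower() for p in pieces if p]  # drop zero-length pieces

text = "1234||sepultura, 999||programmer, 876||no education"

# The pattern from the comment matches the key-value pairs themselves,
# so the pairs are thrown away and only the leftovers are kept.
print(pattern_tokenize(text, r"(\w+)\|\|([^\s]+)"))
# [' ', ' ', ' education']

# A pattern that matches the separator keeps each pair as one token.
print(pattern_tokenize(text, r",\s*"))
# ['1234||sepultura', '999||programmer', '876||no education']
```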

I don't believe a regex would be the most efficient way to do this, since your case is not really a full-text search. If you are looking to retrieve key/value combinations, would it make more sense to put all of the key/value pairs in a searchable collection? That way you could simply search for "5678||blue". If you went this way, though, you wouldn't really need the pipes (||).

  • I've tried with Collections too, but that doesn't work either. I can have a kvp such as "5678||dark blue", and when the user searches for "blue" I would like it to match for key 5678. The problem is that if I allow an open search, it can find blue in another kvp, and this is what I would like to avoid. Using collections and the default analyzer, would a regex solve this? Is there a way to split on the key 5678 and then search for blue? – Thiago Custodio Jun 13 '18 at 14:11
  • Is there any way to enable functions match, indexof for Collections? – Thiago Custodio Jun 13 '18 at 14:14