I have tried hard to split the data to get my expected output, but I could not get it to work. I tried all the filters and tokenizers. I have updated the settings in Elasticsearch as given below:

    {
      "settings": {
        "analysis": {
          "filter": {
            "filter_word_delimiter": {
              "preserve_original": "true",
              "type": "word_delimiter"
            }
          },
          "analyzer": {
            "en_us": {
              "tokenizer": "keyword",
              "filter": [ "filter_word_delimiter", "lowercase" ]
            }
          }
        }
      }
    }

Executed query:

    curl -XGET "XX.XX.XX.XX:9200/keyword/_analyze?pretty=1&analyzer=en_us" -d 'DataGridControl'

Output:

    {
      "tokens" : [ {
        "token" : "datagridcontrol",
        "start_offset" : 0,
        "end_offset" : 16,
        "type" : "word",
        "position" : 1
      }, {
        "token" : "data",
        "start_offset" : 0,
        "end_offset" : 4,
        "type" : "word",
        "position" : 1
      }, {
        "token" : "grid",
        "start_offset" : 4,
        "end_offset" : 8,
        "type" : "word",
        "position" : 2
      }, {
        "token" : "control",
        "start_offset" : 9,
        "end_offset" : 16,
        "type" : "word",
        "position" : 3
      } ]
    }
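To see where these four tokens come from, here is a rough Python model of the `en_us` chain (keyword tokenizer, then `word_delimiter` with `preserve_original`, then `lowercase`). It is a simplified sketch, not Elasticsearch's implementation: the `word_delimiter` function below only splits on non-alphanumeric characters, lower-to-upper case changes and letter/digit changes, and it ignores offsets and positions.

```python
import re

# Zero-width split points: lower->upper ("aB"), letter->digit ("a1"), digit->letter ("1a").
SPLIT_POINTS = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])")


def word_delimiter(token, preserve_original=False):
    """Simplified model of the word_delimiter token filter (assumption, not ES code)."""
    parts = []
    # First break on any run of non-alphanumeric characters.
    for chunk in re.split(r"[^A-Za-z0-9]+", token):
        # Then break each chunk on case and letter/digit transitions.
        parts.extend(p for p in SPLIT_POINTS.split(chunk) if p)
    # preserve_original keeps the unsplit token in front of its parts.
    if preserve_original and parts != [token]:
        return [token] + parts
    return parts


def en_us_question(text):
    tokens = [text]  # keyword tokenizer: the whole input is a single token
    tokens = [t for tok in tokens for t in word_delimiter(tok, preserve_original=True)]
    return [t.lower() for t in tokens]  # lowercase filter runs last


print(en_us_question("DataGridControl"))  # ['datagridcontrol', 'data', 'grid', 'control']
```

Nothing in this chain ever produces `gridcontrol`, so a query for that term has no indexed token to match.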

Expected result: `DataGridControl`, `DataGrid`, `DataControl`, `Data`, `grid`, `control`. Which tokenizer and filters should I add to the index settings? Any help?

Andrei Stefan
BasK
  • Your expectation result doesn't quite have a rule. Why "DataGrid" and "DataControl" and not "GridControl"? Can you explain a bit better what you are trying to achieve? – Andrei Stefan Jan 06 '15 at 11:30
  • If I search for `gridcontrol` in my index, it does not return the DataGridControl document. If my query is `data grid control`, it does retrieve the document. – BasK Jan 06 '15 at 11:51

1 Answer

Try this:

    {
      "settings": {
        "analysis": {
          "filter": {
            "filter_word_delimiter": {
              "type": "word_delimiter"
            },
            "custom_shingle": {
              "type": "shingle",
              "token_separator": "",
              "max_shingle_size": 3
            }
          },
          "analyzer": {
            "en_us": {
              "tokenizer": "keyword",
              "filter": [
                "filter_word_delimiter",
                "custom_shingle",
                "lowercase"
              ]
            }
          }
        }
      }
    }

and let me know if it gets you any closer.
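The effect of adding the shingle filter can be sketched in Python. This is a simplified model, not Elasticsearch's code: `word_delimiter` only splits on non-alphanumerics, lower-to-upper case changes and letter/digit changes, and `shingle` assumes the filter's defaults of a minimum shingle size of 2 with unigrams kept, ignoring offsets and positions.

```python
import re

# Zero-width split points: lower->upper, letter->digit, digit->letter.
SPLIT_POINTS = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])")


def word_delimiter(token):
    """Simplified word_delimiter without preserve_original, as in the answer."""
    parts = []
    for chunk in re.split(r"[^A-Za-z0-9]+", token):
        parts.extend(p for p in SPLIT_POINTS.split(chunk) if p)
    return parts


def shingle(tokens, min_size=2, max_size=3, separator="", output_unigrams=True):
    """Join runs of adjacent tokens, like a shingle filter with token_separator=""."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        # Shingles only ever combine adjacent tokens, starting at position i.
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(separator.join(tokens[i:i + size]))
    return out


def en_us_answer(text):
    tokens = word_delimiter(text)  # keyword tokenizer passes the whole input as one token
    tokens = shingle(tokens, max_size=3, separator="")
    return [t.lower() for t in tokens]


print(en_us_answer("DataGridControl"))
# ['data', 'datagrid', 'datagridcontrol', 'grid', 'gridcontrol', 'control']
```

`gridcontrol` is now among the emitted tokens, so a query for it can match, assuming the same analyzer is applied at search time. `datacontrol` never appears, because shingles only join neighbouring tokens. For all-lowercase input, `en_us_answer("datagridcontrol")` returns just `['datagridcontrol']`, since there is no case change to split on.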

Andrei Stefan
  • It's not working when the input is lowercase (`datagridcontrol`). Please help. – BasK Jan 07 '15 at 09:01
  • That's the whole point of `word_delimiter`. It splits based on some clues from the text. One of those clues is the transition from lower case to upper case. If you input something all lowercase, how would you expect *any* algorithm out there to split that text? – Andrei Stefan Jan 07 '15 at 09:15
  • Is there any other option in Elasticsearch to achieve this for both uppercase and lowercase input? – BasK Jan 07 '15 at 10:21
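One technique that does not depend on case is dictionary-based decompounding: look for known words inside the token. Elasticsearch provides a `dictionary_decompounder` token filter, configured with a `word_list`, built on this idea. The Python below is only a simplified sketch of the idea, not that filter's implementation, and `WORD_LIST` is a hypothetical vocabulary assumed for illustration.

```python
import re

# Hypothetical vocabulary for illustration only; a real index needs a curated list.
WORD_LIST = {"data", "grid", "control"}


def case_split(token):
    """Split only where a lowercase letter is followed by an uppercase one."""
    return [p for p in re.split(r"(?<=[a-z])(?=[A-Z])", token) if p]


def dictionary_decompound(token, words, min_subword=2):
    """Return the token plus every dictionary word found inside it (lowercased),
    scanning start positions left to right. Simplified model of the idea."""
    lower = token.lower()
    longest = max((len(w) for w in words), default=0)
    out = [token]
    for start in range(len(lower)):
        # Try every substring starting here, up to the longest dictionary word.
        for end in range(start + min_subword, min(len(lower), start + longest) + 1):
            if lower[start:end] in words:
                out.append(lower[start:end])
    return out


for text in ("DataGridControl", "datagridcontrol"):
    print(text, case_split(text), dictionary_decompound(text, WORD_LIST))
```

The case-based split gets nothing out of `datagridcontrol`, while the dictionary lookup finds `data`, `grid` and `control` in either casing. The trade-off is that every word you want split must be in the list, and short entries cause spurious matches: adding `at` to the list would also pull `at` out of `datagridcontrol`.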