I'm trying this on a local Elasticsearch 1.7.5 installation:

http://localhost:9200/_analyze?filter=shingle&tokenizer=keyword&text=alkis stack

I see this:

{
   "tokens":[
      {
         "token":"alkis stack",
         "start_offset":0,
         "end_offset":11,
         "type":"word",
         "position":1
      }
   ]
}

But I expected to see something like this:

{
   "tokens":[
      {
         "token":"alkis stack",
         "start_offset":0,
         "end_offset":11,
         "type":"word",
         "position":1
      },
      {
         "token":"stack alkis",
         "start_offset":0,
         "end_offset":11,
         "type":"word",
         "position":1
      }
   ]
}

Am I missing something?

Update: here are the full index settings, including the analyzers:

{
  "number_of_shards": 2,
  "number_of_replicas": 0,
  "analysis": {
    "char_filter": {
      "map_special_chars": {
        "type": "mapping",
        "mappings": [
          "- => \\u0020",
          ". => \\u0020",
          "? => \\u0020",
          ", => \\u0020",
          "` => \\u0020",
          "' => \\u0020",
          "\" => \\u0020"
        ]
      }
    },
    "filter": {
      "permutate_fullname": {
        "type": "shingle",
        "max_shingle_size": 4,
        "min_shingle_size": 2,
        "output_unigrams": true,
        "token_separator": " ",
        "filler_token": "_"
      }
    },
    "analyzer": {
      "fullname_analyzer_search": {
        "char_filter": [
          "map_special_chars"
        ],
        "filter": [
          "asciifolding",
          "lowercase",
          "trim"
        ],
        "type": "custom",
        "tokenizer": "keyword"
      },
      "fullname_analyzer_index": {
        "char_filter": [
          "map_special_chars"
        ],
        "filter": [
          "asciifolding",
          "lowercase",
          "trim",
          "permutate_fullname"
        ],
        "type": "custom",
        "tokenizer": "keyword"
      }
    }
  }
}

And I'm testing it like this:

http://localhost:9200/INDEX_NAME/_analyze?analyzer=fullname_analyzer_index&text=alkis stack
Alkis Kalogeris
  • No, it is working as expected. And using `keyword` will not split the two terms btw. What do you want to accomplish, though? – Andrei Stefan Jul 20 '16 at 14:46
  • I posted the entire analyzer. I want to normalize a field (fullname) and make permutations in order to make a fuzzy search in the end. Basically I want to take `Andrei Stefan` and normalize it. Then make permutations `andrei stefan` and `stefan andrei`. – Alkis Kalogeris Jul 20 '16 at 15:22
  • The reason I'm using keyword instead of standard or whitespace, is because I want to keep the entire fullname as one token. But when the permutations are created, I need it to use whitespace which is indeed what's configured. – Alkis Kalogeris Jul 20 '16 at 15:36
  • Off the top of my head I don't think that permutations are possible with terms out-of-the-box. Shingles are a variation on this but they don't switch places of terms. I'm still wondering the real use-case for this. You mentioned _fuzzy_ search, but what you want to actually search (terms only, match, query_string etc)? – Andrei Stefan Jul 20 '16 at 16:22
  • For example, can't you use `multi_match` or even `term`? – Andrei Stefan Jul 20 '16 at 16:29
  • First of all, I'm new to Elasticsearch, so if this doesn't make sense to you, then maybe what I'm thinking of is not possible or not optimal. I was thinking of creating these permutations so that searching with `andrei stefan` and `stefan andrei` will produce the same result. What I've understood from reading about `term` and `match` is that they are similar, but differ in that `term` is not analyzed. So either I do the analysis in Elasticsearch and use `match`, or I normalise the query on the client and use `term`. I was thinking of using `query_string` since I don't want to have multiple terms. – Alkis Kalogeris Jul 20 '16 at 16:42
  • If I use the standard tokenizer and I have indexed `Andrei Stefan`. Then by using a match query and searching for `Andrei Foo`, then I will get a match. Is that correct (if yes, then that's what I'm trying to avoid, if no then I've misunderstood something)? What I'm trying to do is to build a simple name search functionality. – Alkis Kalogeris Jul 20 '16 at 16:44
  • Simple name search shouldn't be difficult, and I doubt you need that permutations thing. What are the requirements of this search? (exact match, lowercase/uppercase matters?, fuzzy?....) Start by creating a list of the things you'd like the search to do and what to match given certain input data. – Andrei Stefan Jul 20 '16 at 16:56
  • It will be fuzzy, and everything is lowercased since case shouldn't matter. I need asciifolding to remove accents, since they shouldn't matter either. I believe the analyzer as is covers all of this. Characters like dash, comma etc. (e.g. Jr.) shouldn't matter either. The only thing I haven't handled yet is the order of name and surname, which shouldn't matter either. Since this can't be done in ES with shingles, I can just use a query with an `or` operator and permute the query param on the client side. – Alkis Kalogeris Jul 20 '16 at 17:11
  • Do you have separate fields for first name and last name? – Andrei Stefan Jul 20 '16 at 17:12
  • Nope... That would make my life easier. – Alkis Kalogeris Jul 20 '16 at 17:12
  • And you don't have control over the mapping this way? I'm assuming you just have a `name` field? And what does the user search? A full name or separate strings? – Andrei Stefan Jul 20 '16 at 17:14
  • Basically I have the fields in the db. Fullname, Surname, Firstname. The problem is that the query param is a fullname. It is not separated. So I need to check against the fullname (and the lastname with the same analyzer, but not the firstname). Checking the lastname will ensure that I will get results where only the last name is provided as a query param. And checking the fullname instead of the first name will ensure that I will not get Alkis Foo and Alkis Bar, when I searched with alkis foo. I can't try to separate the query param because that's a difficult road on its own – Alkis Kalogeris Jul 20 '16 at 17:17
  • Just for clarity, I don't need (or rather, I don't care about) results when the query is only the first name. – Alkis Kalogeris Jul 20 '16 at 17:24
  • Hm... index first name and last name in two separate fields in ES, just as you have them in the DB. The text received as query can be analyzed (`match` does it for example, `query_string` does it). And there are ways to search both fields at the same time with all the **terms** in the search string. I think you are over-complicating the use case with single name in one go. – Andrei Stefan Jul 20 '16 at 17:29
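
The two points made in the comments above (the `keyword` tokenizer emits a single token, and shingles join adjacent tokens without reordering them) can be illustrated with a small Python sketch. This is a simplified imitation of a shingle filter, not Lucene's actual implementation; the function name `shingles` and its parameters are hypothetical and only mirror the settings of `permutate_fullname`.

```python
def shingles(tokens, min_size=2, max_size=4, output_unigrams=True, sep=" "):
    """Simplified imitation of a shingle filter (assumption: not Lucene's code).

    Joins runs of adjacent tokens, always in their original order.
    """
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for size in range(min_size, max_size + 1):
            # Only emit a shingle if enough tokens remain after position i.
            if i + size <= len(tokens):
                out.append(sep.join(tokens[i:i + size]))
    return out


# A keyword-style tokenizer yields the whole input as ONE token,
# so there is nothing to combine.
keyword_tokens = ["alkis stack"]
print(shingles(keyword_tokens))

# A whitespace-style tokenizer yields two tokens; shingles combine
# neighbours in order, so the reversed form never appears.
whitespace_tokens = "alkis stack".split()
print(shingles(whitespace_tokens))
```

In neither case does `stack alkis` appear, which matches the observed `_analyze` output.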

1 Answer

Index first name and last name in two separate fields in ES, just as you have them in the DB. The text received as the query can be analyzed (`match` does it, for example, and so does `query_string`). And there are ways to search both fields at the same time with all the terms in the search string. I think you are over-complicating the use case by treating the full name as a single token and creating name permutations at indexing time.
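
One way this might look is a `multi_match` query of type `cross_fields` with the `and` operator, which requires every term of the search string to appear in at least one of the fields, regardless of order. The Python sketch below only builds the request body. The field names `firstname` and `lastname` and the helper `build_name_query` are assumptions for illustration, and the sketch assumes both fields use an analyzer that splits on whitespace (not `keyword`). It does not cover the fuzzy-matching requirement.

```python
import json


def build_name_query(text, fields=("firstname", "lastname")):
    """Build a search body that matches all query terms across the given fields.

    Hypothetical helper; the field names are assumptions, not from the question.
    """
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",   # treat the fields as one combined field
                "fields": list(fields),
                "operator": "and",        # every term must match in some field
            }
        }
    }


# Both word orders produce the same kind of query; term order does not matter.
print(json.dumps(build_name_query("stefan andrei"), indent=2))
```

The resulting JSON can be sent as the body of a `_search` request against the index.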

Andrei Stefan