
The settings for one of my indexes are as follows; however, the stemmer isn't being applied. For example, a search for fox will not pick up articles that include the term foxes. I can't see why, as the order of the filters is correct (lowercase precedes the stemmer).

{
"articles": {
    "settings": {
        "index": {
            "creation_date": "1436255268907",
            "analysis": {
                "filter": {
                    "filter_stemmer": {
                        "type": "stemmer",
                        "language": "english"
                    },
                    "kill_filters": {
                        "pattern": ".*_.*",
                        "type": "pattern_replace",
                        "replacement": ""
                    },
                    "filter_stop": {
                        "type": "stop"
                    },
                    "filter_shingle": {
                        "min_shingle_size": "2",
                        "max_shingle_size": "5",
                        "type": "shingle",
                        "output_unigrams": "true"
                    },
                    "filter_stemmerposs": {
                        "type": "stemmer",
                        "language": "possessive_english"
                    }
                },
                "analyzer": {
                    "tags_analyzer": {
                        "type": "custom",
                        "filter": [
                            "standard",
                            "lowercase",
                            "filter_stemmerposs",
                            "filter_stemmer"
                        ],
                        "tokenizer": "patterntoke"
                    },
                    "shingles_analyzer": {
                        "filter": [
                            "standard",
                            "lowercase",
                            "filter_stop",
                            "filter_shingle",
                            "kill_filters",
                            "filter_stemmerposs",
                            "filter_stemmer"
                        ],
                        "char_filter": [
                            "html_strip"
                        ],
                        "type": "custom",
                        "tokenizer": "standard"
                    }
                },
                "tokenizer": {
                    "patterntoke": {
                        "type": "pattern",
                        "pattern": ","
                    }
                }
            },
            "number_of_shards": "5",
            "number_of_replicas": "1",
            "version": {
                "created": "1060099"
            },
            "uuid": "H2NsE3eKT1y_ArPOPbjT6w"
        }
    }
}
}
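One way to see what these analyzers actually emit (the comments below quote such output) is the `_analyze` API. A minimal sketch, assuming a local 1.x node on the default port; the analyzer names come from the settings above, the sample text is made up:

```shell
# Inspect the tokens each custom analyzer produces for a sample phrase
# (the `|| true` keeps the script from failing if no cluster is running).
curl -s "http://localhost:9200/articles/_analyze?analyzer=shingles_analyzer" \
     -d "foxes jump over the fence" || true

curl -s "http://localhost:9200/articles/_analyze?analyzer=tags_analyzer" \
     -d "red fox,quick foxes" || true
```

Comparing these token lists against the raw query term makes it obvious whether the stemmer ran at index time.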

And below is the mapping:

 {
"articles": {
    "mappings": {
        "article": {
            "properties": {
                "accountid": {
                    "type": "double",
                    "include_in_all": false
                },
                "article": {
                    "type": "string",
                    "index_analyzer": "shingles_analyzer"
                },
                "articleid": {
                    "type": "double",
                    "include_in_all": false
                },
                "categoryid": {
                    "type": "double",
                    "include_in_all": false
                },
                "draftflag": {
                    "type": "double",
                    "include_in_all": false
                },
                "files": {
                    "type": "string",
                    "index_analyzer": "tags_analyzer"
                },
                "tags": {
                    "type": "string",
                    "index_analyzer": "tags_analyzer"
                },
                "title": {
                    "type": "string",
                    "index_analyzer": "shingles_analyzer"
                },
                "topicid": {
                    "type": "double",
                    "include_in_all": false
                }
            }
        }
    }
}
}
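One thing worth checking (the post doesn't confirm this is the cause): the mapping sets only `index_analyzer`, so on 1.x the query text is analyzed with the default `standard` analyzer at search time and never gets stemmed, which would produce exactly this symptom. A hedged sketch of re-declaring the fields with `analyzer` (which applies at both index and search time); field and analyzer names are taken from the mapping above, and on 1.x changing a field's analyzer generally requires reindexing:

```shell
# Sketch: use "analyzer" instead of "index_analyzer" so the same chain
# runs at index and at search time. Whether shingles are wanted at
# query time is a separate design question.
curl -s -XPUT "http://localhost:9200/articles/_mapping/article" -d '{
  "article": {
    "properties": {
      "article": { "type": "string", "analyzer": "shingles_analyzer" },
      "title":   { "type": "string", "analyzer": "shingles_analyzer" }
    }
  }
}' || true
```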

The sample documents are varied, but, for example, one contains the token fox and another foxes (both derived from the article field). Each document is only found when the search is exactly fox or exactly foxes, whereas I'd expect either term to match both. The search used is a fuzzy_like_this query (I'm using NEST for .NET to execute it).

  • You haven't provided the complete mapping and, also, the query and sample document that fails! – Andrei Stefan Jul 07 '15 at 08:19
  • sorry, have included now – SSED Jul 07 '15 at 08:40
  • I don't know what you are trying to do with that analyzer, but it has a strange output. For example, for a text like `foxes jump over the fence`, your analyzer generates these terms: `"", "fenc", "fox", "foxes jump", "foxes jump ov", "jump", "jump ov", "over"` – Andrei Stefan Jul 07 '15 at 09:18
  • Given those terms it looks ok, now it depends on what kind of query you are using to match `fox`. I have asked you to provide the query and you didn't :-). For example, this query would match: `"query": { "term": { "article": { "value": "fox" } } }` – Andrei Stefan Jul 07 '15 at 09:19
  • thanks for this - I am using a fuzzylikethis query (using nest .net). I think maybe my analyser needs looking at, maybe drop the shingles and do an ngram. – SSED Jul 07 '15 at 09:50
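
The term query Andrei suggests in the comments can be run directly as a sanity check (query body copied from his comment; a local cluster and the `articles` index are assumed):

```shell
# A term query bypasses search-time analysis entirely, so it confirms
# whether the stemmed token "fox" actually made it into the index.
curl -s "http://localhost:9200/articles/_search" -d '{
  "query": { "term": { "article": { "value": "fox" } } }
}' || true
```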

0 Answers