I store a URL as a field in Elasticsearch. However, I would like to return only documents whose URL contains a subdomain.

For example.

I want my search result to have

http://any-subdomain.example.com

But I don't want the result to have

https://www.example.com

Is this possible with an Elasticsearch query?

toy
  • this answer might help: http://stackoverflow.com/questions/34887458/elasticsearch-query-string-with-wildcards/34986008#34986008 – Val Feb 05 '17 at 05:31

1 Answer

Have you tried the query_string query? For example, I used it on Twitter data like this:

GET /twitter2/tweet/_search
{
    "query": {
        "query_string": {
           "default_field": "entities.media.url",
           "query": "https\\:\\/\\/t.co\\/* AND -https\\:\\/\\/t.co\\/6*"
        }
    },
    "_source": ["entities.media.url"]
}

For this search, my mapping is:

PUT /twitter2/tweet/_mapping
{
    "properties": {
        "entities": {
            "properties": {
                "media": {
                    "properties": {
                        "url": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            }
        }
    }
}

You can use the following query for your case:

GET /your-index/your-type/_search
{
    "query": {
        "query_string": {
           "default_field": "url",
           "query": "http\\:\\/\\/*.example.com AND -http\\:\\/\\/www.example.com"
        }
    }
}
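
If you build the request from application code, the escaping is easy to get wrong, so here is a minimal Python sketch that produces the same request body for an arbitrary domain. The function name `build_subdomain_query` and its parameters are hypothetical, and it only covers a single scheme (`http` by default), exactly like the query above.

```python
import json


def build_subdomain_query(domain, field="url", scheme="http"):
    """Build a query_string body that matches <scheme>://*.<domain>
    but excludes <scheme>://www.<domain>.

    Hypothetical helper: it assumes the field is indexed as a single
    un-analyzed term, as in the mapping shown earlier.
    """
    # ':' and '/' are reserved in query_string syntax, so escape them.
    prefix = scheme + "\\:\\/\\/"
    query = f"{prefix}*.{domain} AND -{prefix}www.{domain}"
    return {
        "query": {
            "query_string": {
                "default_field": field,
                "query": query,
            }
        }
    }


body = build_subdomain_query("example.com")
# json.dumps doubles the backslashes, giving the JSON form used above.
print(json.dumps(body, indent=4))
```

To cover `https` as well, you could call the helper once per scheme and combine the results, or adjust the pattern; that part is left out here because it depends on how your URLs are stored.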

Note: wildcard queries, especially ones with a wildcard near the start of the term, can be slow. You can get results faster if you split the URL into separate fields (for example, url and host) at index time. With Elasticsearch 5.x, you can use an ingest node to transform your data this way. I will try to create a pipeline for this, but you can check the documentation for more information.
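
As a rough illustration of the index-time idea, here is a Python sketch that derives the extra fields in the client before the document is sent to Elasticsearch. This is not an ingest pipeline. The function name `host_fields`, the field names `host` and `has_subdomain`, and the decision to treat `www` as "no subdomain" are all assumptions made for this example.

```python
from urllib.parse import urlsplit


def host_fields(url, base_domain="example.com"):
    """Return the document fields to index for a URL.

    Hypothetical helper: `base_domain` must be supplied by the caller,
    so no public-suffix lookup is attempted.
    """
    host = (urlsplit(url).hostname or "").lower()
    suffix = "." + base_domain
    # Everything before ".<base_domain>", or "" if the host is the bare domain
    # or belongs to a different domain.
    label = host[:-len(suffix)] if host.endswith(suffix) else ""
    return {
        "url": url,
        "host": host,
        # Assumption: "www" does not count as a subdomain.
        "has_subdomain": bool(label) and label != "www",
    }


print(host_fields("http://any-subdomain.example.com"))
print(host_fields("https://www.example.com"))
```

With fields like these in the index, the search becomes a plain term filter on `has_subdomain` (or on `host`) instead of a wildcard match over the full URL.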

hkulekci