Here is the sample data:

The type blog_comments holds comment documents structured like this:

{"blog_id": 1, "comments": "Apple", "comment_id": 1}

For blogs #1 and #2, there are 6 comments in total in blog_comments:

{"blog_id": 1, "comments": "Apple", "comment_id": 1}
{"blog_id": 1, "comments": "Orange", "comment_id": 2}
{"blog_id": 1, "comments": "Fruit", "comment_id": 3}
{"blog_id": 2, "comments": "Apple", "comment_id": 1}
{"blog_id": 2, "comments": "Orange", "comment_id": 2}
{"blog_id": 2, "comments": "Earth", "comment_id": 3}

Question: is it possible to use some "magic" query to get #1 as the result when I search for "Apple Fruit", and #2 when I search for "Apple Earth"?

I'm considering joining all comments for each blog into one new record (in a new type) and then searching on this new type. But there are a lot of comments (about 12,000,000), and they have already been indexed into Elasticsearch, so it would be better to reuse the existing data as much as possible.

simomo

1 Answer

Ideally, you would change the mapping of your index so that all the comments from one blog post can be searched together. You can't really search for documents and say that one particular blog id (which is just a field in each document) matched across multiple documents at the same time. Elasticsearch knows how to match across multiple fields of the same document, not across multiple documents.
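To make the limitation concrete, here is a minimal sketch in plain Python (no Elasticsearch involved) that treats each sample comment as one document and asks for documents containing every search term. The helper names `tokens` and `docs_matching_all` are made up for illustration, and lowercasing plus whitespace splitting is only a rough stand-in for a real analyzer.

```python
# Sample comments from the question; each dict stands for one indexed document.
COMMENTS = [
    {"blog_id": 1, "comments": "Apple", "comment_id": 1},
    {"blog_id": 1, "comments": "Orange", "comment_id": 2},
    {"blog_id": 1, "comments": "Fruit", "comment_id": 3},
    {"blog_id": 2, "comments": "Apple", "comment_id": 1},
    {"blog_id": 2, "comments": "Orange", "comment_id": 2},
    {"blog_id": 2, "comments": "Earth", "comment_id": 3},
]


def tokens(text):
    """Rough stand-in for an analyzer (assumption): lowercase, split on whitespace."""
    return set(text.lower().split())


def docs_matching_all(search, docs):
    """Return the documents whose own `comments` field contains every search term."""
    wanted = tokens(search)
    return [d for d in docs if wanted <= tokens(d["comments"])]


# No single comment contains both words, so nothing matches.
print(docs_matching_all("Apple Earth", COMMENTS))
# A one-word search matches one comment in each blog.
print([d["blog_id"] for d in docs_matching_all("Apple", COMMENTS)])
```

Because every comment is its own document, the two words are never seen together, which is why a per-document query alone cannot answer "which blog has both".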

There is one workaround, though. Whether it fits depends on what else you need from this query, apart from getting back just the blog ID.

GET /blog/entries/_search?search_type=count
{
  "query": {
    "match": {
      "comments": "Apple Earth"
    }
  },
  "aggs": {
    "unique": {
      "terms": {
        "field": "blog_id",
        "min_doc_count": 2
      }
    }
  }
}

The query above will return something like this:

"aggregations": {
  "unique": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": 2,
        "doc_count": 2
      }
    ]
  }
}

The idea of the query is to return just the blog_id ("key": 2 under buckets), which is why it uses a terms aggregation. Set min_doc_count to the number of terms you search for (Apple Earth counts as two terms). In other words, you ask for blogs where at least two documents match apple earth. The difference between your example and what this actually does is that each document only has to match one of the terms, and a comment containing both words, for example apple earth, also matches; it is not strictly apple in one document and earth in another. So a blog with two comments that both contain apple would be returned as well.
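The workaround can be tried out locally on the sample data. The following is a minimal sketch in plain Python, not a call to Elasticsearch: `build_request_body` and `simulate` are hypothetical helper names, the term splitting is a simplification of real analysis, and `simulate` only mimics a `match` query with its default OR behaviour followed by a `terms` aggregation with `min_doc_count`.

```python
from collections import Counter

# Sample comments from the question.
COMMENTS = [
    {"blog_id": 1, "comments": "Apple", "comment_id": 1},
    {"blog_id": 1, "comments": "Orange", "comment_id": 2},
    {"blog_id": 1, "comments": "Fruit", "comment_id": 3},
    {"blog_id": 2, "comments": "Apple", "comment_id": 1},
    {"blog_id": 2, "comments": "Orange", "comment_id": 2},
    {"blog_id": 2, "comments": "Earth", "comment_id": 3},
]


def search_terms(search):
    """Unique lowercase terms of the search string (simplified analysis, an assumption)."""
    return set(search.lower().split())


def build_request_body(search):
    """Build the request body from the answer, with min_doc_count = number of terms."""
    return {
        "query": {"match": {"comments": search}},
        "aggs": {
            "unique": {
                "terms": {
                    "field": "blog_id",
                    "min_doc_count": len(search_terms(search)),
                }
            }
        },
    }


def simulate(search, docs):
    """Mimic the request locally: keep comments matching ANY term, bucket by blog_id,
    and return the blog ids whose bucket holds at least one document per term."""
    wanted = search_terms(search)
    hits = [d for d in docs if wanted & set(d["comments"].lower().split())]
    counts = Counter(d["blog_id"] for d in hits)
    return sorted(blog for blog, n in counts.items() if n >= len(wanted))


print(simulate("Apple Earth", COMMENTS))  # [2]
print(simulate("Apple Fruit", COMMENTS))  # [1]

# Caveat: a made-up blog 3 with two "Apple" comments also reaches the threshold.
extra = COMMENTS + [
    {"blog_id": 3, "comments": "Apple", "comment_id": 1},
    {"blog_id": 3, "comments": "Apple pie", "comment_id": 2},
]
print(simulate("Apple Earth", extra))  # [2, 3]
```

The last call shows the caveat in practice: the document count per blog says how many comments matched, not how many distinct terms were covered.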

But, as I said, ideally you'd want to change the mapping of your index.
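As a rough illustration of what a changed data layout could look like, the sketch below merges the sample comments into one record per blog, along the lines of the idea in the question, and then checks that every search term appears somewhere in that blog's comments. This is plain Python under stated assumptions: `merge_comments` and `blogs_matching_all` are hypothetical names, the merged record layout is only one possible choice, and the check approximates what a `match` query with `"operator": "and"` on such a merged field would look for.

```python
from collections import defaultdict

# Sample comments from the question.
COMMENTS = [
    {"blog_id": 1, "comments": "Apple", "comment_id": 1},
    {"blog_id": 1, "comments": "Orange", "comment_id": 2},
    {"blog_id": 1, "comments": "Fruit", "comment_id": 3},
    {"blog_id": 2, "comments": "Apple", "comment_id": 1},
    {"blog_id": 2, "comments": "Orange", "comment_id": 2},
    {"blog_id": 2, "comments": "Earth", "comment_id": 3},
]


def merge_comments(docs):
    """Group comments by blog_id into one record per blog (hypothetical layout)."""
    merged = defaultdict(list)
    for d in docs:
        merged[d["blog_id"]].append(d["comments"])
    return [{"blog_id": b, "comments": c} for b, c in sorted(merged.items())]


def blogs_matching_all(search, blog_docs):
    """Return blog ids whose merged comments contain every search term."""
    wanted = set(search.lower().split())
    result = []
    for doc in blog_docs:
        words = set()
        for comment in doc["comments"]:
            words |= set(comment.lower().split())
        if wanted <= words:
            result.append(doc["blog_id"])
    return result


BLOGS = merge_comments(COMMENTS)
print(blogs_matching_all("Apple Fruit", BLOGS))  # [1]
print(blogs_matching_all("Apple Earth", BLOGS))  # [2]
```

With one record per blog, both words live in the same document, so the count-based threshold from the workaround is no longer needed.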

Andrei Stefan