
I can't find a way to do the following with ElasticSearch:

  • I have 2,000,000 items indexed in ElasticSearch
  • I have 30,000 players saved in MySQL

Every item has the name of a player as an attribute. The online status of these players changes every 15 minutes, and can be true or false (obviously).

I would like to be able to show only items for online players.

I don't think I can index the online status with the item, since it changes so often. I can't really get all the ids of the online players and use that as a filter since there are so many.

Would it help to index players in ElasticSearch as well? Is it possible to do some kind of JOIN with another index?

edit: After looking more into how to do joins with ES, I found out that it's actually possible with has_child if I index players in ES. Tire does not have a method for has_child, but is it possible to do it with the existing DSL?

Robin

1 Answer


This seems like a good fit for a parent-child relation between players and items, even if you don't need full-text search on the parent documents, because:

  1. each item belongs to a player
  2. they have independent update lifecycles: when a player changes, you don't want to reindex all his items
  3. you only want to return the children, applying a filter to their parents.

You could index your players too, in the same index as the items but as a separate type. You need to declare in your mapping that the player type is the parent of the item type:

{
  "item":{
    "_parent":{
      "type" : "player"
    }
  }
}

After that, you index the players first, then your items, specifying the parent player id for each item.
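As a rough illustration of that indexing order, here is a minimal Python sketch that only builds the body of a bulk request, without talking to a cluster. It assumes an Elasticsearch version that still supports several types per index and the `_parent` mapping shown above. The index name `game`, the item field `title`, and the use of the player name as the player document id are all assumptions made for the example; only the `status` field comes from the filter used in this answer.

```python
import json

def bulk_lines(players, items, index="game"):
    """Build a newline-delimited bulk body: players first, then their items."""
    lines = []
    for player in players:
        # Hypothetical choice: the player name doubles as the document id.
        action = {"index": {"_index": index, "_type": "player", "_id": player["name"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"status": player["online"]}))
    for item in items:
        # `_parent` ties the item to its player and routes it to the same shard.
        action = {"index": {"_index": index, "_type": "item",
                            "_id": item["id"], "_parent": item["player"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"title": item["title"]}))
    # The bulk format expects a trailing newline after the last line.
    return "\n".join(lines) + "\n"

body = bulk_lines(
    [{"name": "alice", "online": True}],
    [{"id": "1", "title": "sword", "player": "alice"}],
)
```

The resulting string is what you would send to the `_bulk` endpoint with whatever HTTP client you already use.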

You can then execute a full text search on the items, filtering them using the following has_parent filter.

{
    "has_parent" : {
        "parent_type" : "player",
        "query" : {
            "term" : {
                "status" : true
            }
        }
    }
}
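To show how the filter might sit inside a complete search request, here is a small sketch that wraps it in a `filtered` query together with a full-text `match` on the items. It assumes an Elasticsearch version that still has the `filtered` query, and the item field name `title` is a placeholder since the question does not say which fields the items have.

```python
import json

def online_items_search(text, field="title"):
    """Build a search body: full-text match on items, restricted to online players."""
    return {
        "query": {
            "filtered": {
                # Full-text part, run against the item documents.
                "query": {"match": {field: text}},
                # Same has_parent filter as above, applied to the parent players.
                "filter": {
                    "has_parent": {
                        "parent_type": "player",
                        "query": {"term": {"status": True}},
                    }
                },
            }
        }
    }

print(json.dumps(online_items_search("sword"), indent=2))
```

Since the body is plain JSON, it can be sent to the `_search` endpoint directly, which is one way around a client library that has no dedicated helper for `has_parent`.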

This way you would only query, and eventually return, the items that belong to an online player.

In order to update players you can use the update API, and maybe use scripting to avoid resending the whole document. Beware that the document is going to be deleted and reindexed under the hood anyway, since that's how Lucene works.
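A possible shape for the 15-minute status refresh is sketched below. Instead of a script it uses the partial `doc` form of the update API, sent through the bulk endpoint, which is a substitution on my part and assumes your Elasticsearch version supports both. The index name `game` and the name-as-id convention are again assumptions.

```python
import json

def status_update_lines(statuses, index="game"):
    """Build a bulk body that flips only the `status` field of each player."""
    lines = []
    for name, online in sorted(statuses.items()):
        # One action line per player, followed by the partial document.
        lines.append(json.dumps({"update": {"_index": index, "_type": "player", "_id": name}}))
        lines.append(json.dumps({"doc": {"status": online}}))
    return "\n".join(lines) + "\n"

body = status_update_lines({"alice": False, "bob": True})
```

Only the 30,000 player documents are touched this way; the 2,000,000 items stay as they are.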

If you want to see more examples about relations between documents in elasticsearch, have a look at the following articles:

Depending on the type of queries that you are going to need, you might encounter limitations, but given what you've written this is what I would do. Just make sure your nodes have enough memory, since Elasticsearch keeps an in-memory join table containing all the ids involved when using parent-child.

javanna
  • Awesome, thanks. I was actually starting to read about has_child/has_parent. I will index players in a different index because I already have an index per type of items. Now I need to find out how to do this with Tire. Thanks a lot for this great answer. – Robin Apr 20 '13 at 16:52
  • Unfortunately parent child is not supported across different indexes. They must be different types within the same index. The children will be indexed in the same shard as their parent document, since that's the only way it can actually work. – javanna Apr 20 '13 at 17:06
  • `By limiting itself to parent/child type relationships elasticsearch makes life easier for itself: a child is always indexed in the same shard as its parent, so has_child doesn’t have to do awkward cross shard operations.` Indeed. That's sad... I guess my only option would be to reindex everything in one index :( – Robin Apr 20 '13 at 17:16
  • Yep, unfortunately you already got to its limitations, I didn't think that was a big one for you but sure, it's annoying. – javanna Apr 20 '13 at 17:21