I know it's not efficient to use a "JOIN" in Elasticsearch, but I need one. I have to extract values by matching a field that index A and index B have in common. Here is an example.

A/type1/1
{
"serial":"abc",
"member":"jack"
}

A/type1/2
{
"serial":"def",
"member":"jack"
}

B/type2/1
{
"serial":"abc",
"temp":1
}

B/type3/2
{
"serial":"abc",
"water":0
}

B/type2/3
{
"serial":"def",
"temp":10
}

I need to filter on the 'member' field of index A to find the corresponding serials, and then get the values of the temp and water fields from index B. For example, filter: {"member":"jack"} ===> temp:1, water:0, temp:10

I wonder whether I can get this result and, if so, how I should set up the data structure (index structure).

Jan_V
ykoo
    Would it be possible for you to [denormalize your data](https://stackoverflow.com/questions/36915428/how-to-setup-elasticsearch-index-structure-with-multiple-entity-bindings/36982705#36982705)? – Val Sep 01 '17 at 06:50

1 Answer

You should definitely do what the commenter Val suggests and denormalize (flatten) your data, if that is at all possible. For example, you could use documents like these (basically, do the join before indexing):

B/type2/1 {"serial": "abc", "temp": 1, "member": "jack"}
B/type2/2 {"serial": "abc", "water": 0, "member": "jack"}
B/type2/3 {"serial": "def", "temp": 10, "member": "jack"}
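As a rough illustration of "doing the join before indexing", here is a minimal Python sketch that copies the fields of each A document onto the B documents sharing its serial. The function name `denormalize` and the in-memory lists are hypothetical, chosen only for this example; in practice the merged documents would then be sent to Elasticsearch with whatever client you use.

```python
# Sample data taken from the question (indexes A and B).
A_DOCS = [
    {"serial": "abc", "member": "jack"},
    {"serial": "def", "member": "jack"},
]
B_DOCS = [
    {"serial": "abc", "temp": 1},
    {"serial": "abc", "water": 0},
    {"serial": "def", "temp": 10},
]


def denormalize(a_docs, b_docs, key="serial"):
    """Return B documents enriched with the fields of the matching A document.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    # Lookup table: serial -> A document (assumes one A document per serial).
    a_by_key = {doc[key]: doc for doc in a_docs}
    flattened = []
    for b in b_docs:
        parent = a_by_key.get(b[key])
        if parent is None:
            continue  # no matching A document; this sketch simply skips it
        merged = dict(parent)  # start from the A fields (e.g. "member")
        merged.update(b)       # B fields (temp / water) are added on top
        flattened.append(merged)
    return flattened


flattened = denormalize(A_DOCS, B_DOCS)
for doc in flattened:
    print(doc)
```

With the sample data this produces one flat document per B record, each carrying its own "member" field, which is the shape shown in the three documents above.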

If you then search with {"match": {"member": "jack"}}, you'll get all of those documents.

There are two ways of doing something like a "join" in Elasticsearch: parent-child relationships and nested objects. Here's an example of how you could create your mapping with nested objects:

{
  "type1": {
    "properties": {
      "serial": {"type": "keyword"},
      "member": {"type": "keyword"},
      "type2s": {
        "type": "nested",
        "properties": {
          "temp": {"type": "integer"},
          "water": {"type": "integer"}
        }
      }
    }
  }
}

Then you would store a record like this:

{
  "serial": "abc",
  "member": "jack",
  "type2s": [
    {
      "temp": 1
    },
    {
      "water": 0
    }
  ]
}
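If you do go the nested route, the records have to be assembled before indexing. The following is a minimal Python sketch of that grouping step, assuming the A and B documents are available as plain dictionaries; `build_nested_docs` is a hypothetical helper, and `type2s` is just the field name used in the mapping above.

```python
from collections import defaultdict

# Sample data taken from the question (indexes A and B).
A_DOCS = [
    {"serial": "abc", "member": "jack"},
    {"serial": "def", "member": "jack"},
]
B_DOCS = [
    {"serial": "abc", "temp": 1},
    {"serial": "abc", "water": 0},
    {"serial": "def", "temp": 10},
]


def build_nested_docs(a_docs, b_docs, key="serial", nested_field="type2s"):
    """Attach the B records for each serial to its A document as a nested list.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    # Group B records by serial, dropping the serial from each child
    # because the parent document already carries it.
    children = defaultdict(list)
    for b in b_docs:
        children[b[key]].append({k: v for k, v in b.items() if k != key})

    nested_docs = []
    for a in a_docs:
        doc = dict(a)
        doc[nested_field] = children.get(a[key], [])
        nested_docs.append(doc)
    return nested_docs


nested_docs = build_nested_docs(A_DOCS, B_DOCS)

# "member" is a keyword field on the parent document, so filtering on it
# needs only an ordinary term query; the nested values come back in _source.
query_body = {"query": {"term": {"member": "jack"}}}
```

Filtering on the nested fields themselves (for example, only records with a certain temp) would need a `nested` query with `"path": "type2s"`, which is part of the extra query complexity this approach brings.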

However, I would strongly urge you not to do this unless you absolutely have to! Use cases where this is a good idea are rare. It makes querying your data more complex, and it's inefficient (so as your data scales, you are going to have issues much sooner).

I know it feels wrong to "duplicate" data. It would be a terrible practice in a relational database. Effective and efficient data modeling in Elasticsearch really does require a different way of thinking, and one of the differences is that you shouldn't worry too much about duplicating data.

dshockley