I am migrating an old ES instance to ES7. We need 1-n parent-child relations.

We used to have multiple types in the same index and it was easy. Some types were related to their parent via _parent.

But ES7 will only allow single-type indices. Which makes me think I will convert the old types to separate indices.

I read the docs and they suggest using join for parent-child relations, however those seem to apply only to documents belonging to a single index.

https://www.elastic.co/blog/removal-of-mapping-types-elasticsearch

So if I convert my previous types to separate indices, in my understanding join will not help.

So what is the right solution to model parent-child relation between different types (or should I say indices) in ES7?

Or maybe I should not model my data as separate types/indices in ES7. But in that case, how to solve this?

Thanks in advance

chris

1 Answer

Yes, you're right to use indices instead of types: ES deprecated mapping types in version 7, so we have to create multiple indices to handle this use case.

So now we have only two options:

Option 1: Denormalize the data and ingest documents accordingly.

Here again you can manage it in two ways:

  • Denormalize in a way that lets you keep using the join field: split the 1-to-n child types into n indices, each holding a single 1-to-1 parent-child relation. You would have as many indices as you had parent-child relations in the earlier version, with the same parent repeated in every index. Number of indices = number of parent-child relationships.

  • Completely denormalize the data so that you have a single index where one document carries all the information of all children from all the types you had. In this case, number of indices = 1.
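To make the two variants concrete, here is a minimal sketch in Python that only builds the request bodies as plain dicts, without talking to a cluster. The join field name `relation` and the `company` / `employee` / `invoice` types are made-up examples, not something from the question. Note that a child document in a join index has to be indexed with `routing` set to its parent's id, so that parent and child land on the same shard.

```python
from typing import Any, Dict, List

JOIN_FIELD = "relation"  # hypothetical name for the join field


def join_index_body(parent: str, child: str) -> Dict[str, Any]:
    """Index-creation body for ONE parent-child relation (first variant)."""
    return {
        "mappings": {
            "properties": {
                JOIN_FIELD: {"type": "join", "relations": {parent: child}}
            }
        }
    }


def child_doc(fields: Dict[str, Any], child: str, parent_id: str) -> Dict[str, Any]:
    """Child document body; index it with routing=<parent_id>."""
    return {**fields, JOIN_FIELD: {"name": child, "parent": parent_id}}


def denormalized_doc(
    parent_fields: Dict[str, Any],
    children_by_type: Dict[str, List[Dict[str, Any]]],
) -> Dict[str, Any]:
    """Single document holding the parent and all its children (second variant)."""
    doc = dict(parent_fields)
    for child_type, children in children_by_type.items():
        doc[child_type] = list(children)
    return doc


if __name__ == "__main__":
    # First variant: one index per relation, same parent type in each.
    for child in ("employee", "invoice"):
        print(f"company_{child}", join_index_body("company", child))

    # Second variant: one index, one document per parent.
    print(denormalized_doc(
        {"name": "Acme"},
        {"employee": [{"name": "Ann"}], "invoice": [{"total": 120}]},
    ))
```

In the second variant the child arrays would probably need the `nested` field type in the mapping if you want to query the fields of each child independently of its siblings.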

If your children have unique fields, I think the second way, with a single index, may perform better. But you have not mentioned how many documents you have, so you would probably need to find a balance. You can also combine both ways.

Disadvantages in this case would be:

  • Management of ingestion layer or jobs
  • Complexity in maintaining the structure of index
  • Performance issues that come with using the join type
  • You need to keep an eye on future ES versions in case they decide to modify the parent-child feature, although this need not be considered for now.

Advantages:

  • The service layer probably stays simpler, since it doesn't have to deal with the join logic of Option 2 discussed below
  • Easier to correlate with the use cases you may have from the front-end application.

Option 2: Manage the join at the application layer

  • Have a single parent index and multiple child indices, and manage the join at the application layer. If the parent has several 1-to-n mappings, the total number of indices would be n (1 parent index and n-1 child indices).
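A minimal sketch of what the two-step lookup for Option 2 could look like, again in Python and only building query bodies. It assumes each child document stores its parent's id in a keyword field called `parent_id`; that field name and the example values are hypothetical. The first query runs against the parent index, the second against a child index.

```python
from typing import Any, Dict, Iterable, List


def parent_query(field: str, value: Any, size: int = 100) -> Dict[str, Any]:
    """Step 1: search body for the parent index."""
    return {"query": {"term": {field: value}}, "size": size}


def hit_ids(response: Dict[str, Any]) -> List[str]:
    """Pull the document ids out of a search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def children_query(parent_ids: Iterable[str], size: int = 1000) -> Dict[str, Any]:
    """Step 2: search body for a child index, filtered by the parent ids."""
    return {"query": {"terms": {"parent_id": list(parent_ids)}}, "size": size}


if __name__ == "__main__":
    # Stand-in for what the parent search would return.
    fake_response = {"hits": {"hits": [{"_id": "1", "_source": {"name": "Acme"}}]}}
    print(parent_query("country", "DE"))
    print(children_query(hit_ids(fake_response)))
```

Each body would be sent with whatever client you use, once per child index you need.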

Disadvantages:

  • May or may not correlate easily with your use cases
  • You have to write separate join logic at the application layer. Not to mention that if you want to aggregate across parent and child, you'd have to write several for loops around multiple individual aggregation queries.
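To give an idea of the join logic meant here, this is a rough sketch that merges parent hits and child hits in memory and then computes a per-parent sum. It assumes the hits have already been fetched from the parent and child indices, and that children carry a `parent_id` field in `_source`; the field names and sample values are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

Hit = Dict[str, Any]


def join_hits(parent_hits: List[Hit], child_hits: List[Hit]) -> List[Dict[str, Any]]:
    """Attach each child's _source to its parent, matching on parent_id."""
    children = defaultdict(list)
    for hit in child_hits:
        children[hit["_source"]["parent_id"]].append(hit["_source"])
    return [
        {**p["_source"], "id": p["_id"], "children": children.get(p["_id"], [])}
        for p in parent_hits
    ]


def sum_per_parent(joined: List[Dict[str, Any]], field: str) -> Dict[str, float]:
    """Application-side aggregation: sum one child field per parent."""
    return {
        row["id"]: sum(child.get(field, 0) for child in row["children"])
        for row in joined
    }


if __name__ == "__main__":
    parents = [{"_id": "1", "_source": {"name": "Acme"}},
               {"_id": "2", "_source": {"name": "Globex"}}]
    invoices = [{"_id": "a", "_source": {"parent_id": "1", "total": 100}},
                {"_id": "b", "_source": {"parent_id": "1", "total": 20}}]
    joined = join_hits(parents, invoices)
    print(sum_per_parent(joined, "total"))
```

This only works while the matching documents fit in memory; for large result sets you would have to page through both sides, which is part of the cost of this option.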

Advantages:

  • Ease of maintaining jobs or ingestion layer
  • Management of indexes would be less painful

Alternatively you can mix and match both the above options, depending on what use-cases you'd have.

So you see, both have their pluses and minuses. If the ingestion layer is easy in one, it becomes cumbersome in the other; if the service layer is easier to maintain in one, it becomes difficult in the other.

The best way is to go ahead with some mock data, do some performance testing, and weigh the factors that matter to you: ease of querying, maintenance of the indices, query and aggregation performance, ease of developing and managing both the ingestion jobs and the service layer, etc.

May not be exactly what you are looking for, but I just hope this helps!

Kamal Kunjapur