
Currently, my ES document structure has a field of type 'Object'. This is a JSON object that can have up to 3000 fields inside. The problem is that, at times, my ES runs out of memory because the document is too large. So I am looking to change my document structure.

The two structures I am looking at are nested mappings and the parent-child relationship. Both satisfy my search requirements. Points being considered:

  1. I read that nested queries are much faster than child queries.
  2. Nested mappings also store the nested fields as separate documents.

Two points of confusion that I am facing:

  1. How does nested indexing work? Does ES receive the whole document in one go and analyze it all at once, or are the nested documents sent as individual requests? In the first case, ES might run out of memory again.

  2. When we say parent-child queries are slower, how much slower do we mean?

Looking for inputs.

Aayushi

3 Answers


Nested documents are faster than parent/child and simpler to manage. With parent/child you can index a child without its parent, so you have to be careful when indexing. Also, when you delete a parent you have to delete all of its children yourself; this is not done automatically.

On the other hand, parent/child is more convenient if you need to change or update entries. With the nested type you can't change a single nested value in the nested field; you have to reindex all the nested values in that field. With parent/child you can update just one parent or one child. Nested objects are treated as a single atomic unit of related data in the index, whereas parent/child is just a separate datatype that keeps the relation between the two sides, parent and child. You can read the kimchy post here, and for the slowness of parent/child you can read the last comment of the discussion: https://discuss.elastic.co/t/choosing-parent-child-vs-nested-document/6742

Lupanoide
  • Thanks for your answer. But one of my main questions is whether, with a nested structure, ES analyzes the whole document (along with its nested fields) at once at indexing time, or separately. – Aayushi Sep 25 '17 at 03:37

Nested ::

  1. Nested docs are stored in the same Lucene block as each other, which helps read/query performance. Reading a nested doc is faster than the equivalent parent/child.

  2. Updating a single field in a nested document (parent or nested children) forces ES to reindex the entire nested document. This can be very expensive for large nested docs.

  3. "Cross referencing" nested documents is impossible.

  4. Best suited for data that does not change frequently.
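As a rough illustration of the nested points above, here is a minimal sketch of a nested mapping and a nested query, written as plain Python dicts that could be serialized to JSON request bodies. The field names (`title`, `attributes`, `name`, `value`) and the helper functions are hypothetical, and the mapping layout assumes the typeless format of Elasticsearch 7 and later; older versions wrap `properties` in a type name.

```python
import json

# Hypothetical field names; typeless mapping layout (Elasticsearch 7+ assumed).
nested_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "attributes": {
                "type": "nested",
                "properties": {
                    "name": {"type": "keyword"},
                    "value": {"type": "keyword"},
                },
            },
        }
    }
}


def nested_match(path, name, value):
    """Build a nested query where both terms must match the SAME nested object."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "must": [
                            {"term": {f"{path}.name": name}},
                            {"term": {f"{path}.value": value}},
                        ]
                    }
                },
            }
        }
    }


def lucene_doc_count(doc, nested_field):
    """One hidden Lucene document per nested object, plus one for the root."""
    return len(doc.get(nested_field, [])) + 1


# Illustrative document: the whole thing is sent in a single index request.
doc = {
    "title": "widget",
    "attributes": [
        {"name": "color", "value": "red"},
        {"name": "size", "value": "M"},
    ],
}

print(json.dumps(nested_match("attributes", "color", "red"), indent=2))
print(lucene_doc_count(doc, "attributes"))
```

Note that the root document and all of its nested objects travel in one request body, so a nested mapping changes how the data is stored and queried, not how much is sent per request.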

Parent/Child ::

  1. Children are stored separately from the parent, but are routed to the same shard. So parent/child is slightly less performant on read/query than nested.

  2. Parent/child mappings have a bit of extra memory overhead, since ES maintains a "join" list in memory.

  3. Updating a child doc does not affect the parent or any other children, which can potentially save a lot of indexing on large docs.

  4. Sorting/scoring can be difficult with Parent/Child, since the Has Child/Has Parent operations can be opaque at times.
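For comparison, the parent/child points above can be sketched the same way. This assumes the `join` field type used by Elasticsearch 6.0 and later; at the time of this thread the older `_parent` mapping would have been typical, and it looks different. The names `relation`, `item`, `attribute`, and the `child_index_request` helper (including the shape of the dict it returns) are hypothetical.

```python
import json

# Hypothetical names; `join` field type (Elasticsearch 6.0+ assumed).
join_mapping = {
    "mappings": {
        "properties": {
            "relation": {
                "type": "join",
                "relations": {"item": "attribute"},  # parent name -> child name
            },
            "name": {"type": "keyword"},
            "value": {"type": "keyword"},
        }
    }
}


def child_index_request(child_id, parent_id, name, value):
    """Describe an index request for one child document.

    The child is routed by its parent's id so both land on the same shard.
    The returned dict is an illustrative shape, not a client API.
    """
    return {
        "id": child_id,
        "routing": parent_id,
        "body": {
            "name": name,
            "value": value,
            "relation": {"name": "attribute", "parent": parent_id},
        },
    }


# Find parents that have at least one child matching the inner query.
has_child_query = {
    "query": {
        "has_child": {
            "type": "attribute",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"name": "color"}},
                        {"term": {"value": "red"}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(child_index_request("a1", "p1", "color", "red"), indent=2))
```

Because each child is its own request, a single child can be added, updated, or deleted without touching the parent or its siblings.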

Abhijit Bashetti

The main difference is that nested docs are faster than parent/child, but nested docs require reindexing the parent with all its children, while parent/child lets you reindex, add, or delete specific children.

For example, a product may have only a few tags but many comments, so keeping tags as nested probably wouldn't be a problem. But keeping (hundreds of) comments on a blog post as nested would be.
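To put rough numbers on the tags-versus-comments example, here is a back-of-the-envelope sketch. It is a simplified model, not a benchmark: it only counts how many Lucene documents get rewritten when one child changes, assuming a nested update rewrites the root plus every nested object, and a parent/child update rewrites only the changed child. The function name is hypothetical.

```python
def lucene_docs_rewritten(num_children, model):
    """Count Lucene documents rewritten when ONE child is updated (simplified model)."""
    if model == "nested":
        # The root document and all of its nested objects are reindexed together.
        return num_children + 1
    if model == "parent_child":
        # Only the changed child document is reindexed.
        return 1
    raise ValueError(f"unknown model: {model}")


# A product with a handful of tags vs. a blog post with hundreds of comments.
for label, children in [("product tags", 5), ("blog comments", 500)]:
    nested_cost = lucene_docs_rewritten(children, "nested")
    pc_cost = lucene_docs_rewritten(children, "parent_child")
    print(f"{label}: nested={nested_cost}, parent/child={pc_cost}")
```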

Rafiq