According to link, parent and child docs are stored as:

child1,child2,parent

Does this mean that if both parent1 and parent2 reference child1, the child1 document will be duplicated, or will both parents refer to the same child1 document?

Will it be
child1(COPY1),child2,parent1 /// child1(COPY2),parent2

or
child1(COPY1),parent1 /// child1(COPY1),parent2
Sumeet Sharma

1 Answer

After experimenting a bit with nested documents, I came to the conclusion that, when it comes to updating the index, Solr treats a parent and all of its children as one atomic document. This document is identified by the parent's id. There is no such thing as a reference to a child: a child is part of one atomic document.

This means that when you index two nested documents:

curl 'http://localhost:8983/solr/demo/update?commitWithin=3000' -H 'Content-Type: application/json' -d '
[{
  "id": "parent1",
  "_childDocuments_": [
    { "id": "child1" },
    { "id": "child2" }
  ]
}, {
  "id": "parent2",
  "_childDocuments_": [
    { "id": "child1" }
  ]
}]'

you will end up with the following index:

child1,child2,parent1,child1,parent2

Updating the child1 indexed under parent2 (by re-indexing parent2) will not affect the child1 indexed under parent1.
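
To make the behaviour above concrete, here is a minimal toy model in Python. It is only a sketch of the layout described in this answer, not Solr's actual implementation: the flat list, the `root` key, and the helper names `add_block`, `remove_block` and `layout` are all made up for illustration.

```python
# Toy model only, NOT Solr's implementation.
# The "index" is a flat list; each parent is written as one block, children first.
# The "root" key is a hypothetical marker for which parent a document belongs to.

def remove_block(index, parent_id):
    """Drop every document belonging to the block rooted at parent_id."""
    index[:] = [doc for doc in index if doc["root"] != parent_id]

def add_block(index, parent_id, children):
    """(Re)index one parent: replace its whole block, laid out children first."""
    remove_block(index, parent_id)
    for child in children:
        index.append(dict(child, root=parent_id))  # each block gets its own copy
    index.append({"id": parent_id, "root": parent_id})

def layout(index):
    """Return the document ids in index order, comma-separated."""
    return ",".join(doc["id"] for doc in index)

index = []
add_block(index, "parent1", [{"id": "child1", "note": "v1"}, {"id": "child2"}])
add_block(index, "parent2", [{"id": "child1", "note": "v1"}])
print(layout(index))  # child1,child2,parent1,child1,parent2

# Re-index parent2 with a changed child1; parent1's copy is untouched.
add_block(index, "parent2", [{"id": "child1", "note": "v2"}])
notes = [(doc["root"], doc["note"]) for doc in index if doc["id"] == "child1"]
print(notes)  # [('parent1', 'v1'), ('parent2', 'v2')]
```

In this model the only way to change a child is to replace the whole block of its parent, which is why the two copies of child1 can drift apart.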

Molecular Man
  • Oh, so this means child1 will be created twice in the above case. Wouldn't that cause a lot of duplication? – Sumeet Sharma Jan 04 '16 at 10:14
  • @SumeetSharma, yes, it will be created twice. Regarding duplication: it depends on your data. If you store all the fields, then you'll probably have some duplication. If you only index the data, the effect of duplication shouldn't be significant (anyway, I can't even imagine how one would avoid duplication while building an inverted index). – Molecular Man Jan 04 '16 at 13:09
  • One last question: wouldn't such a structure affect the relevancy of the document? If the child doc appears in many parents, it would be OK for its relevancy to be lower when the context is a parent search, but what if I search among the child docs only? In that case, wouldn't its relevancy be wrongly reduced? – Sumeet Sharma Jan 05 '16 at 04:53
  • @SumeetSharma, sorry, I don't have an answer to this. – Molecular Man Jan 05 '16 at 17:24