21

Is it somehow possible to create a solr document that contains sub-elements?

For example, how would I represent something like this:

<person first="Bob" last="Smith">
   <children>
      <child first="Little" last="Smith" />
      <child first="Junior" last="Smith" />
   </children>
</person>

What is the usual way to solve this problem?

cheffe
cambo

3 Answers

21

As of Solr 4.7 and 4.8, Solr supports nested documents. Child documents go under the _childDocuments_ key of their parent:

{
  "id": "chapter1",
  "title": "Indexing Child Documents in JSON",
  "content_type": "chapter",
  "_childDocuments_": [
    {
      "id": "1-1",
      "content_type": "page",
      "text": "ho hum... this is page 1 of chapter 1"
    },
    {
      "id": "1-2",
      "content_type": "page",
      "text": "more text... this is page 2 of chapter 1"
    }
  ]
}

See the Solr release notes for more.
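
A minimal sketch in Python of how the person/children example from the question could be put into the same nested shape. Only the _childDocuments_ key comes from the example above. The field names (content_type, first, last), the id scheme, and the helper name to_nested_doc are assumptions made for illustration.

```python
import json


def to_nested_doc(person_id, first, last, children):
    """Build a parent document with its children under _childDocuments_.

    Field names and the id scheme are illustrative assumptions.
    """
    return {
        "id": person_id,
        "content_type": "person",
        "first": first,
        "last": last,
        "_childDocuments_": [
            {
                "id": f"{person_id}-{n}",
                "content_type": "child",
                "first": child_first,
                "last": child_last,
            }
            for n, (child_first, child_last) in enumerate(children, start=1)
        ],
    }


doc = to_nested_doc(
    "bob-smith", "Bob", "Smith",
    [("Little", "Smith"), ("Junior", "Smith")],
)

# A JSON array of documents, serialized for posting to the update handler.
payload = json.dumps([doc], indent=2)
print(payload)
```

With documents shaped like this, a block join parent query along the lines of {!parent which="content_type:person"}first:Junior should return the parents that have a matching child, assuming content_type:person matches all parent documents and only parent documents.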

whomer
13

You can model this in different ways, depending on your searching/faceting needs. Usually you'll use multivalued or dynamic fields. In the following examples I omit the field type and the indexed and stored flags:

<field name="first"/>
<field name="last"/>
<field name="child_first" multiValued="true"/>
<field name="child_last" multiValued="true"/>

It's up to you to correlate the children's first names with their last names. Or you could just put both in a single field:

<field name="first"/>
<field name="last"/>
<field name="child_first_and_last" multiValued="true"/>
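
As a rough illustration of these two layouts, here is a Python sketch that flattens one person into each of them. The helper names are hypothetical. Pairing the parallel lists by position assumes the values come back in the order they were added.

```python
def to_multivalued_doc(first, last, children):
    """Option 1: parallel multivalued fields child_first / child_last."""
    return {
        "first": first,
        "last": last,
        "child_first": [child_first for child_first, _ in children],
        "child_last": [child_last for _, child_last in children],
    }


def to_combined_doc(first, last, children):
    """Option 2: one multivalued field holding 'first last' per child."""
    return {
        "first": first,
        "last": last,
        "child_first_and_last": [f"{cf} {cl}" for cf, cl in children],
    }


children = [("Little", "Smith"), ("Junior", "Smith")]
multi = to_multivalued_doc("Bob", "Smith", children)
combined = to_combined_doc("Bob", "Smith", children)

# Correlating option 1 is the caller's job: pair the lists by position.
paired = list(zip(multi["child_first"], multi["child_last"]))
print(paired)
print(combined["child_first_and_last"])
```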

Another one:

<field name="first"/>
<field name="last"/>
<dynamicField name="child_first_*"/>
<dynamicField name="child_last_*"/>

Here you would store fields 'child_first_1', 'child_last_1', 'child_first_2', 'child_last_2', etc. Again it's up to you to correlate the values, but at least the numeric suffix gives you an index to pair them by. With some code you could make this transparent.
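
One way the "some code" could look, sketched in Python. Both helper names are hypothetical, and the sketch only handles the child_first_N / child_last_N naming used above.

```python
import re

# Matches the dynamic field names used above, e.g. child_first_1, child_last_2.
_CHILD_FIELD = re.compile(r"^child_(first|last)_(\d+)$")


def to_dynamic_doc(first, last, children):
    """Write each child into numbered dynamic fields."""
    doc = {"first": first, "last": last}
    for n, (child_first, child_last) in enumerate(children, start=1):
        doc[f"child_first_{n}"] = child_first
        doc[f"child_last_{n}"] = child_last
    return doc


def children_from_dynamic_doc(doc):
    """Regroup numbered fields into (first, last) pairs, ordered by suffix."""
    by_index = {}
    for name, value in doc.items():
        match = _CHILD_FIELD.match(name)
        if match:
            part, n = match.group(1), int(match.group(2))
            by_index.setdefault(n, {})[part] = value
    return [
        (by_index[n].get("first"), by_index[n].get("last"))
        for n in sorted(by_index)
    ]


doc = to_dynamic_doc("Bob", "Smith", [("Little", "Smith"), ("Junior", "Smith")])
print(doc)
print(children_from_dynamic_doc(doc))
```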

Bottom line, as the Solr wiki says: "Solr provides one table. Storing a set of database tables in an index generally requires denormalizing some of the tables. Attempts to avoid denormalizing usually fail." It's up to you to denormalize your data according to your search needs.

UPDATE: Since version 4.5 or so Solr supports nested documents directly: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers

Mauricio Scheffer
  • Thanks Mauricio. Options #1 and #2 aren't really useful as it then becomes impossible to extract individual fields, especially if there is more than two. Your third suggestion may just work, using dynamic fields. What mechanism in the DataImportHandler would I use to generate these dynamic fields? – cambo Apr 11 '11 at 02:46
  • @user332523: it may be impossible if you are restricted to using the DataImportHandler... but it's very easy to do if you import in your own coded process. – Mauricio Scheffer Apr 11 '11 at 03:00
  • Hi Mauricio, thanks for the reply. Do you just mean a custom data importer that uses the Solr API to add the documents to the index? I did read about something in the Solr DIH docs that might be able to create dynamic fields [http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer] – cambo Apr 19 '11 at 04:15
  • @user332523: yes, in general writing your own data importer is much more flexible than DIH. What DIH provides is simplicity and zero-coding to get started quickly, but I wouldn't hesitate to drop it when it's not enough. – Mauricio Scheffer Apr 19 '11 at 04:26
  • Thanks for the info. Guess a custom DIH is where we'll have to go then....Cheers. – cambo Apr 21 '11 at 22:43
  • Does this apply to Solr 4 as well? – chaostheory Sep 19 '12 at 17:40
  • @Mauricio Scheffer I'm trying to find docs on nested documents in Solr 4.5+ without success. What would schema.xml look like? Do you have any reference on that apart from the link in your answer? – Bolhoso May 13 '14 at 19:34
7

Having separate fields for the children leads to false positive matches. Concatenated fields work to some extent, but it's a really limited approach. We have a lot of experience with similar tasks, which we blogged about at http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
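
The false positive can be reproduced without Solr. The Python sketch below only mimics a query of the form child_first:X AND child_last:Y against separate multivalued fields. It is a simplified model with made-up data, not Solr's actual matching code.

```python
def matches_flat(doc, child_first, child_last):
    """Mimic child_first:X AND child_last:Y on separate multivalued fields."""
    return child_first in doc["child_first"] and child_last in doc["child_last"]


def matches_paired(children, child_first, child_last):
    """Match only when one and the same child has both values."""
    return (child_first, child_last) in children


# Made-up data: two children with different last names.
children = [("Little", "Smith"), ("Junior", "Jones")]
flat = {
    "child_first": [cf for cf, _ in children],
    "child_last": [cl for _, cl in children],
}

# There is no child called "Little Jones", yet the flat layout matches.
print(matches_flat(flat, "Little", "Jones"))        # True (false positive)
print(matches_paired(children, "Little", "Jones"))  # False
```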

mkhludnev
  • Wow, I answered this almost two years ago, and I have since changed my mind. The No. 1 machinery is described at http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html. It has just been made available for Solr in https://issues.apache.org/jira/browse/SOLR-3076 and will be released in 4.5. By the way, ElasticSearch has supported it for a really long time. – mkhludnev Aug 30 '13 at 21:01