Suppose I have a test application representing some friends list. The application uses a collection where all documents are in the following format:
_id : ObjectId("someString"),
name : "George",
description : "some text",
age : 35,
friends : {
[
{
name: "Peter",
age: 30
town: {
name_town: "Paris",
country: "France"
}
},
{
name: "Thomas",
age: 25
town: {
name_town: "Berlin",
country: "Germany"
}
}, ... // more friends
]
}
... // more documents
How can I describe such collection in the schema.xml ? I need to produce facet queries like: "Give me countries, where George's friends live". Another use case may be - "Return all documents(persons), whose friend is 30 years old." etc.
My initial idea is to mark "friends" attribute as text field by this schema.xml definition:
<fieldType name="text_wslc" class="solr.TextField" positionIncrementGap="100">
....
<field name="friends" type="text_wslc" indexed="true" stored="true" />
and try to search for eg. "age" and "30" words in the text, but it is not a very reliable solution.
Please, leave aside not logically well-formed architecture of the collection. It is only an example of similar problem I am just facing.
Any help or idea will be highly appreciated.
EDIT: Sample 'schema.xml'
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="text-schema" version="1.5">
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
<fieldType name="trInt" class="solr.TrieIntField" precisionStep="0" omitNorms="true" />
<fieldType name="text_p" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<field name="_id" type="string" indexed="true" stored="true" required="true" />
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="_ts" type="long" indexed="true" stored="true"/>
<field name="ns" type="string" indexed="true" stored="true"/>
<field name="description" type="text_p" indexed="true" stored="true" />
<field name="name" type="text_p" indexed="true" stored="true" />
<field name="age" type="trInt" indexed="true" stored="true" />
<field name="friends" type="text_p" indexed="true" stored="true" /> <!-- Here is the problem - when the type is text_p, all fields are considered as a text; optimal solution would be something like "collection" tag to mark name_town and town as descendant of the field 'friends' but unfortunately, this is not how the solr works-->
<field name="town" type="text_p" indexed="true" stored="true"/>
<field name="name_town" type="string" indexed="true" stored="true"/>
<field name="town" type="string" indexed="true" stored="true"/>
</fields>
<uniqueKey>_id</uniqueKey>