I need to index a patent catalog that has the following data structure:
"cpc": [
  {
    "class": "61",
    "section": "A",
    "sequence": "1",
    "subclass": "K",
    "subgroup": "06",
    "main-group": "45",
    "classification-value": "I"
  },
  {
    "class": "61",
    "section": "A",
    "sequence": "2",
    "subclass": "K",
    "subgroup": "506",
    "main-group": "31",
    "classification-value": "I"
  }
]
I am wondering what the right approach is here. One option is to flatten each attribute into its own field, such as cpc.class, and declare it with multiValued="true".
I would like to find documents that match a certain CPC code, and the code may be partial (for example, only section, class and subclass). Right now my solution is simply to use these nested references with multiValued="true", as in the field definitions below. Is there a better way of doing this?
<field name="cpc.class" type="int" indexed="true" stored="true" multiValued="true" />
<field name="cpc.section" type="string" indexed="true" stored="true" multiValued="true" />
<field name="cpc.sequence" type="int" indexed="true" stored="true" multiValued="true" />
<field name="cpc.subclass" type="string" indexed="true" stored="true" multiValued="true" />
<field name="cpc.subgroup" type="int" indexed="true" stored="true" multiValued="true" />
<field name="cpc.main-group" type="int" indexed="true" stored="true" multiValued="true" />
<field name="cpc.classification-value" type="string" indexed="true" stored="true" multiValued="true" />
The problem with this implementation is that it returns documents that do not actually match the search criteria. For example, with these filters:
"cpc.section:A",
"cpc.class:61",
"cpc.subclass:Q",
"cpc.main-group:8"
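I get back documents that do not contain this combination in any single CPC entry. I think that with the current schema every field is an independent list, so a document matches as soon as each value appears somewhere in its own list, in any combination across entries. I need to narrow this down so that only documents where all criteria match within the same CPC entry are returned.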
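To make the suspected behavior concrete, here is a small Python sketch. It is a toy model of how I understand multi-valued field matching, not Solr code, and the function names (flatten, matches_flattened, matches_per_entry) are made up for illustration. It uses the two CPC entries from the sample document above.

```python
# Toy model only: mimics independent multi-valued fields, not Solr itself.
cpc_entries = [
    {"section": "A", "class": "61", "subclass": "K", "main-group": "45", "subgroup": "06"},
    {"section": "A", "class": "61", "subclass": "K", "main-group": "31", "subgroup": "506"},
]


def flatten(entries):
    """Collect each attribute into its own list, like one multi-valued field per attribute."""
    fields = {}
    for entry in entries:
        for key, value in entry.items():
            fields.setdefault(key, []).append(value)
    return fields


def matches_flattened(fields, query):
    """Each clause is checked against its own list, independently of the other clauses."""
    return all(value in fields.get(name, []) for name, value in query.items())


def matches_per_entry(entries, query):
    """All clauses must hold inside one and the same CPC entry."""
    return any(
        all(entry.get(name) == value for name, value in query.items())
        for entry in entries
    )


# main-group 45 comes from the first entry, subgroup 506 from the second.
query = {"main-group": "45", "subgroup": "506"}

flat_hit = matches_flattened(flatten(cpc_entries), query)
entry_hit = matches_per_entry(cpc_entries, query)
print(flat_hit, entry_hit)
```

In this model the flattened check accepts the document even though no single CPC entry has main-group 45 together with subgroup 506, while the per-entry check rejects it. The per-entry behavior is what I am after.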