0

Solr/Carrot2 Integration

I have multiple text files. For each one I created an XML document to index it in Solr, as below:

<add>
  <doc>
    <person>data </person>
    <organization>data here </organization>
    <content>Some Spanish text here</content>
  </doc>
</add>

Schema used for indexing:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />    
<field name="person" type="string"  indexed="true" stored="true" required="true" multiValued="true" />
<field name="orgnization" type="string" indexed="true" stored="true" required="true" multiValued="true"   />
<field name="content" type="text_es" indexed="true" stored="true" multiValued="true"/>  
<field name="location" type="string"  indexed="true" stored="true" required="true" multiValued="true" />

Now I am trying to integrate Carrot2 clustering. For that I followed this link: http://carrot2.github.io/solr-integration-strategies/carrot2-3.8.0/index.html

My problem is that the clustering query returns only one cluster, as below:

<arr name="clusters">
  <lst>
    <arr name="labels">
      <str>Other Topics</str>
    </arr>
    <double name="score">0.0</double>
    <bool name="other-topics">true</bool>
    <arr name="docs">
      <str>#.txt</str>
      <str>abci-britanicos-pizzerias-201312120250.txt</str>
      <str>abci-arqueologos-israelis-descubren-primer-201312111303.txt</str>
      <str>abci-autoridad-fiscal-pensiones-201312111956.txt</str>
      <str>abci-buenas-razones-para-cambiar-201312110933.txt</str>
      <str>abci-audio-asamblea-aserpinto-201312112139.txt</str>
      <
    </arr>
  </lst>
</arr>

I should get more clusters. My corpus contains 60 text documents.

GaneshP

2 Answers

1

For search results clustering to work in Solr, the title and content fields you pass for clustering must be stored. The declaration in the Solr schema could look like this:

<field name="content" type="text" indexed="true" stored="true" />
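
For reference, here is a minimal sketch of how the clustering component might be wired up in solrconfig.xml so that it reads those stored fields. It assumes the Solr 4.x clustering contrib is on the classpath. The handler name `/clustering` and the engine name `lingo` are placeholders modelled on the stock example configuration, and pointing `carrot.title` at `content` is only an assumption, since the schema in the question has no separate title field.

```xml
<!-- Sketch only: names below are assumptions, adjust to your setup. -->
<searchComponent name="clustering" class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">lingo</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
  </lst>
</searchComponent>

<requestHandler name="/clustering" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">lingo</str>
    <bool name="clustering.results">true</bool>
    <!-- Fields handed to Carrot2; each must be stored="true" in schema.xml. -->
    <!-- Assumption: no title field exists, so content is reused here. -->
    <str name="carrot.title">content</str>
    <str name="carrot.snippet">content</str>
    <str name="carrot.url">id</str>
  </lst>
  <!-- Run clustering after the regular query component. -->
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
```

If the index has a real title field, it would normally be a better value for `carrot.title` than the full content.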
Stanislaw Osinski
  • In Carrot2 Workbench clustering works, so I guess my Solr schema is not the problem; the Solr configuration for clustering might be. – GaneshP Dec 16 '13 at 04:03
  • When using Workbench, you'd still need to have your content in stored fields. The reason for this is that the clustering algorithm needs the original text, so that it can analyze sequences of words. Such information is not available if your fields are indexed but not stored. – Stanislaw Osinski Dec 16 '13 at 15:42
  • My content field is indexed and stored; check the schema given in the question. – GaneshP Dec 17 '13 at 04:56
  • Would you be able to [save your results to XML in Workbench](http://doc.carrot2.org/#section.getting-started.saving-clusters) and e-mail me for debugging? – Stanislaw Osinski Dec 17 '13 at 12:56
  • stanislaw.osinski, what is your e-mail? – GaneshP Dec 24 '13 at 06:30
  • Solved, thanks all for the help. It was a silly mistake in the query: I was using q="*:*"; I replaced it with q="*.*" and it showed the cluster results. – GaneshP Dec 24 '13 at 15:20
1

In addition to what Stanislaw said about fields being stored, please provide the query you used for clustering and, ideally, the full schema used to index your data.

If you have a mere 60 documents in your index and the query matches a small subset of documents then there will be nothing to cluster on.
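
Related to this: the clustering component only clusters the documents returned for the current request, and Solr returns 10 rows by default. Below is a sketch of raising that limit in the handler defaults, assuming a clustering request handler is already configured in solrconfig.xml. The value 100 is arbitrary; anything at or above the corpus size would cover all 60 documents.

```xml
<!-- Sketch only: goes inside the request handler used for clustering. -->
<lst name="defaults">
  <!-- Number of search results passed on to the clustering engine. -->
  <str name="rows">100</str>
</lst>
```

The same effect can be had per request by adding `rows=100` to the query URL.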

dawid.weiss