
I am using queries (Solr Admin) to search for words in two text documents stored in my HDFS. How can I retrieve the name of the document in which a word is found? I am using this project: https://github.com/lucidworks/hadoop-solr

I create the collection with bin/solr -e cloud, using the "data_driven_schema_configs" config set from the server/solr/configsets/ directory.

I tried adding <field name="fileName" type="string" indexed="true" stored="true" /> to managed-schema in ~/solr-6.1.0/server/solr/configsets/data_driven_schema_configs/conf, and also tried renaming that file to schema.xml. However, this directory contains no dataConfig file in which to add <field column="file" name="fileName"/>, as I have seen suggested in other posts with similar questions. Those posts were not about SolrCloud, so I don't know whether what I am trying is correct. What changes do I have to make, and in which directories, to make this work?

Example: I am searching for the word "greatest", which can be found in both documents. How can I see which document each result comes from, sample1.txt or sample2.txt?


Spyros_av
  • If those are the only fields in your index that describe the documents, you can't. How did you generate the index files? Those `id` values seem to be actual text from the documents, and are not suitable unique ids. – MatsLindh Sep 10 '16 at 22:15
  • I am using this project https://github.com/LucidWorks/hadoop-solr @MatsLindh – Spyros_av Sep 11 '16 at 12:59
  • You should start reading Solr basics before asking. As @MatsLindh said, the first thing is that you should provide suitable unique ids for the `id` field. The actual text from the documents should be indexed in an appropriate text field, see [Solr Field Types](https://cwiki.apache.org/confluence/display/solr/Solr+Field+Types). Also, if you want the name of the matched documents, why not index and store the name of the documents? – EricLavault Sep 15 '16 at 09:00
  • @Spyros_av please provide a sample of the data you send to Solr, with the update request. Are you running Solr in schemaless mode? – EricLavault Sep 17 '16 at 19:47
  • @n0tting I forgot to mention that I am using SolrCloud. The data that I am using is some books in .txt format from https://www.gutenberg.org/ – Spyros_av Sep 17 '16 at 21:01

1 Answer


Same thing I said when you mentioned this question on IRC:

Your Solr schema must contain a field where you put the name, set to stored="true", and you must include that field, with a relevant value, in every document when you index. Most schema changes require a full reindex.

https://wiki.apache.org/solr/HowToReindex
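To illustrate what "include that field, with a relevant value, in every document" could look like, here is a minimal Python sketch that builds the documents to be indexed. It is not taken from hadoop-solr: the paths, the helper `build_solr_docs`, the `text` field name and the choice of the full path as `id` are all assumptions made for illustration. Only `fileName` comes from the question. The sketch stops at producing the JSON body and does not send anything to Solr.

```python
import json
import posixpath


def build_solr_docs(paths_and_text):
    """Turn (path, text) pairs into Solr documents that carry a fileName value.

    Hypothetical helper: field names other than fileName are assumptions.
    """
    docs = []
    for path, text in paths_and_text:
        docs.append({
            "id": path,                               # assumption: the full path is the unique id
            "fileName": posixpath.basename(path),     # the stored field from the question
            "text": text,                             # assumption: hypothetical text field name
        })
    return docs


# Hypothetical HDFS-style paths and contents, for illustration only.
docs = build_solr_docs([
    ("/user/demo/sample1.txt", "one of the greatest books"),
    ("/user/demo/sample2.txt", "the greatest story"),
])

# JSON array of documents, the shape Solr's JSON update handler accepts.
payload = json.dumps(docs)
print(payload)
```

Whatever tool does the indexing, the point is the same: each document sent to Solr needs its own `fileName` value, and documents indexed before the schema change have to be indexed again to get one.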

elyograg
  • I have added this line to managed-schema `` in this directory: `/solr-6.1.0/server/solr/configsets/data_driven_schema_configs/conf`. Is that what you mean? – Spyros_av Sep 14 '16 at 21:28
  • And did you ensure that this field is not only present, but also filled during the indexing process? And how should the old documents in your index get a value in that field? Someone needs to write it in there. Also, did you re-index after the schema extension? – cheffe Sep 19 '16 at 11:05
  • @elyograg what do you mean by "and you must include that field, with a relevant value, in every document when you index"? – Spyros_av Sep 19 '16 at 11:41
  • In order for a field to actually be useful, it must be populated. For the specific example (seeing a filename when indexing files) it's particularly important that every document actually include a filename field, and that the filename field has something useful in it. – elyograg Sep 27 '16 at 16:49
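
Once the field is populated and stored, a query can ask for it explicitly in the field list. The sketch below only builds the request URL and does not contact Solr. `q`, `fl` and `wt` are standard Solr query parameters; the host, the collection name `gettingstarted`, the `text` field and the helper `build_select_url` are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, word):
    """Build a /select URL that returns id and fileName for each match.

    Hypothetical helper: the searched field name "text" is an assumption.
    """
    params = {
        "q": f"text:{word}",      # search the (assumed) text field for the word
        "fl": "id,fileName",      # return only these stored fields
        "wt": "json",             # ask for a JSON response
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Assumed host and collection name, for illustration only.
url = build_select_url("http://localhost:8983/solr", "gettingstarted", "greatest")
print(url)
```

With a populated `fileName`, each result for "greatest" would then show whether it came from sample1.txt or sample2.txt.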