
I am using the hadoop-solr project (https://github.com/lucidworks/hadoop-solr). Does anyone know which value holds the name (or the path) of the document that is being processed? I want to make this value visible in Solr Admin by adding a field for it to my schema. Is this possible?

Example: I want to be able to see the name of the document from which a query result came.

I am running the project with this command:

    hadoop jar solr-hadoop-job-2.2.5.jar \
        com.lucidworks.hadoop.ingest.IngestJob \
        -Dlww.commit.on.close=true -DcsvDelimiter= \
        -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c spyros1 \
        -i /usr/local/hadoop/input \
        -of com.lucidworks.hadoop.io.LWMapRedOutputFormat \
        -s http://127.0.1.1:8983/solr
Spyros_av

2 Answers


This worked for me:

    hadoop jar solr-hadoop-job-2.2.5.jar com.lucidworks.hadoop.ingest.IngestJob \
        -Dlww.commit.on.close=true \
        -Dcom.lucidworks.hadoop.ingest.RegexIngestMapper.regex="\\w+" \
        -Dcom.lucidworks.hadoop.ingest.RegexIngestMapper.groups_to_fields=0=match_ss \
        -cls com.lucidworks.hadoop.ingest.RegexIngestMapper \
        -c collection1 -i /path/* -s http://127.0.1.1:8983/solr \
        -of com.lucidworks.hadoop.io.LWMapRedOutputFormat

Also see this for more info.
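To check which file a match came from, one could then query the collection and request the path field along with the match. This is only a sketch: it assumes the RegexIngestMapper writes the source file into a field named `path` (I have not verified that name against the mapper's code), and the search term `hello` is a made-up example.

```shell
# Assumptions: the Solr URL and collection come from the command above;
# the "path" field name and the search term are illustrative only.
SOLR_URL="http://127.0.1.1:8983/solr"
COLLECTION="collection1"
TERM="hello"   # hypothetical search term

# Build the select endpoint for the collection.
select_url() {
    printf '%s/%s/select' "$SOLR_URL" "$COLLECTION"
}

# Search the match_ss field and return only id, path and match_ss.
query_with_path() {
    curl -s -G "$(select_url)" \
        --data-urlencode "q=match_ss:$TERM" \
        --data-urlencode "fl=id,path,match_ss" \
        --data-urlencode "wt=json"
}

# Run against a live Solr instance with: query_with_path
```

If `path` does not show up in the response, the field may be named differently or may not be stored in your schema.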

Spyros_av

For the CSVIngestMapper the file path is not currently added to any Solr field.

Feel free to create an issue in the repo: https://github.com/lucidworks/hadoop-solr

PRs are also welcome.

EDIT: (See https://github.com/lucidworks/hadoop-solr/issues/16 for the solution)

acesar
  • So is there any way to retrieve the name or the path of the document that a result belongs to? If I am using 2 txt documents, how am I supposed to know which txt a result came from? Is this available for any other Ingest Mapper? @acesar – Spyros_av Sep 21 '16 at 19:18
  • > Is available for any other Ingest Mapper?
    Yes, the RegexIngestMapper and GrokIngestMapper add a field called `path`. But I am not completely sure I am following your use case. – acesar Sep 21 '16 at 19:59
  • I am using "data_driven_schema_configs" for my collection. The field must be added inside the `managed-schema` of data_driven_schema_configs, right? @acesar – Spyros_av Sep 21 '16 at 20:10
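With a managed schema, a field is usually added through Solr's Schema API rather than by editing `managed-schema` by hand. A minimal sketch, assuming the field is called `path` as in the comment above, that a `string` field type exists in the config set, and that the collection is `spyros1` as in the question:

```shell
# Assumptions: URL and collection are taken from the question;
# the field definition (name "path", type "string") is illustrative.
SOLR_URL="http://127.0.1.1:8983/solr"
COLLECTION="spyros1"
FIELD_JSON='{"add-field":{"name":"path","type":"string","indexed":true,"stored":true}}'

# Build the Schema API endpoint for the collection.
schema_url() {
    printf '%s/%s/schema' "$SOLR_URL" "$COLLECTION"
}

# POST the add-field command to the Schema API.
add_path_field() {
    curl -s -X POST -H 'Content-type:application/json' \
        --data-binary "$FIELD_JSON" "$(schema_url)"
}

# Run against a live Solr instance with: add_path_field
```

This only defines the field. As noted in the answer, the CSVIngestMapper does not fill it, so the mapper still has to supply the value.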