I am new to Solr, and I am extracting metadata from binary files through URLs stored in my database. I would like to know which fields are available for indexing from PDFs (the ones that would be declared as column=""). I would also like to know how to create customized fields in Solr: how is that implemented, and how are those fields mapped to specific metadata coming from the files? If someone has a code snippet that could show me, it would be greatly appreciated. Thank you in advance.
To create custom fields in Solr, you will need to modify the schema.xml file for your Solr installation. The schema.xml file that comes with the Solr example included in the distribution (found under the /example folder) includes a large number of predefined metadata fields for file extraction. For information on creating custom fields in Solr, please see the following:
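As an alternative to editing schema.xml by hand, newer Solr releases (5.x and later) that use a managed schema accept field definitions over HTTP through the Schema API. The Java sketch below assumes such a setup: it builds an "add-field" command and POSTs it to <solr>/<core>/schema with the JDK's HttpClient (Java 11+). The field name file_type, the type string and the core name are illustrative assumptions. On a classic schema.xml install, such as the Solr 4 example, you would instead add an equivalent <field> element to schema.xml and reload the core.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AddSolrField {

    // Builds the JSON body for the Schema API "add-field" command.
    // Assumes simple field and type names that need no JSON escaping.
    static String addFieldJson(String name, String type, boolean stored) {
        return "{\"add-field\":{"
                + "\"name\":\"" + name + "\","
                + "\"type\":\"" + type + "\","
                + "\"indexed\":true,"
                + "\"stored\":" + stored
                + "}}";
    }

    // POSTs the command to <solrBase>/<core>/schema and returns the HTTP status.
    // Only works when the core uses a mutable managed schema; needs a running Solr.
    static int postSchemaChange(String solrBase, String core, String json) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(solrBase + "/" + core + "/schema"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        // Hypothetical custom field for the file type reported during extraction.
        System.out.println(addFieldJson("file_type", "string", true));
    }
}
```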
Solr has a built-in request handler for extracting and mapping metadata from binary files. For details, please refer to the following:
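Here is a minimal sketch of calling that handler (Solr Cell, usually mounted at /update/extract) from Java without curl. It assumes remote streaming is enabled in solrconfig.xml so that Solr can fetch the file itself via stream.url. literal.id supplies the unique key. uprefix=attr_ sends any metadata field the schema does not know into attr_* fields (a dynamic field in the example schema), which is one way to see what a given PDF produces. extractOnly=true returns the extracted text and metadata without indexing anything. The base URL, core name (collection1), document id and file URL are placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ExtractRequest {

    // Builds a /update/extract URL that lets Solr fetch the file from fileUrl.
    // stream.url only works if remote streaming is enabled in solrconfig.xml.
    static String buildExtractUrl(String solrBase, String core, String docId,
                                  String fileUrl, boolean extractOnly) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("stream.url", fileUrl);   // URL taken from the database row
        params.put("literal.id", docId);     // unique key supplied by the caller
        params.put("uprefix", "attr_");      // unknown metadata -> attr_* fields
        if (extractOnly) {
            params.put("extractOnly", "true"); // return text + metadata, index nothing
        } else {
            params.put("commit", "true");
        }
        String query = params.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
        return solrBase + "/" + core + "/update/extract?" + query;
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildExtractUrl("http://localhost:8983/solr", "collection1",
                "doc-42", "http://files.example.com/report.pdf", true));
    }
}
```

The resulting URL can be requested with any HTTP client, for example java.net.http.HttpClient, once per row read from the database.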

Paige Cook
Hi Paige, thanks a lot for your answer. I want to be able to index without using the curl command; my code is set to index dynamically. Could you please give me an example of how I would extract something like the file size, format or file type? What would be the column names that correspond to those types of fields? Also, I am wondering if you have a little code snippet showing how to map custom fields. Do I have to declare that in the solrconfig.xml, or do I need to do some more tweaking somewhere else? – Luis Mar 14 '13 at 18:51
For a code example, please see the ContentStreamUpdateRequestExample page on the Solr wiki (http://wiki.apache.org/solr/ContentStreamUpdateRequestExample). As for the column names, please refer to the example and adjust the settings in schema.xml accordingly, using the links above as a reference. – Paige Cook Mar 14 '13 at 18:58
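As a sketch of the mapping part of the question: Solr Cell's fmap.<source>=<target> parameters rename extracted metadata onto your own fields. They can be passed per request, as here, or set as defaults for the handler in solrconfig.xml, while the target fields themselves must exist in schema.xml. This sketch makes several assumptions. It assumes that lowernames=true turns Tika's Content-Type into content_type, that Solr Cell reports the stream length as stream_size (which may be missing if the length is unknown), and that the example schema's *_l and *_s dynamic fields are present. file_size_l and file_type_s are hypothetical names. Exact metadata names vary with the Tika version and the file, so it is worth checking a sample document with extractOnly=true first.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class MapPdfMetadata {

    // Hypothetical mapping: lowercased extraction metadata name -> custom Solr field.
    // Targets assume the *_l (long) and *_s (string) dynamic fields of the example schema.
    static final Map<String, String> FIELD_MAP = new LinkedHashMap<>();
    static {
        FIELD_MAP.put("stream_size", "file_size_l");  // size in bytes, if known
        FIELD_MAP.put("content_type", "file_type_s"); // e.g. application/pdf
    }

    // Turns the mapping into fmap.<source>=<target> query parameters.
    static String fmapParams(Map<String, String> mapping) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            joiner.add(enc("fmap." + e.getKey()) + "=" + enc(e.getValue()));
        }
        return joiner.toString();
    }

    // Extract request that applies the mapping; stream.url needs remote streaming enabled.
    static URI extractUri(String solrBase, String core, String docId, String fileUrl) {
        return URI.create(solrBase + "/" + core + "/update/extract"
                + "?stream.url=" + enc(fileUrl)
                + "&literal.id=" + enc(docId)
                + "&lowernames=true"            // "Content-Type" -> "content_type"
                + "&" + fmapParams(FIELD_MAP)
                + "&commit=true");
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Sends the request from Java instead of curl; needs a running Solr instance.
    static int send(URI uri) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(uri).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        System.out.println(fmapParams(FIELD_MAP));
    }
}
```

Calling send(extractUri(...)) for each URL read from the database would index the files one at a time. Batching and error handling are left out for brevity.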