2

DSpace offers search on the basis of item name, but an item can contain multiple files. Can we also search on the basis of the names of those files?

How can we customize DSpace to allow searching by the name of a file inside an item, given that an item can contain multiple files?

MartinW
Ruchi arora

2 Answers

2

Assuming you're using the Solr-based search option (default since at least DSpace 4.x), the fulltext file names are indexed to the stream_name field. Just put stream_name:xyz into the search box to find items that have a file whose name contains "xyz".

For example, this search (XMLUI) / this search (JSPUI) finds the two items on the demo server that have a file called Atlas.pdf (note the search results will change as the demo server items change).
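If you want to generate such a search link rather than type the query by hand, the idea can be sketched roughly as below. This is only an illustration: the class and method names are made up, `https://example.org/xmlui` is a placeholder base URL, and the `/discover?query=` path is an assumption about the XMLUI Discovery page that you should check against your own installation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of DSpace.
final class FileNameSearchUrl {

    // Builds the query text described above: stream_name:<part of file name>.
    static String query(String fileNamePart) {
        return "stream_name:" + fileNamePart;
    }

    // Appends the URL-encoded query to an assumed Discovery search path.
    // baseUrl is a placeholder such as "https://example.org/xmlui".
    static String discoverUrl(String baseUrl, String fileNamePart) {
        String encoded = URLEncoder.encode(query(fileNamePart), StandardCharsets.UTF_8);
        return baseUrl + "/discover?query=" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(discoverUrl("https://example.org/xmlui", "Atlas.pdf"));
    }
}
```

File names containing characters that Solr treats as query syntax (spaces, colons, brackets) would need extra escaping or quoting, which this sketch does not attempt.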

schweerelos
  • Thank you. But as you wrote, only the *fulltext* file names are indexed. E.g. file names of image files do not seem to be written to the index? Also, while the JSON suggests an array, multiple filenames are written to one single string: `"stream_name": ["file1.pdf.txt;file2.xlsx.txt"]` – MartinW Sep 04 '17 at 09:32
  • Where would I look to extend DSpace, so that I can write all filenames (and formats and other info I can extract from files) to the Solr "search" core? – MartinW Sep 07 '17 at 09:21
  • I just added a new `SolrServiceIndexPlugin` implementation `SolrServiceFilenamesPlugin` and added this to `discovery.xml` as a bean. Worked very well :) Would this be the right way? – MartinW Sep 07 '17 at 11:51
  • Sounds great MartinW -- please consider making a pull request to have this included in the main DSpace codebase https://github.com/DSpace/DSpace – schweerelos Sep 07 '17 at 21:20
  • Thank you, I will do so as soon as I have tested and documented it. Do you know if I can just add fields to the Solr document, or do I have to consider entries in some schema file as well? (It does work by just adding fields to the document, but I want to do it correctly of course.) – MartinW Sep 08 '17 at 09:38
  • MartinW, the Solr schema for search is here: https://github.com/DSpace/DSpace/blob/master/dspace/solr/search/conf/schema.xml It includes some dynamic fields, e.g. all fields of pattern *_s are string fields and don't need to be defined explicitly. Perhaps ask on the DSpace dev mailing list if you're unsure what field name to go with. – schweerelos Sep 10 '17 at 21:18
  • Thanks for all your comments. [I published the plugin as a Gist.](https://gist.github.com/MW3000/5771c0488f8952bb888357ce8c655279) – MartinW Sep 11 '17 at 09:15
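
As a side note on the `stream_name` value quoted in the comments above: if you read that field back from Solr and want the individual original file names, a rough sketch could look like this. It assumes the format is exactly what the comment shows (names joined by `;`, each with a `.txt` suffix from the text extraction) and that file names themselves never contain `;`. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DSpace.
final class StreamNameSplitter {

    // Splits a value such as "file1.pdf.txt;file2.xlsx.txt" into
    // the assumed original names "file1.pdf" and "file2.xlsx".
    static List<String> originalNames(String streamName) {
        List<String> names = new ArrayList<>();
        for (String part : streamName.split(";")) {
            String name = part.trim();
            if (name.isEmpty()) {
                continue; // skip empty segments, e.g. from a trailing ';'
            }
            // Assumption: the extracted-text file is the original name plus ".txt".
            if (name.endsWith(".txt")) {
                name = name.substring(0, name.length() - ".txt".length());
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(originalNames("file1.pdf.txt;file2.xlsx.txt"));
    }
}
```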
1

I just wrote SolrServiceFileInfoPlugin, an implementation of SolrServiceIndexPlugin. This plugin adds all filenames and file descriptions from files in the ORIGINAL bundle to the Solr index and makes them discoverable via search.

See this Gist (Search for filenames and file descriptions in DSpace) for the code and documentation.
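
To give an idea of what the plugin does without repeating the Gist, here is a simplified, self-contained sketch of the core logic only. It is not the plugin code: `BundleInfo` and `FileInfo` are stand-ins for the DSpace bundle and bitstream objects, the map stands in for the Solr document, and the two field names are made up for illustration. In the real plugin this logic lives in the `SolrServiceIndexPlugin` implementation that is registered as a bean in `discovery.xml`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the indexing step; all types here are stand-ins.
final class FileInfoIndexSketch {

    // Stand-in for a file (bitstream) with its name and optional description.
    static final class FileInfo {
        final String name;
        final String description;
        FileInfo(String name, String description) {
            this.name = name;
            this.description = description;
        }
    }

    // Stand-in for a bundle such as ORIGINAL or TEXT.
    static final class BundleInfo {
        final String name;
        final List<FileInfo> files;
        BundleInfo(String name, List<FileInfo> files) {
            this.name = name;
            this.files = files;
        }
    }

    // Collects names and descriptions of files in the ORIGINAL bundle only.
    // The field names are hypothetical.
    static Map<String, List<String>> extraFields(List<BundleInfo> bundles) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (BundleInfo bundle : bundles) {
            if (!"ORIGINAL".equals(bundle.name)) {
                continue; // ignore TEXT, THUMBNAIL and other bundles
            }
            for (FileInfo file : bundle.files) {
                add(fields, "original_bundle_filenames", file.name);
                add(fields, "original_bundle_descriptions", file.description);
            }
        }
        return fields;
    }

    // Adds a value to a multi-valued field, skipping null or blank values.
    private static void add(Map<String, List<String>> fields, String field, String value) {
        if (value == null || value.trim().isEmpty()) {
            return;
        }
        fields.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        List<BundleInfo> bundles = Arrays.asList(
            new BundleInfo("ORIGINAL", Arrays.asList(
                new FileInfo("photo.jpg", "Cover photo"),
                new FileInfo("report.pdf", null))),
            new BundleInfo("TEXT", Arrays.asList(
                new FileInfo("report.pdf.txt", null))));
        System.out.println(extraFields(bundles));
    }
}
```

Because each file name is added as its own value, image files are covered as well, and names are not joined into one string as with `stream_name`.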

Thanks to @schweerelos for her comments.

MartinW