
I have a lot of PDF files (containing text), and I want to build a simple search engine that finds the sentences containing given keywords. After several hours of searching, I chose Solr as the tool.

I am new to Solr. I downloaded the latest version, Solr 6.5.0, and set it up on Windows 7. I created a collection called gettingstarted and can run searches by visiting http://localhost:8983/solr/gettingstarted/browse. These are the commands I used:

bin\solr.cmd start
bin\solr.cmd create -c gettingstarted
java -Dauto -Dc=gettingstarted -Drecursive -jar example/exampledocs/post.jar  *.pdf

However, the results show only the names of the files that contain the keyword, not the matching lines in those files. (Screenshot: only the filename is shown, not the sentences that contain the keywords.)

I also tried the bundled example called techproducts and, to my surprise, it shows the exact sentences that contain the keywords. (Screenshot: the matching sentences are shown.)

So my question is whether I can do something to make the first case show the sentences that contain the exact keywords. I don't know anything about Velocity, the config files, or even the underlying principles. I just want it to work and give detailed search results. I do not care about security issues or about how the results look (ugliness is OK).

This is my first day playing with Solr, so I may have made some mistakes in the description. Thanks for your patience; I would appreciate your help.

peng li

1 Answer


http://localhost:8983/solr/gettingstarted/browse is an example UI application (Solritas) that ships with Solr by default.

You should use the /select request handler, which processes your query and returns the results: http://localhost:8983/solr/gettingstarted/select?q=keyword
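As a rough sketch of calling /select from a script instead of the browser, the Python below builds the same kind of URL and fetches it. The helper names build_select_url and fetch_results are made up for illustration, the host, port, and collection are the ones used in this thread, and wt=json is added so that the response can be parsed as JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Solr is running locally on the default port, as in the question.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(collection, keyword, base=SOLR_BASE):
    """Return a /select URL that searches `collection` for `keyword`."""
    # urlencode escapes spaces and special characters in the keyword.
    params = urlencode({"q": keyword, "wt": "json"})
    return f"{base}/{collection}/select?{params}"


def fetch_results(collection, keyword):
    """Send the query to Solr and return the parsed JSON response as a dict."""
    with urlopen(build_select_url(collection, keyword)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With Solr running, fetch_results("gettingstarted", "keyword") should return a dictionary whose "response" entry holds the matching documents.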

Indexing PDFs:

When you index a PDF, all the content inside it goes into a field called content by default.

Example:

Assuming you have already created the gettingstarted collection, navigate to the directory example/exampledocs/ and run this command:

java -Dauto -Dc=gettingstarted -jar post.jar solr-word.pdf

If it indexes successfully, go to the admin UI and search for a keyword that appears in the PDF. The result should include the content field, whose value is the text inside the PDF.

Example query request URL:

http://localhost:8983/solr/gettingstarted/select?q=solr&wt=json&indent=on
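Since the goal in the question is to see the sentences that contain the keyword, one possible approach is to pull them out of the JSON response on the client side. The sketch below is only an illustration and is not a Solr feature: sentences_with_keyword is a hypothetical helper, the sentence splitting is a naive regular expression, and it assumes the extracted text comes back in a field named content, as described above. Whether that field is actually returned depends on the collection's configuration, so pass a different field name if yours differs.

```python
import re


def sentences_with_keyword(response, keyword, field="content"):
    """Return (document id, sentence) pairs for sentences containing `keyword`.

    `response` is the parsed JSON dict returned by /select with wt=json.
    `field` is an assumption: the field that holds the extracted PDF text.
    """
    needle = keyword.lower()
    hits = []
    for doc in response.get("response", {}).get("docs", []):
        value = doc.get(field, "")
        # The field may be multi-valued, in which case Solr returns a list.
        text = " ".join(value) if isinstance(value, list) else str(value)
        # Naive split: break after '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            sentence = sentence.strip()
            if needle in sentence.lower():
                hits.append((doc.get("id"), sentence))
    return hits
```

The function takes the already-parsed response, so it can be tried on a saved copy of the JSON output from the URL above without a running server.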

Vinod