0

I am an intern building a search engine for my company. It should fetch data from several APIs as well as from a web crawler, and then index the returned data. I am thinking of using Solr to index that data.

First, I would like your advice on whether this is a good idea. I also want to know whether I will run into issues indexing JSON and Atom, since I do not know the tag names in advance.

Thank you

Dan Lowe
Omar Jaafor

2 Answers

1

Go ahead; you are heading in the right direction. As for the second part of your question: yes, you will run into problems while indexing, such as schema issues and indexing nested JSON. These issues can be resolved using plugins or Data Import Handlers (DIH).
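One way to sidestep the schema issue when field names are not known in advance is to flatten each nested JSON object and give every key a type suffix that a dynamic field rule can match. The following is only a minimal sketch: `solr_suffix` and `flatten` are hypothetical helper names, and the suffixes (`_s`, `_l`, `_d`, `_b`, `_ss`) are assumed to correspond to dynamic field patterns in your schema, so check them against your own configuration.

```python
import json


def solr_suffix(value):
    """Pick a dynamic-field suffix for a scalar (assumed schema convention)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "_b"
    if isinstance(value, int):
        return "_l"
    if isinstance(value, float):
        return "_d"
    return "_s"


def flatten(obj, prefix=""):
    """Flatten a nested dict into one level, joining keys with underscores."""
    doc = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key as a prefix.
            doc.update(flatten(value, name))
        elif isinstance(value, list):
            # Lists become multi-valued string fields; non-strings are serialized.
            doc[name + "_ss"] = [
                v if isinstance(v, str) else json.dumps(v) for v in value
            ]
        elif value is not None:
            doc[name + solr_suffix(value)] = value
    return doc
```

For example, `flatten({"author": {"name": "x"}})` produces a single field named `author_name_s`, so no field has to be declared by name beforehand.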

0

First of all, you can index Atom and JSON data using Solr. There are two ways to do that:

1) Parse the data and map each field of the parsed data to a field in Solr.

2) Do not parse the data; instead, give the whole files to Apache Tika, which does the extraction for you. One way to do that is to save the data in a file and index the file using update/extract.
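The first option can be sketched for Atom with Python's standard library. This is an illustration under assumptions, not a definitive implementation: `atom_to_docs` is a hypothetical helper, the `_ss` suffix is assumed to match a multi-valued string dynamic field in your schema, and the entry's `id` is assumed to serve as the Solr unique key.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def atom_to_docs(xml_text):
    """Turn each Atom <entry> into a flat dict of Solr-style fields."""
    root = ET.fromstring(xml_text)
    docs = []
    for entry in root.iter(ATOM_NS + "entry"):
        doc = {}
        for child in entry:
            tag = child.tag.split("}", 1)[-1]  # drop the XML namespace
            # Collect all nested text (e.g. <author><name>) and tidy whitespace.
            text = " ".join("".join(child.itertext()).split())
            if tag == "link":
                # Atom links carry their value in the href attribute.
                text = child.get("href", "")
            if not text:
                continue
            if tag == "id":
                doc["id"] = text  # assumed to be the unique key
            else:
                # Tag names are not known in advance, so every other element
                # goes into a multi-valued field named after its tag.
                doc.setdefault(tag + "_ss", []).append(text)
        docs.append(doc)
    return docs
```

The resulting list of dicts can then be serialized with `json.dumps` and posted to Solr's JSON update handler, which avoids a Tika round trip when the feed structure is simple.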

Omar Jaafor