I am trying to solve the following search problem. Say we have 10 different documents, d1..d10. Each document contains one type of data: d1 -> list of movie names, d2 -> list of actor names, d3 -> list of addresses, etc. Each document contains a list of entities and their scores, so d1 contains movie names and their popularity, and so on. Assume the scores are all normalized (0 to max_score across the documents).
Now, given a search query (a phrase), I want to score the 10 documents based on how relevant each one is to the search phrase.
My question is whether using Lucene is a good way to approach this. I plan to index each phrase, together with its score, as a separate document inside Lucene and then query for the top match.
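To make the plan concrete, here is a toy, in-memory version of "one index entry per phrase, then take the top match". It is not Lucene: the `tokenize` analyzer, the `PhraseIndex` class, and the scoring rule (fraction of query terms found in the phrase, multiplied by the phrase's normalized popularity) are all hypothetical simplifications. Lucene's real relevance scoring (BM25 by default in recent versions) is different, and the popularity score would have to be folded in separately.

```python
from collections import defaultdict


def tokenize(text):
    # Hypothetical, very simple analyzer: lowercase and split on whitespace.
    return text.lower().split()


class PhraseIndex:
    """Toy inverted index with one entry per entity phrase (hypothetical)."""

    def __init__(self):
        self.entries = []                 # list of (entity_type, phrase, score)
        self.postings = defaultdict(set)  # term -> ids of entries containing it

    def add(self, entity_type, phrase, score):
        entry_id = len(self.entries)
        self.entries.append((entity_type, phrase, score))
        for term in set(tokenize(phrase)):
            self.postings[term].add(entry_id)

    def top_match(self, query):
        terms = tokenize(query)
        if not terms:
            return None
        # Only entries sharing at least one query term are considered.
        candidates = set()
        for term in terms:
            candidates |= self.postings.get(term, set())
        best, best_score = None, 0.0
        for entry_id in candidates:
            entity_type, phrase, score = self.entries[entry_id]
            phrase_terms = set(tokenize(phrase))
            # Assumed rule: query-term overlap weighted by normalized popularity.
            overlap = sum(1 for t in terms if t in phrase_terms) / len(terms)
            combined = overlap * score
            if combined > best_score:
                best, best_score = (entity_type, phrase, combined), combined
        return best


index = PhraseIndex()
index.add("movie", "The Lord of the Rings", 0.9)
index.add("actor", "Elijah Wood", 0.7)
index.add("address", "1 Rings Road", 0.2)
print(index.top_match("lord of the rings"))
```

The type of the best-scoring entry is then the guess for the query's entity type; the address only shares the word "rings" and has a low score, so it loses to the movie.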
I don't want to search for the individual entities; I am okay with getting the overall score of an entity type for a given search phrase. For example, if someone searches for "lord of the rings", I need to be able to say that it is most likely a movie and not an actor or an address. My goal is to minimize space consumption and optimize performance.
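Since only a per-type answer is needed, the per-entity matches can be collapsed into one score per entity type. The sketch below assumes a type's score is the best (query-term overlap x normalized popularity) among its entities; taking the maximum is just one possible aggregation (a sum or top-k average would also be reasonable), and `score_types` and `tokenize` are hypothetical helpers, not part of any library.

```python
from collections import defaultdict


def tokenize(text):
    # Hypothetical analyzer: lowercase and split on whitespace.
    return text.lower().split()


def score_types(query, entities_by_type):
    """Rank entity types for a query (hypothetical helper).

    entities_by_type maps a type name (e.g. "movie") to a list of
    (phrase, score) pairs whose scores are already normalized.
    A type's score is the maximum overlap * popularity over its entities.
    """
    terms = tokenize(query)
    if not terms:
        return []
    type_scores = defaultdict(float)
    for entity_type, entities in entities_by_type.items():
        for phrase, score in entities:
            phrase_terms = set(tokenize(phrase))
            overlap = sum(1 for t in terms if t in phrase_terms) / len(terms)
            type_scores[entity_type] = max(type_scores[entity_type], overlap * score)
    # Highest-scoring type first.
    return sorted(type_scores.items(), key=lambda kv: kv[1], reverse=True)


data = {
    "movie": [("The Lord of the Rings", 0.9), ("Rings", 0.4)],
    "actor": [("Elijah Wood", 0.7)],
    "address": [("1 Rings Road", 0.2)],
}
print(score_types("lord of the rings", data))
```

With this layout the query returns only as many results as there are entity types, which keeps the output small regardless of how many entities each document holds.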