Suppose the entities I need to run text search over look like this:

Sample{
    int ID, //Unique ID
    string Name,//Searchable field
    string Description //Searchable field
}

I have several such entities, which are shared by all users, but each user can associate different tags, notes, etc. with any of them. For simplicity, let's say a user can add tags to a Sample entity.

UserSampleData{
    int ID, //Sample ID
    int UserID, //For condition
    string tags //Searchable field
}

When a user performs a search, I want to look for the given string in the fields Name and Description, and in the tags the current user has associated with that Sample. I am pretty new to Lucene indexing and cannot figure out how to design an index, or the queries, for such a situation. I need the results sorted by relevance to the search query. The following approaches crossed my mind, but I have a feeling there could be better solutions:

  1. Query the two entities, Samples and UserSampleData, separately and somehow merge the two result sets. For results that appear in both, we would need to combine the match scores, maybe by averaging.
  2. Flatten the data by combining both entities, which yields multiple entries for the same ID.
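
To make the first approach concrete, here is a minimal sketch of the merging step in plain Java, with no Lucene calls. The class name `ScoreMerger`, the method `merge`, and the two score maps (Sample ID to score) are hypothetical names used only for illustration. It assumes the two queries produce scores on a comparable scale, which raw Lucene scores from separate queries do not guarantee, so some normalization would probably be needed first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of Lucene, only illustrates approach 1.
final class ScoreMerger {

    // Merges two result sets keyed by Sample ID. An ID present in both sets
    // gets the average of its two scores; an ID present in only one set
    // keeps the score it already has.
    static List<Map.Entry<Integer, Double>> merge(Map<Integer, Double> sampleScores,
                                                  Map<Integer, Double> userDataScores) {
        Map<Integer, Double> combined = new HashMap<>(sampleScores);
        for (Map.Entry<Integer, Double> e : userDataScores.entrySet()) {
            combined.merge(e.getKey(), e.getValue(), (a, b) -> (a + b) / 2.0);
        }

        List<Map.Entry<Integer, Double>> ranked = new ArrayList<>(combined.entrySet());
        // Highest combined score first; ties are broken by the smaller ID.
        ranked.sort((x, y) -> {
            int byScore = Double.compare(y.getValue(), x.getValue());
            return byScore != 0 ? byScore : Integer.compare(x.getKey(), y.getKey());
        });
        return ranked;
    }

    public static void main(String[] args) {
        Map<Integer, Double> sampleScores = new HashMap<>();   // hits on Name/Description
        sampleScores.put(1, 0.9);
        sampleScores.put(2, 0.5);

        Map<Integer, Double> userDataScores = new HashMap<>(); // hits on the user's tags
        userDataScores.put(2, 1.0);
        userDataScores.put(3, 0.25);

        for (Map.Entry<Integer, Double> hit : merge(sampleScores, userDataScores)) {
            System.out.println("Sample " + hit.getKey() + " score " + hit.getValue());
        }
    }
}
```

Averaging only the intersecting results means a Sample matched by both queries can rank below one matched by a single query, which may or may not be the desired behaviour; summing the scores instead would favour Samples that match in both places.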
labyrinth

2 Answers

You could use Lucene's JoinUtil class, but you must rename the "ID" field of the UserSampleData documents to SAMPLE_ID (or any other name different from "ID"). Here is an example:

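  import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
  import org.apache.lucene.index.*;
  import org.apache.lucene.queryparser.classic.QueryParser;
  import org.apache.lucene.search.*;
  import org.apache.lucene.search.join.JoinUtil;
  import org.apache.lucene.search.join.ScoreMode;
  import org.apache.lucene.util.Version;

  final IndexReader r = DirectoryReader.open(dir); // dir is the Directory holding your index
  final Version version = Version.LUCENE_47; // your Lucene version
  final IndexSearcher searcher = new IndexSearcher(r);

  // fromQuery matches the UserSampleData documents (the "from" side);
  // their SAMPLE_ID values are then joined to the ID field of the Sample documents.
  final String fromField = "SAMPLE_ID";
  final boolean multipleValuesPerDocument = false;
  final String toField = "ID";
  String querystr = "UserID:xxxx AND yourQueryString"; // the UserID condition and your query string

  // "tags" is the default field for unqualified terms in querystr
  Query fromQuery = new QueryParser(version, "tags", new WhitespaceAnalyzer(version)).parse(querystr);
  final Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, searcher, ScoreMode.None);

  final TopDocs topDocs = searcher.search(joinQuery, 10);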

Also check the bug report https://issues.apache.org/jira/browse/LUCENE-4824. I don't know whether it has been fixed in the current version of Lucene; if not, I think you must convert the type of your ID fields to String.

Simona R.
  • Thanks for the answer! This looks straightforward, but I hope it is supported in Lucene.NET, since that is what I am using. I completely forgot to mention this in the question; do you have any idea whether it is supported in Lucene.NET? – labyrinth Sep 09 '15 at 05:46
  • I have posted a separate question for that here: http://stackoverflow.com/questions/32472071/not-able-to-find-joinutil-in-lucene-net. Thanks again! – labyrinth Sep 09 '15 at 06:12
  • @labyrinth I have no idea how to use JoinUtil in Lucene.NET, because I only use Lucene for Java. But if I find something I'll reply to your new question :) – Simona R. Sep 09 '15 at 17:59
  • Thanks! Since I am still in an exploratory phase, I have started to check out Elasticsearch too. It has some ways to handle this, but I need to figure out how well it would perform. – labyrinth Sep 09 '15 at 18:29

I think what you need is relational data, and handling relational data is not simple with Lucene. This is a useful blog post on the subject.

mhbashari