Introduction to the problem

I use Apache Lucene for Java and I'd like to know how to drill down automatically in a faceted search. More precisely, given a level of the taxonomy, I want to get the facets at that level. For instance, if I use the Open Directory Project as a taxonomy and search for "theatre" at level 2, I want to drill down in the taxonomy by following the path with the most weight, in this case Arts->performing_arts. This way I would get a faceted search over the categories inside performing_arts.

Problem

I know how to run a faceted search. In the example above I would do:

            // 2. Query expansion
            IndexSearcher wnSearcher = new IndexSearcher(wnReader);
            //Query q = SynLookup.expand(querystr, wnSearcher, analyzer, "Contents", (float) 0.9);

            // 3. Query
            // The "Contents" argument specifies the default field to use
            // when no field is explicitly specified in the query.
            Query q = new QueryParser(Version.LUCENE_36, "Contents", analyzer).parse(querystr);

            // 4. Search
            Query matchAllDocs = new MatchAllDocsQuery();
            // Create the facets collector
            FacetIndexingParams indexingParams = new DefaultFacetIndexingParams();
            FacetSearchParams facetSearchParams = new FacetSearchParams(indexingParams);
            CategoryPath top = new CategoryPath("Top/Arts/performing_arts", '/');
            FacetRequest neighborhoodFacetRequest = new CountFacetRequest(top, 13);
            facetSearchParams.addFacetRequest(neighborhoodFacetRequest);
            FacetsCollector fc = new FacetsCollector(facetSearchParams, reader, taxonomyReader);
            IndexSearcher searcher = new IndexSearcher(reader);

            searcher.search(q, new QueryWrapperFilter(matchAllDocs), fc);

            // 5. Display results
            System.out.println("Results: ");
            List<FacetResult> res = fc.getFacetResults();
            printFacetResult(res);

However, I have to know the path in advance to create the CategoryPath, and I don't know how to get the whole result set and then go to the level I want. If I set the CategoryPath to Top, I only get the results for the first level.

One solution would be to first get the results for the first level, add the category with the maximum weight to the path, then perform a new faceted search, and so on. But that is very inefficient!

Thank you!

synack

1 Answer

Actually, you don't get just the first level. Lucene returns all levels, but you need to read them from the FacetsCollector results using the getSubResults method. It is possible to get all levels in the category path this way. Using MatchAllDocsQuery is not a good idea unless you want to provide a drill-down over the entire collection. It may be more appropriate to use a multi-collector and provide some type of Query or Filter to limit your results.

With the code snippet below you could loop over all results and all sub-results to find the category path you are looking for, then use a DrillDown query on top of your original query.

For example:

    for (FacetResult res : fc.getFacetResults()) {
        // this is the top-level facet
        FacetResultNode toplvl = res.getFacetResultNode();
        System.out.println(toplvl.getLabel() + " (" + toplvl.getValue() + ")");
        for (FacetResultNode secondlvl : toplvl.getSubResults()) {
            // second-level facet categories
            System.out.println("  " + secondlvl.getLabel().getComponent(1)
                    + " (" + secondlvl.getValue() + ")");
        }
    }
    // your original query 'q' plus the category path 'cat' you picked
    Query q2 = DrillDown.query(indexingParams, q, cat);
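
To illustrate picking the heaviest category at each level from a single result tree, here is a minimal plain-Java sketch. It does not use Lucene: `Node` is a hypothetical stand-in for `FacetResultNode` (its `children` list plays the role of `getSubResults()` and `value` the role of `getValue()`), `heaviestPath` is an illustrative helper, and the counts in `main` are made up. It assumes the facet results already contain the sub-levels you want to walk.

```java
import java.util.ArrayList;
import java.util.List;

class HeaviestPath {

    /** Hypothetical stand-in for a facet result node: a label, a count, and child nodes. */
    static final class Node {
        final String label;
        final double value;
        final List<Node> children = new ArrayList<>();

        Node(String label, double value) {
            this.label = label;
            this.value = value;
        }

        /** Adds a child and returns this node so that trees can be built fluently. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /**
     * Starting at root, descends into the child with the largest value,
     * at most maxDepth times, and returns the labels visited (root included).
     */
    static List<String> heaviestPath(Node root, int maxDepth) {
        List<String> path = new ArrayList<>();
        Node current = root;
        path.add(current.label);
        for (int level = 0; level < maxDepth && !current.children.isEmpty(); level++) {
            Node best = current.children.get(0);
            for (Node child : current.children) {
                if (child.value > best.value) {
                    best = child;
                }
            }
            path.add(best.label);
            current = best;
        }
        return path;
    }

    public static void main(String[] args) {
        // Made-up counts, only to show the walk.
        Node top = new Node("Top", 120)
            .add(new Node("Arts", 80)
                .add(new Node("performing_arts", 55))
                .add(new Node("literature", 25)))
            .add(new Node("Business", 40));
        // prints "Top/Arts/performing_arts"
        System.out.println(String.join("/", heaviestPath(top, 2)));
    }
}
```

With real facet results, the labels collected this way would be turned into the `CategoryPath` passed as `cat` to `DrillDown.query`, so only one faceted search is needed to choose the path.
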
garyrgilbert
  • Thanks for your answer. Then, if Lucene returns all levels, what is the purpose of the facet collector? I thought that you had to indicate in the facet collector the categories that you want to drill down into. So, I don't need it? – synack Feb 04 '13 at 11:40
  • The facet collector collects the facets in the taxonomy index. You do of course need to add the root CategoryPath to the facet request and indicate the maximum number of top facets to return. Also, by setting the result mode with `facetRequest.setResultMode(ResultMode.PER_NODE_IN_TREE)`, it will return the top facets of each child facet for the entire category tree. – garyrgilbert Feb 05 '13 at 09:16
  • @garyrgilbert thanks. I still have one last doubt: what do I have to do with `q2` in order to get the count of hits in the category `cat`? Btw, I passed as `indexingParams` the same parameters that I had used when adding a facetRequest to the facetCollector, is that right? As `cat` I passed the subcategory for which I want to know the number of hits. – synack Feb 12 '13 at 11:14
  • I asked all this in another question: http://stackoverflow.com/questions/14852995/tree-search-with-lucene – synack Feb 13 '13 at 12:06