
I am new to SolrCloud and still learning. I want to get the intersection of multiple search queries; you can think of it as an inner join of multiple result sets. As far as I know, this can be solved by:

1) Multiple join queries among pairs of data sources. We could even use subqueries.
2) The better option I am thinking of is writing a custom request handler. This request handler would run a search query against each data source, find the intersection among the results, format the response and return it as the final result set.

I would like to know the best approach to solve this in Solr. I am also not sure how to make multiple search queries inside a custom request handler. Thank you.
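Option 1 could look roughly like the sketch below, which uses Solr's join query parser with its `fromIndex` local parameter to keep only DS1 documents whose cust_id also exists in each of the other cores. This is a hedged sketch: the core names and the `cust_id` field are assumptions taken from the example given in the comments, it only builds the query string (no HTTP call), and the response would contain DS1's fields only, not the matching rows from DS2 to DS4. Cross-collection joins in SolrCloud also have restrictions (for example on how the `fromIndex` collection is sharded), so check the Join Query Parser documentation for your Solr version.

```python
from urllib.parse import urlencode

# Assumptions: core names DS2..DS4 come from the example in the comments, and
# every core stores the customer key in a field called cust_id.
OTHER_CORES = ["DS2", "DS3", "DS4"]


def join_filters(field="cust_id", cores=OTHER_CORES):
    """Return one {!join} filter query per other core.

    Each filter keeps only documents whose `field` value also appears in
    that core, so applying all of them keeps the intersection of keys.
    """
    return [f"{{!join from={field} to={field} fromIndex={core}}}*:*" for core in cores]


# Query string meant for the DS1 core; only DS1's stored fields come back.
params = [("q", "*:*")] + [("fq", fq) for fq in join_filters()] + [("wt", "json")]
query_string = urlencode(params)
print(query_string)
```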

nil
  • Can you please specify what you mean by 'intersection of multiple queries'? I could easily assume you just need an AND operator in your query. Or do you need to run different queries against different cores? Just give an example of the query you want to run. – Emad Apr 02 '14 at 23:16
  • @Emad e.g., we have 4 different data sources, DS1, DS2, DS3 and DS4: `DS1- {(cust_id1, a1, b1),(cust_id2, a2, b2),(cust_id3, a3, b3)}` `DS2- {(cust_id1, c1, d1),(cust_id3, c2, d2), (cust_id4, c3, d4)}` `DS3- {(cust_id1, e1, f1),(cust_id3, e2, f2)}` `DS4- {(cust_id1, g1, h1),(cust_id2, g2, h2),(cust_id3, g3, h3),(cust_id4, g4, h4)}` Output (the rows for the cust_ids common to all 4 data sources): `{(cust_id1, a1, b1), (cust_id1, c1, d1), (cust_id1, e1, f1), (cust_id1, g1, h1), (cust_id3, a3, b3), (cust_id3, c2, d2), (cust_id3, e2, f2), (cust_id3, g3, h3)}` **In SQL, this is like an INNER JOIN of DS1, DS2, DS3, DS4 on cust_id.** – nil Apr 03 '14 at 03:23
  • @Emad I was thinking of a request handler that runs a Solr search query against each of the 4 data sources, gets 4 result sets, and then finds the cust_ids common to all 4 result sets. It would format the output and return it to the end user. Can we do something like this in a request handler? I couldn't find any example that makes Solr search requests from within a request handler and processes the results. – nil Apr 03 '14 at 03:29
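Whatever runs the four searches (a custom request handler or plain client code), the INNER JOIN described above reduces to intersecting the cust_id sets and then collecting each source's rows for the common ids. A minimal in-memory sketch on the example data, where the hard-coded lists are an assumed stand-in for the four Solr result sets (no Solr call is made):

```python
# The four data sources from the comment above, as (cust_id, value, value) rows.
# Stand-in for four Solr result sets; no actual query is executed here.
SOURCES = {
    "DS1": [("cust_id1", "a1", "b1"), ("cust_id2", "a2", "b2"), ("cust_id3", "a3", "b3")],
    "DS2": [("cust_id1", "c1", "d1"), ("cust_id3", "c2", "d2"), ("cust_id4", "c3", "d4")],
    "DS3": [("cust_id1", "e1", "f1"), ("cust_id3", "e2", "f2")],
    "DS4": [("cust_id1", "g1", "h1"), ("cust_id2", "g2", "h2"),
            ("cust_id3", "g3", "h3"), ("cust_id4", "g4", "h4")],
}


def inner_join_on_key(sources):
    """Return the rows whose key (first element) appears in every source.

    Rows are ordered by key, then by source order, like the expected output.
    """
    key_sets = [{row[0] for row in rows} for rows in sources.values()]
    common = set.intersection(*key_sets)
    result = []
    for key in sorted(common):
        for rows in sources.values():
            result.extend(row for row in rows if row[0] == key)
    return result


for row in inner_join_on_key(SOURCES):
    print(row)
```

On the sample data this yields the eight rows listed in the comment: the four cust_id1 rows followed by the four cust_id3 rows.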

1 Answer


I am posting this as an answer because your case looks very similar to one I dealt with before using Solr Result Grouping / Field Collapsing:

https://wiki.apache.org/solr/FieldCollapsing

What I would do in this case:

  • Use a single core. Multiple cores are not really needed for this.

  • Generate a new unique id for each document from each data source.

  • Use the id you already have as the 'Group Id', so that documents with the same id coming from different data sources fall into the same group. In this case that would be cust_id1, cust_id2, etc.

  • Optionally, add another field that holds the data source name.
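A sketch of what the documents in that single core might look like. The field names are hypothetical and not from the answer: `id` (the core's uniqueKey), `GroupId` (the original cust_id), `source` (the data source name) and `val1_s`/`val2_s` for the remaining values.

```python
def to_solr_docs(source_name, rows, value_fields=("val1_s", "val2_s")):
    """Turn (cust_id, value, value) rows from one data source into Solr docs.

    All field names are hypothetical; adapt them to your schema.
    """
    docs = []
    for cust_id, *values in rows:
        doc = {
            "id": f"{source_name}_{cust_id}",  # unique across all data sources
            "GroupId": cust_id,                # shared key used for grouping
            "source": source_name,             # which data source the row came from
        }
        doc.update(zip(value_fields, values))
        docs.append(doc)
    return docs


ds1_docs = to_solr_docs("DS1", [("cust_id1", "a1", "b1"), ("cust_id2", "a2", "b2")])
ds2_docs = to_solr_docs("DS2", [("cust_id1", "c1", "d1")])
all_docs = ds1_docs + ds2_docs
print(all_docs)
```

A list like `all_docs`, serialized as a JSON array, can typically be posted to the core's `/update` handler.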

In the query I would then use the grouping feature: group=true&group.field=GroupId
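The same request written out as parameters, as a sketch: `GroupId` is the field from the answer, while `group.limit=4` (one slot per data source), `rows=100` and `q=*:*` are illustrative choices, not part of the answer.

```python
from urllib.parse import urlencode


def grouping_query(group_field="GroupId", per_group=4, max_groups=100, query="*:*"):
    """Build a result-grouping query string.

    group.limit: documents returned inside each group (Solr's default is 1).
    rows: with grouping enabled, this is the number of groups returned.
    """
    return urlencode({
        "q": query,
        "group": "true",
        "group.field": group_field,
        "group.limit": per_group,
        "rows": max_groups,
        "wt": "json",
    })


print(grouping_query())
```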

This groups the results by 'Group Id' and, by default, returns one document per group; you can change that number with group.limit.
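Grouping by itself returns every group, including cust_ids that appear in only some of the data sources, so matching the INNER JOIN from the question still needs a filtering step. One way, sketched here under assumptions, is to do it on the client: keep only the groups whose documents cover every data source. The response below is hand-built in the shape of Solr's JSON grouped output (`grouped`, then the field, then `groups`, each with `groupValue` and `doclist`); it is not real Solr output, and `source` is the optional data-source field suggested above.

```python
REQUIRED_SOURCES = {"DS1", "DS2", "DS3", "DS4"}


def make_group(value, sources):
    """Build one mock group in the shape of Solr's grouped JSON response."""
    docs = [{"id": f"{s}_{value}", "GroupId": value, "source": s} for s in sources]
    return {"groupValue": value, "doclist": {"numFound": len(docs), "start": 0, "docs": docs}}


# Hand-built stand-in for a grouped response over the example data (not real output).
SAMPLE_RESPONSE = {"grouped": {"GroupId": {"matches": 12, "groups": [
    make_group("cust_id1", ["DS1", "DS2", "DS3", "DS4"]),
    make_group("cust_id2", ["DS1", "DS4"]),
    make_group("cust_id3", ["DS1", "DS2", "DS3", "DS4"]),
    make_group("cust_id4", ["DS2", "DS4"]),
]}}}


def complete_groups(response, field="GroupId", required=REQUIRED_SOURCES):
    """Keep only groups that contain a document from every required source.

    Assumes group.limit >= len(required), so each doclist holds all its docs.
    """
    kept = {}
    for group in response["grouped"][field]["groups"]:
        docs = group["doclist"]["docs"]
        if required <= {doc["source"] for doc in docs}:
            kept[group["groupValue"]] = docs
    return kept


print(sorted(complete_groups(SAMPLE_RESPONSE)))
```

This relies on group.limit being at least the number of data sources, so each group's `docs` list is complete. Note also that `rows` and paging apply to groups before this client-side filter runs.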

Emad