1

I'm implementing a grouped search in Solr. I'm looking for a way of summing one field and sort the results by this sum. With the following data example I hope it will be clearer.

{
  [
    {
      "id" : 1,
      "parent_id" : 22,
      "valueToBeSummed": 3
    },
    {
      "id" : 2,
      "parent_id" : 22,
      "valueToBeSummed": 1
    },
    {
      "id" : 3,
      "parent_id" : 33,
      "valueToBeSummed": 1
    },
    {
      "id" : 4,
      "parent_id" : 5,
      "valueToBeSummed": 21
    }
  ]
}

If the search is made over this data I'd like to obtain

{
  [
    {
      "numFound": 1,
      "summedValue" : 21,
      "parent_id" : 5
    },
    {
      "numFound": 2,
      "summedValue" : 4,
      "parent_id" : 22
    },
    {
      "numFound": 1,
      "summedValue" : 1,
      "parent_id" : 33
    }
  ]
}

Do you have any advice on this ?

Emilio Borraz
  • 516
  • 3
  • 17
  • The facet functions are the solution for my issue, with this I'm able to sort my results on a calculated value (sum) by group, facet results have pagination. – Emilio Borraz Jul 12 '15 at 17:24

3 Answers3

3

Solr 5.1+ (and 5.3) introduces Solr Facet functions to solve this exact issue.

From Yonik's introduction of the feature:

$ curl http://localhost:8983/solr/query -d 'q=*:*&
 json.facet={
   categories:{
     type : terms,
     field : cat,
     sort : "x desc",   // can also use sort:{x:desc}
     facet:{
       x : "avg(price)",
       y : "sum(price)"
     }
   }
 }
'

So the suggestion would be to upgrade to the newest version of Solr (the most recent version is currently 5.2.1, be advised that some of the syntax that's on the above link will be landed in 5.3 - the current release target).

MatsLindh
  • 49,529
  • 4
  • 53
  • 84
0

So you want to group your results on the field parent_id and inside each group you want to sum up the fields valueToBeSummed and then you want to sort the entire results (the groups) by this new summedvalue field. That is a very interesting use case...

Unfortunately, I don't think there is a built in way of doing what you have asked.

There are function queries which you can use to sort, there is a group.func parameter also, but they will not do what you have asked.

Have you already indexed this data? Or are you still in the process of charting out how to store this data? If its the latter then one possible way would be to have a summedvalue field for each documents and calculate this as and when a document gets indexed. For example, given the sample documents in your question, the first document will be indexed as

{
  "id" : 1,
  "parent_id" : 22,
  "valueToBeSummed": 3
  "summedvalue": 3
  "timestamp": current-timestamp
},

Before indexing the second document id:2 with parent_id:22 you will run a solr query to get the last indexed document with parent_id:22

Solr Query q=parent_id:22&sort=timestamp desc&rows=1

and add the summedvalue of id:1 with valueToBeSummed of id:2 So the next document will be indexed as

{
  "id" : 2,
  "parent_id" : 22,
  "valueToBeSummed": 1
  "summedvalue": 4
  "timestamp": current-timestamp
}

and so on.

Once you have documents indexed this way, you can run a regular solr query with &group=true&group.field=parent_id&sort=summedValue.

Please do let us know how you decide to implement it. Like I said its a very interesting use case! :)

jay
  • 2,067
  • 2
  • 16
  • 31
  • I think I will have to implement something like you said, documents with the summed value, I will give feedback later this week -- thanks -- – Emilio Borraz Jul 08 '15 at 18:25
-1

You can add the below query

select?q=*:*&stats=true&stats.field={!tag=piv1 sum=true}valueToBeSummed&facet=true&facet.pivot={!stats=piv1 facet.sort=index}parent_id&wt=json&indent=true

You need to use Stats Component for the requirement. You can get more information here. The idea is first define on what you need to have stats on. Here it is valueToBeSummed, and then we need to group on parent_id. We use facet.pivot for this functionality.

Regarding sort, when we do grouping, the default sorting order is based on count in each group. We can define based on the value too. I have done this above using facet.sort=index. So it sorted on parent_id which is the one we used for grouping. But your requirement is to sort on valueToBeSummed which is different from the grouping attribute.

As of now not sure, if we can achieve that. But will look into it and let you know.

In short, you got the grouping, you got the sum above. Just sort is pending

Ramzy
  • 6,948
  • 6
  • 18
  • 30
  • If you do stats on the parent_id field, according to the sample docs posted in question, you will probably get a response like this `{ stats_fields: { parent_id: { min: 5.0, max: 33.0, count: 4, missing: 0, sum: 82.0, ..... } } }` OP wants to group by parent_id, then sort the groups using sum of a field inside each group. – jay Jul 08 '15 at 09:07
  • @Ramzy Does the stats values allow to order the groups ? – Emilio Borraz Jul 08 '15 at 18:27