0

I have Solr documents that can have 3 possible states (state_s in {new, updated, lost}). These documents have a field named ip_s. These documents also have a field nlink_i that can be equal to 0.

What I want to know is how many new ip_s values I have. I consider an IP new if it belongs to a document with state_s="new" and does not appear in any document with state_s="updated" or state_s="lost".

Using Solr facet search I found a solution using the following query parameters:

  • q=state_s:"lost"+OR+state_s:"updated"
  • facet=true&facet.field=ip_s&facet.limit=-1

Basically, all IPs in

"facet_fields":{
      "ip_s":[
        "105.25.12.114",1,
        "105.25.15.114",1,
        "114.28.65.76",0,
        ...]

with 0 occurrences (e.g. 114.28.65.76) are "new IPs".
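The client-side counting step described here can be sketched in Python. This is only an illustration, assuming the JSON response has already been parsed and the flat `[value, count, value, count, ...]` list under `facet_fields["ip_s"]` has been extracted; the function name `count_zero_facets` is hypothetical.

```python
def count_zero_facets(flat_facets):
    """Count facet values whose count is 0 in a flat [value, count, ...] list."""
    values = flat_facets[0::2]   # every even position holds a facet value
    counts = flat_facets[1::2]   # every odd position holds its count
    return sum(1 for _value, count in zip(values, counts) if count == 0)


# Sample data copied from the response fragment above (the real list is longer).
sample = ["105.25.12.114", 1, "105.25.15.114", 1, "114.28.65.76", 0]
print(count_zero_facets(sample))
```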

Q1: Is there a better way to do this search? With the facet query described above I still need to read the whole ip_s list and count every IP with 0 occurrences.

Q2: If I want to do the same search (i.e. get the new IPs) but consider only documents where nlink_i>0, how can I do it? If I add a filter fq=nlink_i:[1 TO *], all IPs appearing in documents with nlink_i=0 will also have their number of occurrences set to 0, so I cannot apply the solution described above to get the new IPs.

lizzie

3 Answers

1

Q1: To avoid the 0 count facets, you can use facet.mincount=1.

Q2: I think the solution above should also answer Q2?

Yann
  • I need the 0 ones; in fact they are the only ones I need. If I could set facet.maxcount=0 I would, but it does not exist. – lizzie Feb 12 '15 at 16:42
  • I see. There is a Solr ticket about creating a facet.maxcount param; I don't see how you could get what you want without it via faceting. You could also look into inner queries; this link is relevant to your question: http://stackoverflow.com/questions/24651759/solr-join-not-in-subselect – Yann Feb 13 '15 at 09:26
1

As an alternative to facets, you can use Solr's grouping functionality. The aggregation of values for your Q1 does not get much nicer, but at least Q2 works as well. It would look something like:

select?q=*:*&group=true&group.field=ip_s&group.sort=state_s asc&group.limit=1

For your programmatic aggregation logic to work, you would have to change the state_s value for new entries to something that sorts last in ascending order, so that a new document is the first entry of a group only when the group contains no updated or lost document. Then you would count all groups whose first entry is a new-state document. The same logic still works if you add an fq parameter to address Q2.
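The group-counting step can be sketched in Python as follows. It assumes the parsed JSON response uses Solr's default grouped layout (`grouped` → field name → `groups`, each with a `doclist`), that `state_s` is among the returned fields, and that `rows` is large enough to return every group. The function name and the sample data are hypothetical. Note that the top document of a group being new only implies that the IP has no other state if `group.sort` places any updated or lost document ahead of a new one; check that ordering against your own state values.

```python
def count_groups_led_by(response, field="ip_s", state="new"):
    """Count groups whose first (top-sorted) document has the given state_s."""
    groups = response["grouped"][field]["groups"]
    count = 0
    for group in groups:
        docs = group["doclist"]["docs"]
        # With group.limit=1 each group carries at most one document.
        if docs and docs[0].get("state_s") == state:
            count += 1
    return count


# Hypothetical response fragment, for illustration only.
sample = {
    "grouped": {
        "ip_s": {
            "matches": 3,
            "groups": [
                {"groupValue": "114.28.65.76",
                 "doclist": {"numFound": 1, "start": 0,
                             "docs": [{"ip_s": "114.28.65.76", "state_s": "new"}]}},
                {"groupValue": "105.25.12.114",
                 "doclist": {"numFound": 2, "start": 0,
                             "docs": [{"ip_s": "105.25.12.114", "state_s": "lost"}]}},
            ],
        }
    }
}
print(count_groups_led_by(sample))
```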

Fritz Duchardt
  • That was a nice suggestion, thank you. I haven't managed to get facet counts inside groups. However, grouping is nice, especially if you need to get the ids of the documents! – lizzie Feb 13 '15 at 16:30
  • You don't need to use the facet count - instead you count groups. – Fritz Duchardt Feb 13 '15 at 17:06
0

I found another solution using facet.pivot that works for Q1 and Q2:

http://localhost:8983/solr/collection1/query?q=nbLink_i:[1%20TO%20*]&updated&facet=true&facet.pivot=ip_s,state_s&facet.limit=-1&rows=0
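Counting the new IPs from the pivot response could look like the Python sketch below. It assumes the parsed JSON has the usual `facet_counts` → `facet_pivot` → `"ip_s,state_s"` layout, where each ip_s entry carries a nested `pivot` list of state_s values. The function name and the sample data are hypothetical.

```python
def count_new_only_ips(response, pivot_key="ip_s,state_s", new_state="new"):
    """Count ip_s values whose nested state_s pivot contains only the new state."""
    entries = response["facet_counts"]["facet_pivot"][pivot_key]
    count = 0
    for entry in entries:
        # Collect every state seen for this IP among the matching documents.
        states = {child["value"] for child in entry.get("pivot", [])}
        if states == {new_state}:
            count += 1
    return count


# Hypothetical response fragment, for illustration only.
sample = {
    "facet_counts": {
        "facet_pivot": {
            "ip_s,state_s": [
                {"field": "ip_s", "value": "114.28.65.76", "count": 1,
                 "pivot": [{"field": "state_s", "value": "new", "count": 1}]},
                {"field": "ip_s", "value": "105.25.12.114", "count": 2,
                 "pivot": [{"field": "state_s", "value": "new", "count": 1},
                           {"field": "state_s", "value": "lost", "count": 1}]},
            ]
        }
    }
}
print(count_new_only_ips(sample))
```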
lizzie