I'm new to RavenDB, and I have a RavenDB document:

Student
{
  Id : int
  Subjects : List<int>
}

I'm trying to write a query to get the intersection of the subjects of the student with ID 1 and the student with ID 2:

{
  ID : 1
  Subjects : {22, 23, 25}
}

{
  ID : 2
  Subjects : {22, 25 }
}

The intersection of these would be {22, 25}. I also need the count of the intersecting subjects, which is 2 in this case.

What's the best way to approach this type of query? Are there any other NoSQL solutions that handle this kind of query better? Also, I'm trying to cache the student collection in memory.

I need a DB that supports sharding, and I have a dataset of 15 million documents (I can shard them across different machines with a DB solution like Raven or Mongo). I have to do this at the DB level, and I couldn't find anything in the RavenDB documentation on how to do that.

Matt Johnson-Pint
  • What have you tried? Post some code please. Also - why does it matter that it be done in the database? You can simply retrieve the two students and do the intersection yourself. You haven't explained why you need a db solution. Are you aggregating on these results in some way? How? – Matt Johnson-Pint Mar 19 '13 at 21:32
  • @MattJohnson I need the DB solution for sharding, and I have a dataset of 15 million documents (I can shard them across different machines with a DB solution like Raven or Mongo). So I have to do this at the DB level. I couldn't find anything in the RavenDB documentation on how to do this at the DB level. – Abhishek Andhavarapu Mar 19 '13 at 22:06
  • Sure, you might need to store 15 million documents. Do you also want to query the intersections of 15 million documents? Your example says you are retrieving 2. If this is typical, you should just load those two docs and do the intersection yourself in client-side LINQ (for example). – Matt Johnson-Pint Mar 19 '13 at 22:41
  • @MattJohnson Suppose I have the 15 million students spread across 3 machines, and I need all the students who took subject 1 and subject 2. I was assuming that if it's done at the DB level, the query is passed to all 3 machines, executed on the part of the data each machine holds, and the results are reduced back to the client. – Abhishek Andhavarapu Mar 19 '13 at 22:45
  • @MattJohnson I was assuming this kind of query can be executed using map/reduce, where the result is keyed by subject and the value is a list of student IDs. – Abhishek Andhavarapu Mar 19 '13 at 22:57
  • All of what you are describing is possible, but you keep changing what you want, so I don't know how to answer. Do you need all students who took both subjects 22 and 25? Or do you need to know which subjects students 1 and 2 have in common? Those are two very different things. – Matt Johnson-Pint Mar 19 '13 at 23:37
  • @MattJohnson Sorry, I need all the students who took both subjects 22 and 25. – Abhishek Andhavarapu Mar 19 '13 at 23:41

1 Answer

Based on your comments (which ask something different from your original question), you can perform the following query:

var q = session.Query<Student>()
               .Where(x => x.Subjects.Any(y => y == 22))
               .Intersect()
               .Where(x => x.Subjects.Any(y => y == 25));

The equivalent Lucene query would be:

Subjects:22 INTERSECT Subjects:25

Given this data:

Student { Id = 1, Subjects = new List<int> { 22, 23, 25 } }
Student { Id = 2, Subjects = new List<int> { 22, 25 } }
Student { Id = 3, Subjects = new List<int> { 23, 25 } }
Student { Id = 4, Subjects = new List<int> { 22 } }

Only students 1 and 2 will be returned, because 3 and 4 do not have both values.
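
To sanity-check that result without a server, here is a minimal in-memory sketch of the same filter using plain LINQ to Objects. It is not RavenDB code: the `StudentsWithAll` helper and the `Program` class are hypothetical names made up for illustration, and it only mirrors the "has both subjects" semantics on the sample data above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Student
{
    public int Id { get; set; }
    public List<int> Subjects { get; set; }
}

static class Program
{
    // Hypothetical helper: returns the IDs of students whose Subjects
    // contain every one of the required subject IDs.
    public static List<int> StudentsWithAll(IEnumerable<Student> students, params int[] required)
    {
        return students
            .Where(s => required.All(r => s.Subjects.Contains(r)))
            .Select(s => s.Id)
            .ToList();
    }

    static void Main()
    {
        // Same sample data as in the answer.
        var students = new List<Student>
        {
            new Student { Id = 1, Subjects = new List<int> { 22, 23, 25 } },
            new Student { Id = 2, Subjects = new List<int> { 22, 25 } },
            new Student { Id = 3, Subjects = new List<int> { 23, 25 } },
            new Student { Id = 4, Subjects = new List<int> { 22 } },
        };

        var ids = StudentsWithAll(students, 22, 25);
        Console.WriteLine(string.Join(", ", ids)); // prints "1, 2"
    }
}
```

This runs entirely on the client, so it is only useful for checking expectations on small samples; for 15 million documents the filtering belongs in the server-side query.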

You can read more about Intersection Queries in the documentation.
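
For the question as originally posted (which subjects two specific students have in common, plus the count), the comments suggest loading the two documents and intersecting them on the client. A minimal sketch of that step, assuming the two subject lists are already in memory; `SubjectOverlap.Common` is a hypothetical helper name, and how the documents get loaded is left out:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SubjectOverlap
{
    // Hypothetical helper: subjects present in both lists, in the order
    // they appear in the first list (Enumerable.Intersect also de-duplicates).
    public static List<int> Common(IEnumerable<int> first, IEnumerable<int> second)
    {
        return first.Intersect(second).ToList();
    }

    static void Main()
    {
        // Subject lists of students 1 and 2 from the question.
        var student1 = new List<int> { 22, 23, 25 };
        var student2 = new List<int> { 22, 25 };

        var common = Common(student1, student2);
        Console.WriteLine(string.Join(", ", common)); // prints "22, 25"
        Console.WriteLine(common.Count);              // prints "2"
    }
}
```

For two documents this is cheap, since only two small lists cross the wire.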

Matt Johnson-Pint
  • Matt, thanks for the reply. Again, if the 15 million respondents are spread across 3 machines, is the query executed on the 3 machines in parallel? Or is the data aggregated in one place and then the query executed? – Abhishek Andhavarapu Mar 20 '13 at 03:45
  • They run in parallel and the results are merged using your shard merge strategy. See http://ravendb.net/docs/server/scaling-out/sharding – Matt Johnson-Pint Mar 20 '13 at 04:13