I am struggling to create a simple index with RavenDB.

Given are two document collections:

User (34000 docs) and BlogEntries (1.5 million docs)

1) How can I create an index that shows the count of blog entries for each user?

The collections are related as follows:

User.LastName + "," + User.FirstName = Blog.CreatedBy

It is important to note that BlogEntries contains old entries that are not related to the user collection. I want to filter those entries out so that they do not appear in the index. That's why I need the user collection here.

Sample Data:

User Collection:
User U1
User U2

BlogEntry Collection:
BlogEntry B1 -> U1
BlogEntry B2 -> U1
BlogEntry B3 -> U2
BlogEntry B4 -> XYZ1 
BlogEntry B5 -> XYZ2
BlogEntry B6 -> U1

I want to filter out the B4 and B5 entries, because they are not related to a user in the user collection.

2) Do I have to use a multimap index for that?

3) I already tried the following via the management studio, but the index does not work. It seems I cannot use two document collections in a single map block.

Map:

from user in docs.Users
from blog in docs.Blogs
where blog.CreatedBy = user.LastName + "," + user.FirstName
select new { UserName = user.LastName ..., Count = 1 }

Reduce:

from result in results group by result.UserName
into g
select new { User = g.Key, g.Sum( x => x.Count) }

Thanks, Marius

1 Answer

With the changed requirement I guess you need a multi map index:

AddMap<User>(users => from user in users
                      select new
                      {
                          UserName = user.LastName + "," + user.FirstName,
                          HasUser = true,
                          Count = 0
                      });

AddMap<BlogEntry>(blogEntries => from blogEntry in blogEntries
                                 select new
                                 {
                                     UserName = blogEntry.CreatedBy,
                                     HasUser = false,
                                     Count = 1
                                 });

Reduce = results => from result in results
                    group result by result.UserName 
                    into g
                    select new
                    {
                        UserName = g.Key,
                        HasUser = g.Any(x => x.HasUser),
                        Count = g.Sum(x => x.Count)
                    };

You can filter the index by the HasUser property.
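
To make the result concrete, here is a minimal in-memory sketch of what the two maps and the reduce produce for the sample data in the question. It uses plain LINQ to Objects rather than the RavenDB client, so it only illustrates the grouping logic, not how RavenDB executes an index. The `User`, `BlogEntry`, and `Entry` records, the `BlogCountSketch` class, and the names "Doe,Ann" and "Roe,Bob" (standing in for U1 and U2) are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document shapes; only the fields used by the index are modelled.
public record User(string FirstName, string LastName);
public record BlogEntry(string Id, string CreatedBy);

// Shape shared by both maps and the reduce, as in the answer above.
public record Entry(string UserName, bool HasUser, int Count);

public static class BlogCountSketch
{
    // Stand-ins for U1 and U2 from the question's sample data.
    public static List<User> SampleUsers() => new()
    {
        new User("Ann", "Doe"),
        new User("Bob", "Roe"),
    };

    // B4 and B5 reference creators that have no User document.
    public static List<BlogEntry> SampleBlogEntries() => new()
    {
        new BlogEntry("B1", "Doe,Ann"),
        new BlogEntry("B2", "Doe,Ann"),
        new BlogEntry("B3", "Roe,Bob"),
        new BlogEntry("B4", "XYZ1"),
        new BlogEntry("B5", "XYZ2"),
        new BlogEntry("B6", "Doe,Ann"),
    };

    // Applies both maps, then the reduce, entirely in memory.
    public static List<Entry> BuildIndex(IEnumerable<User> users, IEnumerable<BlogEntry> blogEntries)
    {
        // Map 1: each user contributes a marker entry with Count = 0.
        var fromUsers = users.Select(u => new Entry(u.LastName + "," + u.FirstName, true, 0));

        // Map 2: each blog entry contributes Count = 1 under its creator's name.
        var fromBlogs = blogEntries.Select(b => new Entry(b.CreatedBy, false, 1));

        // Reduce: group by UserName, OR the HasUser flags, sum the counts.
        return fromUsers.Concat(fromBlogs)
            .GroupBy(e => e.UserName)
            .Select(g => new Entry(g.Key, g.Any(x => x.HasUser), g.Sum(x => x.Count)))
            .ToList();
    }

    // Keeps only the groups that contain at least one User document.
    public static List<Entry> OnlyKnownUsers(IEnumerable<Entry> index) =>
        index.Where(e => e.HasUser).ToList();

    public static void Main()
    {
        var index = BuildIndex(SampleUsers(), SampleBlogEntries());
        foreach (var e in OnlyKnownUsers(index))
        {
            Console.WriteLine($"{e.UserName}: {e.Count}");
        }
        // Prints:
        // Doe,Ann: 3
        // Roe,Bob: 1
    }
}
```

Before filtering there are four groups, including XYZ1 and XYZ2 with `HasUser = false`. The two orphaned creators only drop out once the result is filtered on `HasUser`.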

Thomas Freudenberg
  • Hi Thomas, I forgot to mention that Blogs does not only contain entries that are related to users. Don't ask me why. I think they are old entries coming from a data migration many years ago. So, basically I want to filter the old entries out. I will modify my initial post. – Marius Feb 19 '14 at 09:57
  • Thank you, Thomas. After the index is created, what does the index contain? All blog entries grouped by UserName? The problem is, BlogEntry.CreatedBy does not always contain a user that is in the users collection. – Marius Feb 19 '14 at 11:09
  • Updated post with sample data. – Marius Feb 19 '14 at 11:13
  • What about adding one more field to the map return type? For example, IsInUserCollection = true for AddMap<User> and IsInUserCollection = false for AddMap<BlogEntry>. Then, in the reduce function, add IsInUserCollection = g.Any(x => x.IsInUserCollection). But I would still end up with all blog entries in the index. It would be nice to have only the user-related blog entries in the collection, because I think there are far fewer of those than the 1.5 million entries. – Marius Feb 19 '14 at 11:25
  • I added a `HasUser`, similar to your proposed `IsInUserCollection`. You still need all entries in the index, because RavenDB may index the BlogEntry documents before the User documents. – Thomas Freudenberg Feb 19 '14 at 11:41