
I'm facing an index problem for which I can't see a solution yet.

I have the following document structure per board:

{
    "Name": "Test Board",
    ...
    "Settings": {
        "Admins": [ "USER1", "USER2" ],
        "Members": [ "USER3", "USER4", "USER5" ]
        ...
    },
    ...
    "CreatedBy": "USER1",
    "CreatedOn": "2014-09-26T18:14:20.0858945"
    ...
}

Now I'd like to retrieve the count of all users who are registered anywhere in a board. This should of course count the number of distinct users rather than the number of user occurrences: one user can be a member of multiple boards.
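
To make the requirement concrete, here is a minimal in-memory sketch using plain LINQ to Objects (no RavenDB involved). The `Board` class, the `DistinctUserDemo` helper, and the sample data are hypothetical stand-ins, not the real model; they only illustrate the difference between counting occurrences and counting distinct users.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a board document (not the real model).
public class Board
{
    public string CreatedBy { get; set; }
    public List<string> Admins { get; set; } = new List<string>();
    public List<string> Members { get; set; } = new List<string>();
}

public static class DistinctUserDemo
{
    // All user references of one board, flattened into a single sequence.
    public static IEnumerable<string> UsersOf(Board b) =>
        b.Admins.Concat(b.Members).Concat(new[] { b.CreatedBy });

    // Counts every reference, so a user on two boards is counted twice.
    public static int CountOccurrences(IEnumerable<Board> boards) =>
        boards.SelectMany(b => UsersOf(b)).Count();

    // Counts each user once, no matter how many boards reference them.
    public static int CountDistinctUsers(IEnumerable<Board> boards) =>
        boards.SelectMany(b => UsersOf(b)).Distinct().Count();

    // Made-up sample data: USER1, USER2 and USER3 appear more than once.
    public static List<Board> SampleBoards() => new List<Board>
    {
        new Board
        {
            CreatedBy = "USER1",
            Admins = new List<string> { "USER1", "USER2" },
            Members = new List<string> { "USER3", "USER4", "USER5" }
        },
        new Board
        {
            CreatedBy = "USER2",
            Admins = new List<string> { "USER2" },
            Members = new List<string> { "USER3", "USER6" }
        }
    };

    public static void Main()
    {
        var boards = SampleBoards();
        Console.WriteLine(CountOccurrences(boards));   // 10 references in total
        Console.WriteLine(CountDistinctUsers(boards)); // 6 distinct users
    }
}
```

The second number is the one the dashboard needs; the goal is to have the server compute it instead of the client.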

This operation should be as fast as possible, since the result is displayed in a global statistics dashboard visible on every page. Therefore I chose to try an index instead of retrieving all boards with their users and doing the work on the client side.

Trying to achieve this by using a Map/Reduce index:

Map = boards => from board in boards
    select new
    {
        Aggregation = "ALL",
        Users = new object[]
        {
            board.CreatedBy,
            board.Settings.Admins,
            board.Settings.Members
        },
        NumberOfUsers = 1
    };

Reduce = results => from res in results
    group res by new
    {
        res.Aggregation
    }
    into g
    select new
    {
        g.Key.Aggregation,
        Users = g.Select(x => x.Users),
        NumberOfUsers = g.Sum(x => x.Users.Length)
    };

Obviously this results in a wrong count. I don't have any experience with Reduce yet, so I'd appreciate any tip! The solution is probably pretty simple...

What would be the best way to get the globally distinct set of CreatedBy, Admins and Members across all documents and return its count?

casperOne
  • The list of distinct users can be a million names long, is that supposed to be a single entry? – Ayende Rahien Dec 09 '14 at 14:24
  • Yes, it is supposed to return a single number, not further details. The absolute maximum number of names would be about 20'000 if nearly every user in our company is using this tool - which is highly unlikely. :) Edit: Currently there are about 1500 distinct users... – webstronaut.ch Dec 10 '14 at 10:46

3 Answers

0

Use an index like this:

// Map: count the user references on each board
from board in docs.Boards
select new 
{
    Users = board.Settings.Admins.Count + board.Settings.Members.Count + 1 /* created by */ 
}

// Reduce: sum the per-board counts into a single entry
from r in results
group r by "all" into g
select new
{
    Users = g.Sum(x=>x.Users)
}
Ayende Rahien
  • Thanks for your reply! Unfortunately this yields the wrong count. It returns the total number of user entries across all boards. What I'm struggling with is the number of **distinct** users. If one person is registered in multiple boards, which happens now and then, they will be counted multiple times. The index counts the number of users per board and then adds those counts together. Is there another fast way, maybe even without an index? – webstronaut.ch Dec 13 '14 at 17:19
0

The best I could come up with so far is:

Map = boards => from board in boards
    select new
    {
        Users = new object[]
        {
            board.CreatedBy,
            board.Settings.Admins,
            board.Settings.Members
        }
    };

Reduce = results => from r in results
    group r by "all" into g
    select new
    {
        Users = g.SelectMany(x => x.Users)
    };

And then query for the distinct user count:

var allUsersQuery = _documentSession.Query<AllUsersIndex.Result, AllUsersIndex>();
return allUsersQuery.Any() ? allUsersQuery.First().Users.Distinct().Count() : 0;

At least the query only returns a list of all usernames on all boards instead of bigger object trees. But the deduplication still has to happen client-side.

If there is any better way please let me know. It would be beautiful to have only one integer returned from the server...

0

Then use this:

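// Map: emit one entry per user reference on each board (this is the fanout)
from board in docs.Boards
from user in board.Settings.Admins.Concat(board.Settings.Members).Concat(new[]{board.CreatedBy})
select new 
{
   User = user,
   Count = 1
}

// Reduce: one entry per distinct user, summing that user's occurrences
from r in results
group r by r.User into g
select new
{
    User = g.Key,
    Count = g.Sum(x=>x.Count)
}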

I'm not really happy about the fanout, but this will give you all the distinct users and the number of times they appear. If you want just the number of distinct users, just get the total results from the index.
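
As a rough illustration of why the number of index results equals the number of distinct users, the same Map and Reduce can be simulated in memory with LINQ to Objects. This is only a sketch: `BoardDoc`, `UserCount`, `FanoutDemo`, and the sample data are hypothetical, and nothing here talks to RavenDB.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins; not the real document or index result types.
public class BoardDoc
{
    public string CreatedBy { get; set; }
    public List<string> Admins { get; set; } = new List<string>();
    public List<string> Members { get; set; } = new List<string>();
}

public class UserCount
{
    public string User { get; set; }
    public int Count { get; set; }
}

public static class FanoutDemo
{
    // Map: one entry per (board, user reference) pair; this is the fanout.
    public static IEnumerable<UserCount> Map(IEnumerable<BoardDoc> boards) =>
        from board in boards
        from user in board.Admins.Concat(board.Members).Concat(new[] { board.CreatedBy })
        select new UserCount { User = user, Count = 1 };

    // Reduce: group by user, so exactly one entry remains per distinct user.
    public static List<UserCount> Reduce(IEnumerable<UserCount> mapped) =>
        (from r in mapped
         group r by r.User into g
         select new UserCount { User = g.Key, Count = g.Sum(x => x.Count) }).ToList();

    // Made-up sample data with overlapping users.
    public static List<BoardDoc> SampleBoards() => new List<BoardDoc>
    {
        new BoardDoc
        {
            CreatedBy = "USER1",
            Admins = new List<string> { "USER1", "USER2" },
            Members = new List<string> { "USER3", "USER4", "USER5" }
        },
        new BoardDoc
        {
            CreatedBy = "USER2",
            Admins = new List<string> { "USER2" },
            Members = new List<string> { "USER3", "USER6" }
        }
    };

    public static void Main()
    {
        var reduced = Reduce(Map(SampleBoards()));
        // The number of reduce results is the number of distinct users.
        Console.WriteLine(reduced.Count);
        foreach (var entry in reduced)
            Console.WriteLine(entry.User + ": " + entry.Count);
    }
}
```

Against the real index, the equivalent of `reduced.Count` should be available from the query statistics (`TotalResults` in the RavenDB client of that era) without loading the entries themselves; the exact call depends on the client version, so check its documentation.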

Ayende Rahien