3

My document structure in CouchDB looks like below:

{
 "_id": "0a68cdbe4a7f3abf4046bc",
 "_rev": "1-1508",
 "score": {"math":90, "physics": 88, "chemistry": 60},
 "student_name": "Mike"
}

I need to show below stats in the front end to setup a student's profile:

  • Given a student _id, how can I retrieve that student's rank in each course;
  • Given a student _id, how can I retrieve that student's rank for his total score (math+physics+chemistry).

Suppose I have only 2 students, the 2nd student's record is like below:

{
 "_id": "0a68cdbe2344a3abf4046bc",
 "_rev": "1-1608",
 "score": {"math":80, "physics": 98, "chemistry": 90},
 "student_name": "Jane"
}

So Mike's rank should be:

math: 1
physics: 2
chemistry: 2
total: 2

and Jane's rank should be

math: 2
physics: 1
chemistry: 1
total: 1

Let me know if I did not state the problem clearly.

I did not figure out the way to create views to get the rank. What I have tried:

  • create views that map score to student info. Then I can query a score range to get students that their score is in that range.

Edit: the functionality of query by user name and retrieval of rank does not need to be implemented by a view only. Any idea is welcomed!

Edit2: The number of courses would be 1K to 3K. The number of students would be 1M to 2M.

greeness
  • 15,956
  • 5
  • 50
  • 80

4 Answers4

6

I think it is not posible to do what you want with just a view. would try a map function like that:

function(doc) {
  emit (["math", doc.score.math], doc.student_name);
  emit (["physics", doc.score.physics], doc.student_name);
  emit (["chemistry",doc.score.chemistry], doc.student_name);
  emit (["total",doc.score.math+doc.score.physics+doc.score.chemistry], doc.student_name);
}

Then I would query by course. This would return a list of students in the values ordered by score. After that, I think you'll have to pick the rank in your software programmatically.

I think a reduce function would be of no use because the function wouldn't shrink the result set and also because I can't come up with a way to query by student name and have the whole list of students at the same time. A list I also think would be of no use because again I don't see how it would be possible to let the list know the name of the student while at the same time having the whole set of students in the result.

joscas
  • 7,474
  • 5
  • 39
  • 59
  • Thanks for the answer. This is similar to what I have now. If the rank has to be picked by programming, then it will be slow at run time, right? – greeness Nov 20 '12 at 01:06
  • It will be slow depending on the number of students and courses but couch has already sorted the data for you and this should alleviate a lot of the cpu load. – joscas Nov 20 '12 at 13:36
  • Yeah, that is my concern. I will have thousands of courses and millions of students (this is an online course). Also if there is any new student who finishes a course and gets a score, every other student's rank that is after him should be updated. This is not what I expected. – greeness Nov 20 '12 at 22:49
  • What about one of these two strategies then: 1- Use a script to consult your view and save the updated ranks in a separate DB. You could run this script often enough to give a nearly real time effect. 2- Use the view chaining feature of Cloudant: http://support.cloudant.com/customer/portal/articles/359310-chained-mapreduce-views – joscas Nov 27 '12 at 19:59
  • Also, if you use a script, you could rely on the _changes API to update the second DB each time a document changes and this would be real time. – joscas Nov 27 '12 at 20:19
2

Maybe a combination of a sorted view and a list would work.

Your resultByChemistryScore view would look like this

function(doc) {
    emit(doc.score.chemistry, [doc._id, doc.student_name]);
}

Then your GET request would be something like http://localhost:5984/results/_design/results/_view/resultByChemistryScore?descending=true You could always implement pagination at this point as well by using the offset and limit GET query parameters.

From this point, a list can count until it gets to your specified student.

Your list function "rank" would look something like this

function(head, req) {
    start({ "headers": { "content-type": "application/json" } } );
    var row, rank = 0; 
    while ( row = getRow() ) {
        if ( row.id == req.query.id ) break;
        // increment rank if not a tie
        if ( old_row != null && old_row.key != row.key ) 
           rank++;
        old_row = row;
    }; 
    send( JSON.stringify( { "rank" : rank } ) );
}

And your request would basically be http://localhost:5984/results/_design/results/_list/rank/resultByChemistryScore?id=fet&descending=true.

It's not very pretty, I'll give you that. It would probably take a while for the server to go through the whole list if you were say...the 1,000,000th best chemistry student. But certainly easier for the server to do it than the client.

Edit Added the tie handling case

fet
  • 614
  • 5
  • 12
  • Thanks for the answer. This approach has the same constraint as I discussed with @joscas. I know what I expect might not be possible to do with couchdb only though. Thanks anyway! – greeness Nov 21 '12 at 08:41
  • A refinement to this answer would deal with ties. Two students with the same score in a class have the same rank. For example, if two students share the highest score they should both be ranked #1. The next student following them is ranked #3. – lambmj Nov 26 '12 at 14:43
  • You're right @greeness, except this is a little bit faster since we're 1) relying on the server to do the heavy lifting, 2) the heavy lifting doesn't require any additional memory, only a counter. But you're right, it's not ideal. If you wanted a constant factor time lookup, I might suggest using something like my resultByChemistryScore view, and then keeping another database updated with only the rank data. Then you could use a view with the reversed key/value on your rank database. emit(sudentId, rank). but you would have to update that database with a cron something. – fet Nov 26 '12 at 16:14
  • @lambmj, yeah, you're right. I think if we compared the row we had currently to the previous row, and only incremented the counter if it was a different score that would ensure students with the scores: [ 100, 90, 90, 80 ] would have the ranks [ 1, 2, 2, 3 ]. Does that algorithm work for you? – fet Nov 26 '12 at 16:19
  • Yes, that's correct. See for example the answer I just posted. – lambmj Nov 26 '12 at 16:28
  • Thanks guys, I upvoted all your answers including @joscas. Helpful discussions! – greeness Nov 26 '12 at 19:07
2

So I don't think there is a solution that can be accomplished entirely in CouchDB that will retrieve a single value for a student,subject pairing. However, a map/reduce view can be created that will produce nearly what you're looking for. The results of that view can then be used to find the rank for a student,subject pair.

We start by building a view with a map very similar to the one suggested by joscas. The only difference is that the subject names aren't hardcoded:

map.js

function(doc) {
    var total = 0;
    for (var subject in doc.score)  {
        var score = doc.score[subject];
        emit([subject, score], doc.student_name);
        total += score;
    }
    emit(["total", total], doc.student_name);
}

We pair that with a reduce function that will produce a ranking for each subject given a group=true and grouping_level=1

reduce.js

function(keys, values) {
    var rankings = {};              // In order to return ties, a simple array can't be used.
    var rank = 0;
    var place = 0;
    var last_score = -1;
    for (var i = 0; i < values.length; i++) {
        var name = values[i];
        var score = keys[i][0][1];  // The 0th element of the key is the [subject, score] array.
        if (score == last_score) {
            // Tie, add another student to this rank.
            place++;
        } else {
            // Not a tie, create a new rank.
            rank += (place + 1);
            rankings[rank] = new Array();
            place = 0;
            last_score = score;
        }
        rankings[rank][place] = name;
    }
    return rankings;
}

data

I added a third student to data set and created some ties to make it interesting. Here's the data used:

{
    "_id": "ce6b2cd97e73258014679ab7bb9e7cdc",
    "_rev": "2-b62581d22c186bfc8ebe1703a2dfb506",
    "score": {
        "chemistry": 60,
        "math": 90,
        "physics": 88
    },
    "student_name": "Mike"
}

{
    "_id": "ce6b2cd97e73258014679ab7bb9e8ada",
    "_rev": "5-94d6cfbd3cf22f903ebc306570d1f1af",
    "score": {
        "chemistry": 90,
        "math": 90,
        "physics": 98
    },
    "student_name": "Jane"
}

{
    "_id": "ce6b2cd97e73258014679ab7bb9e960b",
    "_rev": "1-d8c7fe88de63cf3d6e9743696f96aad0",
    "score": {
        "chemistry": 61,
        "math": 89,
        "physics": 88
    },
    "student_name":
    "Charlie"
}

results

The view is saved as rank and can be queried like this:

http://127.0.0.1:5984/atest/_design/atest/_view/rank?group=true&group_level=1

Which produces this result:

{
    "rows":[
        {"key":["chemistry"],"value":{"1":["Jane"],"2":["Charlie"],"3":["Mike"]}},
        {"key":["math"],"value":{"1":["Jane","Mike"],"3":["Charlie"]}},
        {"key":["physics"],"value":{"1":["Jane"],"2":["Charlie","Mike"]}},
        {"key":["total"],"value":{"1":["Jane"],"2":["Charlie","Mike"]}}
    ]
}

The view can be queried by subject like this (assuming the lowest score is 0 and the highest score is 100):

http://127.0.0.1:5984/atest/_design/atest/_view/rank?group=true&group_level=1&startkey=%5B%22math%22,0%5D&endkey=%5B%22math%22,100%5D

(without the url encoding):

http://127.0.0.1:5984/atest/_design/atest/_view/rank?group=true&group_level=1&startkey=["math",0]&endkey=["math",100]

Which produces this result:

{
    "rows":[
        {"key":["math"],"value":{"1":["Jane","Mike"],"3":["Charlie"]}}
    ]
}

The resulting dictionaries can be searched using Javascript (or other client side technology) to determine the rank of the student in a single (or all) subjects.

lambmj
  • 1,843
  • 2
  • 21
  • 27
  • Thanks @lambmj for considering the `tie` case. This is helpful though not solving my problem completely. I am considering using an external store for user-rank query now (redis for example, which get ranking from the view you created). any thoughts about that? – greeness Nov 26 '12 at 22:16
  • I think that is a good idea as there is no direct way to do what you want to do in CouchDB. – lambmj Nov 27 '12 at 13:22
1

I have an idea that's based on @joscas's answer. You can create a view like this:

key         -> value
---------------------
("math", 0) -> 2
("math", 1) -> 3
("math", 2) -> 5
....
("math", 100) -> 50

I made the assumption that the score range for each course is from 0 to 100. The idea is that:

  • For each course, you aggregate the number of students whose score falls in that score bucket (your have 101 buckets for each course, namely from bucket0 to bucket100).
  • Then you either use a chained view like @joscas pointed out, or use another external program to get a cumulative distribution of the score histograms like below.

key         -> accumulative value
------------------------------------
("math", 0) -> 2
("math", 1) -> 5
("math", 2) -> 10
....
("math", 99) -> 32324
("math", 100)-> 32374

Given a course name c and your score s, this second view tells you how many students play not as good as you in this course, from which you can derive the rank by using n-#s, where n is the total student number enrolled in c, #s is the number of students who get lower score than s. For instance, a query of "math", 99 will return 32374-32324 = 50 which is the rank of the student who got 99 in "math".

For the total score part of your questions, you can use the similar idea but change the bucket size and number.

dvail
  • 381
  • 2
  • 12