0

I have a data transformation problem that I'm struggling with:

Assume I have a set of question and answer records from polls loaded into mongo of the form

ID PollTitle Creator Question Answer Respondent

I want to transform these into what I think is a more compact JSON structure:

{
 ID,
 Poll Title,
 Questions : [
    { QuestionTitle, QuestionNumber, Answers : [ 
        { RespondentName, Answer, SubmittedTime }
        ]
    ]
}

this seems like it would be a better way to reduce the number of records and represent the natural hierarchy of Poll -> Questions -> Answers. Here is an example of some records for a single poll:

_id Poll ID Poll Title  Creator Question  Quest#  Responder Answer  Response Time
742888  9258    Daily Checkin   Mike    Was it a good meeting   1   John    Yes  8/16
742889  9258    Daily Checkin   Mike    Was it a good meeting   1   Len No   8/16
742890  9258    Daily Checkin   Mike    Do you agree with goal  2   John    Yes  8/16
742891  9258    Daily Checkin   Mike    Do you agree with goal  2   Len Yes  8/16

struggling to figure out how to do this in the query language using aggregation framework.

Roger Sanchez
  • 311
  • 1
  • 4
  • 13
  • I figured out how to do some of this using $addToSet and $group, however the result that I get back is a single document with an array of objects. How do I unwind this so that I get a separate record for each Poll/Question[]? – Roger Sanchez Aug 23 '12 at 00:16
  • Can you post an example of your documents prior to transforming? – Stennie Aug 23 '12 at 05:17
  • how big is the collection? it's possible that you will run into limitation of aggregation framework - it can return only maximum 16MB result set. – Asya Kamsky Aug 29 '12 at 01:43
  • The existing collection is just under 16MB. I can also remove a bunch of records as there is some noise in it. – Roger Sanchez Aug 29 '12 at 12:59

1 Answers1

0

You have to do more than one $group step in aggregation framework.

Here is what it would look like on your sample data and the output you would get:

db.poll.aggregate(
[
    {
        "$group" : {
            "_id" : {
                "ID" : "$ID",
                "title" : "$title",
                "QuestionTitle" : "$question",
                "QuestionNumber" : "$questionNum"
            },
            "answer" : {
                "$push" : {
                    "responder" : "$responder",
                    "Answer" : "$answer",
                    "respTime" : "$respTime"
                }
            }
        }
    },
    {
        "$group" : {
            "_id" : {
                "ID" : "$_id.ID",
                "title" : "$_id.title"
            },
            "Questions" : {
                "$push" : {
                    "QuestionTitle" : "$_id.QuestionTitle",
                    "QuestionNumber" : "$_id.questionNumber",
                    "Answers" : "$answer"
                }
            }
        }
    }
])
{
    "result" : [
        {
            "_id" : {
                "ID" : 9258,
                "title" : "Daily Checkin"
            },
            "Questions" : [
                {
                    "QuestionTitle" : "Do you agree with goal",
                    "Answers" : [
                        {
                            "responder" : "John",
                            "Answer" : "Yes",
                            "respTime" : "8/16"
                        },
                        {
                            "responder" : "Len",
                            "Answer" : "Yes",
                            "respTime" : "8/16"
                        }
                    ]
                },
                {
                    "QuestionTitle" : "Was it a good meeting",
                    "Answers" : [
                        {
                            "responder" : "John",
                            "Answer" : "Yes",
                            "respTime" : "8/16"
                        },
                        {
                            "responder" : "Len",
                            "Answer" : "No",
                            "respTime" : "8/16"
                        }
                    ]
                }
            ]
        }
    ],
    "ok" : 1
}

You can use $project at the end if you want to rename any of the fields or otherwise transform the exact format of the document.

Asya Kamsky
  • 41,784
  • 5
  • 109
  • 133
  • This is awesome, thanks Asya. I need to think about how this query works. One more related question - what is the best way to go from the results array of records back into a collection but with one record per object from results? If i try to save the results to a new collection I get only one entry with the results array .. want to unwind the results – Roger Sanchez Aug 29 '12 at 13:28
  • How are you saving the results? If you save r = db.coll.aggregate() then in the shell db.res.save(r.result) will save each element of the array as a separate document. That behavior is new in 2.2 though (just released). – Asya Kamsky Aug 29 '12 at 16:58
  • got it - I am using 2.1, will upgrade! – Roger Sanchez Aug 29 '12 at 17:32