Handling Incredibly large JSON Document in CouchDB

Question

I'm new to NoSql databases and I'm having a hard time figuring how to handle a very large JSON Document that could amount to over 20MB on my local drive. This structure will definitely increase over time and I worry about the speed of queries and having to search deep though the returned JSON object nest just to get a string out. My JSON is deeply nested like so for example.

{
"exams": {
    "exam1": {
        "year": {
            "math": {
                "questions": [
                    {
                        "question_text": "first question",
                        "options": [
                            "option1",
                            "option2",
                            "option3",
                            "option4",
                            "option5"
                        ],
                        "answer": 1,
                        "explaination": "explain the answer"
                    },
                    {
                         "question_text": "second question",
                        "options": [
                            "option1",
                            "option2",
                            "option3",
                            "option4",
                            "option5"
                        ],
                        "answer": 1,
                        "explaination": "explain the answer"
                    },
                    {
                        "question_text": "third question",
                        "options": [
                            "option1",
                            "option2",
                            "option3",
                            "option4",
                            "option5"
                        ],
                        "answer": 1,
                        "explaination": "explain the answer"
                    }
                ]
            },
            "english": {same structure as above}
        },
        "1961": {}
    },
    "exam2": {},
    "exam3": {},
    "exam4": {}
}
}

In the main application, question objects are created and appended based on type of exam, year, and subject making the JSON document huge over time. How can I re-model this so as to avoid slow queries in the future?

Sounds like you need to create more documents, rather than constantly extending the same one. — Dominic Barnes, Oct 14 '13 at 14:51
Thanks Dominic, I know that is what i should do as with SQL databases but how to "Normalize" and relate the sub-documents is a bit of a puzzle to me. — shanks, Oct 15 '13 at 07:22

score 0 · Accepted Answer · answered Oct 14 '13 at 15:45

0

Dominic is right. You need to start dividing the documents and storing them as separate documents.

The next question is how to recompose the document after it's been split.

Considering you're using Couch, I would recommend doing this at the application layer. A good starting point would be to create exam documents and store them in their own database. Then have a document (exams) in another database that has pointers to the exam documents.

You can retrieve the exams document and get exams one by one as needed. This could be especially useful with paging since most people will only want to see the most recent exams.

answered Oct 14 '13 at 15:45

ryan1234

7,237
6
25
36

Thanks Ryan. I think i was going about it the wrong way initially. I have split the large json into small question documents, with fields such as exam, year, date and qtext, options array, answer and explanation . The only concern is that as i currently have 6000 questions, then it means i will be making 6000 documents which feels to me coming fresh form an SQL DB World as making 6000 tables :( – shanks Oct 15 '13 at 07:50
2

That's not necessarily a problem. Bear in mind that NoSQL is going to be radically different from SQL most of the time. CouchDB is very relaxed, and chances are you can make just about any structure work. You just need to practice and experiment until you find the right balance. – Dominic Barnes Oct 15 '13 at 15:02

Handling Incredibly large JSON Document in CouchDB

1 Answers1