Google App Engine (NDB): One-To-One and One-To-Many relationship

Question

I am building a web application (Django and Google's NDB) and have to structure my models now. I have read many examples of how to implement One-To-Many but, to be honest, I'm even more confused after reading them.

Let me describe my problem with a similar example: I have users and each user can read multiple books. A user can stop reading at any time and the progress is saved. Even if a book has been finished, the progress will be saved and never deleted.

I need to check the progress of all books a user has started reading very frequently, so this has to be efficient and should require as few db reads as possible. The number of books is small (< 1000) and the books themselves are thin (say, only one chapter, a title and an author, that's it). It's the mass of progress records and the constant lookup of progress that I'm worried about, since every user has their own progress for probably every book.

How can I best structure my models to meet these requirements? If I use a StructuredProperty in my User model, will the size of the books that are referred to in Progress count towards the limit of X MB (hope not)? If not, I guess something like this is the best way to go (I can read progress fast, without an additional lookup, and if necessary get the book from the db).

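from google.appengine.ext import ndb

class Book(ndb.Model):  # the base class is ndb.Model (capital M)
    name = ndb.StringProperty(required=True)
    ...

class Progress(ndb.Model):
    # Stores only the key of the Book, not the book's data
    book = ndb.KeyProperty(kind="Book", required=True)
    last_page_read = ndb.IntegerProperty(required=True, default=0)
    ...

class User(ndb.Model):
    name = ndb.StringProperty(required=True)
    # Progress values are embedded in the User entity itself
    books_and_progress = ndb.StructuredProperty(Progress, repeated=True)
    ...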

Have you considered using the Django ORM? That is why it is there, to help with such matters. There are many good tutorials out there and it will make things much simpler in general. — Paul Collingwood, Sep 26 '14 at 12:29
You might consider using a different framework then as the ORM is sort of central. Perhaps Flask or similar. But just my 2c — Paul Collingwood, Sep 26 '14 at 12:40
If you really want to minimize reads then why not write all data to a single document. Then you can read any data back in a single read! — Matthew Franglen, Sep 26 '14 at 12:43
@MatthewFranglen: I don't think that's a good approach since the progress of the users will change constantly. — JustABit, Sep 26 '14 at 12:45
You want to use ndb but you want to perform joins. That is a bad idea - you will perform 1k lookups per request if you separate book progress from the user document. If you want joins you should use a relational database. So if you must stick with ndb, then go for a single document per user. While the progress of a given user will change, it will change less than they read the current progress (they must read it to start reading, and they may not start reading, so reads will exceed writes). I still think this is a bad idea but if you are going to do it then stop thinking that ndb is sql. — Matthew Franglen, Sep 26 '14 at 13:04
As per Matthew's comment, Django + GAE now support SQL so you can use that without worrying about non-relational issues. — Paul Collingwood, Sep 26 '14 at 13:10

score 1 · Accepted Answer · answered Sep 28 '14 at 18:29

Your approach is correct. Because you're using structured properties, Progress instances are not separate datastore entities; they're stored inside the User entity, so no additional lookup is needed to get the progress information for a given user. Once you have the user, you also have all the information about which books they are reading and on which page they stopped. To put it another way, your User entities will contain the user's name and a list of "book key, last_page_read" pairs.

will the size of the books that are referred to in Progress count towards the limit of X MB (hope not)?

I don't know which limit you are referring to, but keep in mind that what the User entity actually holds is the key of each Book, not the book's data. So the size of your Book instances has no effect on the User entities you retrieve from the datastore.
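To make the read pattern concrete, here is a minimal plain-Python sketch. It is not NDB code: `FakeDatastore`, the string keys and the sample names are all hypothetical stand-ins, and the dataclasses only mirror the shape of the models in the question. It shows that reading the embedded progress costs one read of the user, while the book's own data is fetched only when its key is dereferenced.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


class FakeDatastore:
    """Hypothetical in-memory stand-in for a datastore that counts reads."""

    def __init__(self) -> None:
        self._entities: Dict[str, Any] = {}
        self.reads = 0

    def put(self, key: str, entity: Any) -> None:
        self._entities[key] = entity

    def get(self, key: str) -> Any:
        self.reads += 1  # every get counts as one datastore read
        return self._entities[key]


@dataclass
class Book:
    name: str


@dataclass
class Progress:
    book_key: str  # only the key is embedded, never the Book's data
    last_page_read: int = 0


@dataclass
class User:
    name: str
    books_and_progress: List[Progress] = field(default_factory=list)


db = FakeDatastore()
db.put("book:1", Book(name="Book A"))  # sample data, purely illustrative
db.put("book:2", Book(name="Book B"))
db.put(
    "user:1",
    User(
        name="alice",
        books_and_progress=[Progress("book:1", 42), Progress("book:2", 7)],
    ),
)

# One read returns the user together with all embedded progress pairs.
user = db.get("user:1")
progress = {p.book_key: p.last_page_read for p in user.books_and_progress}
reads_for_progress = db.reads

# The book's data costs an extra read, and only when it is actually needed.
first_book = db.get(user.books_and_progress[0].book_key)
reads_after_book = db.reads
```

With the real models, the rough equivalents would be `user_key.get()` for the user and `progress.book.get()` (or `ndb.get_multi(keys)` for several books at once) for the books.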

Thank you! I've migrated to Django ORM now as proposed by @PaulCollingwood and don't regret it. But I was still interested in the answer to my question, so thank you. — JustABit, Sep 29 '14 at 12:28
