RethinkDB: How to dynamically build merge queries?

Question

I am building an API using Python Flask. I love the ease for coding merge queries on RethinkDB. Even better, I've noticed that it may be possible to write a thin layer to code dynamic merge queries in regards to user input.

Let's say we are building a chat app. Here is the example code: https://github.com/dogukantufekci/rethinkdb_dynamic_merge

Tables and fields:

accounts: "id", "created_on", "name", "email", "password"
conversations: "id", "created_on", "subject", "to" (list of participants)
messages: "id", "created_on", "text", "conversation", "from"
message_readers: "id", "message", "reader"

Query to merge all 4 tables:

r.table("accounts").map(lambda account: 
    account.merge({
        "conversations": r.table("conversations").filter(lambda conversation: 
            conversation["to"].contains(account["id"])).coerce_to("array").map(lambda conversation:
            conversation.merge({
                "to": conversation["to"].map(lambda account: 
                    r.table("accounts").get(account)).coerce_to("array"),
                "messages": r.table("messages").filter(lambda message:
                    message["conversation"] == conversation["id"]).coerce_to("array").map(lambda message:
                    message.merge({
                        "from": r.table("accounts").get(message["from"]),
                        "readers": r.table("message_readers").filter(lambda readers:
                            readers["message"] == message["id"]).coerce_to("array"),
                    }))
            }))
    })).run(g.db_connection)

Result:

[{
    "id": "account111",
    "created_on": 1392515093.252,  
    "name": "John Doe",
    "email": "john@doe.com",
    "conversations": [
        {
            "id": "conversation111",
            "created_on": 1392515093.252,  
            "subject": "Merging Queries on RethinkDB",
            "to": [
                {
                    "id": "account111",
                    "created_on": 1392515093.252,  
                    "name": "John Doe", 
                    "email": "john@doe.com", 
                }, 
                {
                    "id": "account222",
                    "created_on": 1392515604.123,  
                    "name": "Mark Bobby", 
                    "email": "mark@bobby.com", 
                }, 
            ], 
            "messages": [
                {
                    "id": "message111",
                    "created_on": 1392515604.123,  
                    "text": "How do we dynamically build merge queries?", 
                    "conversation": "conversation111", 
                    "from": {
                        "id": "account111",
                        "created_on": 1392515093.252,  
                        "name": "John Doe",
                        "email": "john@doe.com",
                    }, 
                    "readers": [
                        {
                            "id": "message_reader111", 
                            "created_on": 1392515604.123, 
                            "message": "message111",
                            "reader": "account111",
                        }, 
                        {
                            "id": "message_reader222", 
                            "created_on": 1392515604.123, 
                            "message": "message111",
                            "reader": "account222",
                        },
                    ],
                },
            ], 
        }, 
    ],        
}]

Great so far!

A simpler response needs to return account data with conversations; no messages:

[{
    "id": "account111",
    "created_on": 1392515093.252,  
    "name": "John Doe",
    "email": "john@doe.com",
    "conversations": [
        {
            "id": "conversation111",
            "created_on": 1392515093.252,  
            "subject": "Merging Queries on RethinkDB",
            "to": [
                {
                    "id": "account111",
                    "created_on": 1392515093.252,  
                    "name": "John Doe", 
                    "email": "john@doe.com", 
                }, 
                {
                    "id": "account222",
                    "created_on": 1392515604.123,  
                    "name": "Mark Bobby", 
                    "email": "mark@bobby.com", 
                }, 
            ], 
        }, 
    ],        
}]

There are two ways to get this result:

We can re-write a query:

r.table("accounts").map(lambda account: 
    account.merge({
        "conversations": r.table("conversations").filter(lambda conversation: 
            conversation["to"].contains(account["id"])).coerce_to("array").map(lambda conversation:
            conversation.merge({
                "to": conversation["to"].map(lambda account: 
                    r.table("accounts").get(account)).coerce_to("array"),
            }))
    })).run(g.db_connection)

Disadvantage: If there's a need to create further queries for alternative field combinations, this is not the best practice, because it's not dynamic and there's a lot of repetition.

We can modify the last line of the large query with pluck to select fields:

})).pluck(["id", "created_on", "name", "email", {"conversations": ["id", "created_on", "subject", {"to": ["id", "created_on", "name", "email"]}]}]).run(g.db_connection)

Advantage: It is dynamic as it enables users to pluck value as an argument via URL

http://www.myproject.com/accounts/?pluck=["id", "created_on", "name", "email", {"conversations": ["id", "created_on", "subject", {"to": ["id", "created_on", "name", "email"]}]}]

Disadvantage: Query does go consume a lot of computational energy to merge tables which we dont need in the final result.

So the challenge is to dynamically build queries by accepting pluck value from user.

You can easily notice two conventions:

Each dict field has a query which accepts a dict object:

"messages": r.table("messages").filter(lambda message:
    message["conversation"] == conversation["id"]).coerce_to("array").map(lambda message:
        message.merge({})

Each non dict field has an independent query:

"from": r.table("accounts").get(message["from"])

So how can we use all these pieces information and build our nice dynamic merge queries?

Can you give an example with a dynamic merge query? (values of "fields" and the output) — neumino, Feb 18 '14 at 17:16
You can use some r.branch, but that is not going to make things better since the dynamic fields require different subqueries. — neumino, Feb 20 '14 at 20:00

score 1 · Answer 1 · answered Feb 19 '14 at 13:22

My suggestion is to give up the dynamic part. Instead, you should design the API in a RESTful way. That means, for your example, that if someone wants to access an account, they can send a GET request to /accounts/[identifier]. If you they want all the messages that an account sent or received, they send a GET request to /accounts/[identifier]/messages.

Also, why do you need a conversation object? I would change your db structure as follows:

accounts: "id", "created_on", "name", "email", "password" - Unchanged
conversations: - Removed
messages: "id", "created_on", "text", "sender", "receivers"
message_readers: - Removed

Thanks for the suggestion, but I am not the only one envisioning such dynamic queries for API services. Check out Facebook Graph API Explorer http://i.imgur.com/qiwsCQ4.png — Dogukan Tufekci, Feb 19 '14 at 13:32

RethinkDB: How to dynamically build merge queries?

1 Answers1