2

I am building an API using Python Flask. I love the ease for coding merge queries on RethinkDB. Even better, I've noticed that it may be possible to write a thin layer to code dynamic merge queries in regards to user input.

Let's say we are building a chat app. Here is the example code: https://github.com/dogukantufekci/rethinkdb_dynamic_merge

Tables and fields:

  1. accounts: "id", "created_on", "name", "email", "password"
  2. conversations: "id", "created_on", "subject", "to" (list of participants)
  3. messages: "id", "created_on", "text", "conversation", "from"
  4. message_readers: "id", "message", "reader"

Query to merge all 4 tables:

r.table("accounts").map(lambda account: 
    account.merge({
        "conversations": r.table("conversations").filter(lambda conversation: 
            conversation["to"].contains(account["id"])).coerce_to("array").map(lambda conversation:
            conversation.merge({
                "to": conversation["to"].map(lambda account: 
                    r.table("accounts").get(account)).coerce_to("array"),
                "messages": r.table("messages").filter(lambda message:
                    message["conversation"] == conversation["id"]).coerce_to("array").map(lambda message:
                    message.merge({
                        "from": r.table("accounts").get(message["from"]),
                        "readers": r.table("message_readers").filter(lambda readers:
                            readers["message"] == message["id"]).coerce_to("array"),
                    }))
            }))
    })).run(g.db_connection)

Result:

[{
    "id": "account111",
    "created_on": 1392515093.252,  
    "name": "John Doe",
    "email": "john@doe.com",
    "conversations": [
        {
            "id": "conversation111",
            "created_on": 1392515093.252,  
            "subject": "Merging Queries on RethinkDB",
            "to": [
                {
                    "id": "account111",
                    "created_on": 1392515093.252,  
                    "name": "John Doe", 
                    "email": "john@doe.com", 
                }, 
                {
                    "id": "account222",
                    "created_on": 1392515604.123,  
                    "name": "Mark Bobby", 
                    "email": "mark@bobby.com", 
                }, 
            ], 
            "messages": [
                {
                    "id": "message111",
                    "created_on": 1392515604.123,  
                    "text": "How do we dynamically build merge queries?", 
                    "conversation": "conversation111", 
                    "from": {
                        "id": "account111",
                        "created_on": 1392515093.252,  
                        "name": "John Doe",
                        "email": "john@doe.com",
                    }, 
                    "readers": [
                        {
                            "id": "message_reader111", 
                            "created_on": 1392515604.123, 
                            "message": "message111",
                            "reader": "account111",
                        }, 
                        {
                            "id": "message_reader222", 
                            "created_on": 1392515604.123, 
                            "message": "message111",
                            "reader": "account222",
                        },
                    ],
                },
            ], 
        }, 
    ],        
}]

Great so far!

A simpler response needs to return account data with conversations; no messages:

[{
    "id": "account111",
    "created_on": 1392515093.252,  
    "name": "John Doe",
    "email": "john@doe.com",
    "conversations": [
        {
            "id": "conversation111",
            "created_on": 1392515093.252,  
            "subject": "Merging Queries on RethinkDB",
            "to": [
                {
                    "id": "account111",
                    "created_on": 1392515093.252,  
                    "name": "John Doe", 
                    "email": "john@doe.com", 
                }, 
                {
                    "id": "account222",
                    "created_on": 1392515604.123,  
                    "name": "Mark Bobby", 
                    "email": "mark@bobby.com", 
                }, 
            ], 
        }, 
    ],        
}]

There are two ways to get this result:

  1. We can re-write a query:

    r.table("accounts").map(lambda account: 
        account.merge({
            "conversations": r.table("conversations").filter(lambda conversation: 
                conversation["to"].contains(account["id"])).coerce_to("array").map(lambda conversation:
                conversation.merge({
                    "to": conversation["to"].map(lambda account: 
                        r.table("accounts").get(account)).coerce_to("array"),
                }))
        })).run(g.db_connection)
    

    Disadvantage: If there's a need to create further queries for alternative field combinations, this is not the best practice, because it's not dynamic and there's a lot of repetition.

  2. We can modify the last line of the large query with pluck to select fields:

    })).pluck(["id", "created_on", "name", "email", {"conversations": ["id", "created_on", "subject", {"to": ["id", "created_on", "name", "email"]}]}]).run(g.db_connection)
    

    Advantage: It is dynamic as it enables users to pluck value as an argument via URL

    http://www.myproject.com/accounts/?pluck=["id", "created_on", "name", "email", {"conversations": ["id", "created_on", "subject", {"to": ["id", "created_on", "name", "email"]}]}]
    

    Disadvantage: Query does go consume a lot of computational energy to merge tables which we dont need in the final result.

So the challenge is to dynamically build queries by accepting pluck value from user.

You can easily notice two conventions:

  1. Each dict field has a query which accepts a dict object:

    "messages": r.table("messages").filter(lambda message:
        message["conversation"] == conversation["id"]).coerce_to("array").map(lambda message:
            message.merge({})
    
  2. Each non dict field has an independent query:

    "from": r.table("accounts").get(message["from"])
    

So how can we use all these pieces information and build our nice dynamic merge queries?

Dogukan Tufekci
  • 2,978
  • 3
  • 17
  • 21
  • Can you give an example with a dynamic merge query? (values of "fields" and the output) – neumino Feb 18 '14 at 17:16
  • You can use some r.branch, but that is not going to make things better since the dynamic fields require different subqueries. – neumino Feb 20 '14 at 20:00

1 Answers1

1

My suggestion is to give up the dynamic part. Instead, you should design the API in a RESTful way. That means, for your example, that if someone wants to access an account, they can send a GET request to /accounts/[identifier]. If you they want all the messages that an account sent or received, they send a GET request to /accounts/[identifier]/messages.

Also, why do you need a conversation object? I would change your db structure as follows:

  1. accounts: "id", "created_on", "name", "email", "password" - Unchanged
  2. conversations: - Removed
  3. messages: "id", "created_on", "text", "sender", "receivers"
  4. message_readers: - Removed
elbear
  • 769
  • 8
  • 16
  • Thanks for the suggestion, but I am not the only one envisioning such dynamic queries for API services. Check out Facebook Graph API Explorer http://i.imgur.com/qiwsCQ4.png – Dogukan Tufekci Feb 19 '14 at 13:32