Intercepting an aggregation query in Elasticsearch

Question

I am currently evaluating Elasticsearch and would like some insight into whether certain things are possible. Any suggestions would be greatly appreciated.

I'm trying to tackle a very specific use case as follows:

I want to run an entitlement check on each document before it is included in an Elasticsearch aggregation. Is that possible?

It would be like calling an external API to check whether the user has permission to aggregate over a particular document. If yes, the document should be added to the aggregation result set.

Example:

Let's say I have some documents in Elasticsearch, and each document has a specific tag attached. I also have some user data in a separate relational database with the schema (userId, tag).

When user1 queries Elasticsearch for the number of documents with the tag "es", it should return 2, whereas for user2 it should return 0, because that user doesn't have the "es" tag attached.

It's like intercepting every call to the aggregation to do a custom check before incrementing the count. Basically, I'm looking to limit search results based on the user.
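
To make the expected numbers concrete, the behaviour described above can be mimicked in plain Python. This is only a sketch of the intended semantics, not of anything Elasticsearch does internally: the in-memory `DOCUMENTS` and `USER_TAGS` stand in for the index and the relational table, and `is_entitled` is a hypothetical placeholder for the external API call.

```python
from collections import Counter

# Documents as indexed in this question (only the fields that matter here).
DOCUMENTS = [
    {"document_id": 123, "tag": "es"},
    {"document_id": 1233, "tag": "es"},
    {"document_id": 1234, "tag": "kafka"},
]

# Stand-in for the (userId, tag) table in the relational database.
USER_TAGS = {"user1": {"es"}, "user2": {"kafka"}}


def is_entitled(user_id, document):
    """Hypothetical entitlement check; the real one would call an external API."""
    return document["tag"] in USER_TAGS.get(user_id, set())


def count_by_tag(user_id, documents=DOCUMENTS):
    """Count documents per tag, skipping documents the user may not see."""
    return Counter(doc["tag"] for doc in documents if is_entitled(user_id, doc))


print(count_by_tag("user1")["es"])  # 2
print(count_by_tag("user2")["es"])  # 0
```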

Schema and queries in Elasticsearch

PUT /document
{
    "mappings": {
        "reports": {
            "properties": {
                "document_id": {
                    "type":"integer"
                },
                "tag": {
                    "type":"string",
                    "index":"not_analyzed"
                },
                "document_name": {
                    "type":"string"
                }
            }
        }
    }
}


POST document/reports 
{
    "document_id":123,
    "tag":"es",
    "document_name":"elastic search indexing"
}

POST document/reports 
{
    "document_id":1233,
    "tag":"es",
    "document_name":"elastic search routing"
}


POST document/reports 
{
    "document_id":1234,
    "tag":"kafka",
    "document_name":"kafka partitioning"
}

Table structure in the relational database

userId | tag
-------+-------
user1  | es
user2  | kafka

Search request query

GET document/reports/_search
{
    "query": {
        "match": {
            "_all": "es"
        }
    },
    "size": 0,
    "aggs": {
        "types": {
            "terms": {
                "field":"tag"
            }
        }
    }
}

Sample Response

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "types": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "es",
               "doc_count": 2
            }
         ]
      }
   }
}

Thanks for your reply. I've added an example; I hope it shows what I am trying to do. If it's not clear, please let me know. — Minisha, Feb 05 '17 at 06:34
Ok, is it conceivable to add another `user` field to your documents so you can easily express both conditions, i.e. `user=user1` AND `tag=es`? — Val, Feb 05 '17 at 06:36
No, it's not possible. The user data is maintained by a completely separate application/system, and we can only get a user's data through its APIs. — Minisha, Feb 05 '17 at 06:39
In SQL terms, the query would be something like `select count(*), tag from reports group by tag`. I am not well versed in Elasticsearch yet, so I still have to figure out how to write this there. — Minisha, Feb 05 '17 at 07:01
Since you control the SQL DB, why not include a constraint based on the user who makes the query? So if user1 makes a request, the query would look like `select count(*), tag from reports WHERE tag IN ('es') group by tag` — Val, Feb 05 '17 at 07:49
Have you tried setting up role-based access to tags instead of per-user access? Or do you have to do this with the user-tag relation table? On the other hand, @Val's last suggestion is a nice approach. — hkulekci, Feb 05 '17 at 09:49
My use case is much more complex than the example I described. That's why I was looking at intercepting the aggregation request. So as of now, there is no way to implement this functionality directly in Elasticsearch, is there? — Minisha, Feb 05 '17 at 10:10
@hkulekci Can you explain a little more about the role-based access, please? — Minisha, Feb 05 '17 at 19:56
Intercepting the query would mean that, for each document, you need to make a remote call to some other system that decides whether the user has access to the document or not. Depending on the number of documents you have, this would be a performance killer. Imagine you have 1 million documents and each call takes 1 ms: you'd wait about 17 minutes for your request to return. I'm pretty sure that there's a way to do what you want solely based on cleverly crafted constraints, but unfortunately we don't know enough about your use case. — Val, Feb 06 '17 at 05:25
Can I get your opinion on the Shield plugin for Elasticsearch? It looks like it has what I want, but I haven't read through it fully yet. — Minisha, Feb 06 '17 at 10:18
Shield is definitely the ES plugin to use if you want to defer the authentication to Elasticsearch. A less costly solution would be to just filter for this "es" tag on an Apache server. See the Shield documentation here: https://www.elastic.co/guide/en/shield/current/enable-basic-auth.html — Adonis, Mar 06 '17 at 16:43
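
Val's suggestion of constraining the query by the user's tags could look roughly like the following on the Elasticsearch side. This is a sketch under assumptions, not a tested solution: it assumes the user's tags can be fetched (from the relational database or its API) before the search is sent, and that the cluster runs Elasticsearch 2.x or later so that a `bool` query with a `filter` clause is available. `build_search_body` and `USER_TAGS` are hypothetical names introduced here, and the code only builds the request body; sending it to the cluster is left out.

```python
import json

# Stand-in for the (userId, tag) lookup; in practice this would come from
# the relational database or the external user API.
USER_TAGS = {"user1": {"es"}, "user2": {"kafka"}}


def build_search_body(query_text, allowed_tags):
    """Build the terms-aggregation request from the question, restricted
    to documents whose tag is in allowed_tags."""
    return {
        "query": {
            "bool": {
                # Same full-text match as in the original search request.
                "must": {"match": {"_all": query_text}},
                # Non-scoring filter on the tags the user is entitled to.
                "filter": {"terms": {"tag": sorted(allowed_tags)}},
            }
        },
        "size": 0,
        "aggs": {"types": {"terms": {"field": "tag"}}},
    }


# Request body for user2, who only has the "kafka" tag.
body = build_search_body("es", USER_TAGS["user2"])
print(json.dumps(body, indent=2))
```

With this shape, the entitlement lookup happens once per request rather than once per document, and the aggregation only sees documents that pass the filter.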
