0

I have some data in Elastic that is something like this (translating it into a table format, which is how I picture it)

|Record id | userid | experiment condition | action taken |
|----------|--------|----------------------|--------------|
|1         |  1234  | B                    |              |
|2         |  1234  |                      |  search      |
|3         | 1234   |                      | click        |
|4         | 5678   | A                    |              |
|5         |5678    |                      | search       |
|6         |5678    |                      | search       |

and so on

What I want to be able to do is find all the instances where action_taken='search' and user is in experiment_condition='B'

That's pretty trivial in SQL-land, but I'm struggling to figure out how to do this in Elastic.

I've looked at parent/child queries here: https://www.elastic.co/guide/en/elasticsearch/guide/current/children-agg.html but it doesn't seem like that's what I want, but maybe I'm just having trouble wrapping my brain around it.

Thanks!

edit: here are the JSON records that are in ES

{
  "_index": "04-06-2017",
  "_type": "logs",
  "_id": "record_id_123",
  "_score": null,
  "_source": {
    "@version": "1",
    "@timestamp": "2017-04-06T03:59:42.000Z",
    "datetime": "05/Apr/2017:23:59:42 -0400",
    "duration": "05/Apr/2017:23:59:42 -0400",
    "sessionid": "session_hash_1234",
    "un": "user_id_1234",
    "ev": "login",
    "experimentGroup": "B"
  },
  "fields": {
    "@timestamp": [
      1491451182000
    ]
  }
},
{
  "_index": "04-06-2017",
  "_type": "logs",
  "_id": "record_id_567",
  "_score": null,
  "_source": {
    "@version": "1",
    "@timestamp": "2017-04-06T04:00:22.000Z",
    "datetime": "06/Apr/2017:00:00:22 -0400",
    "rc": "200",
    "pcd": "85",
    "duration": "06/Apr/2017:00:00:22 -0400",
    "un": "user_id_1234",
    "sessionid": "session_hash_1234",
    "rtid": "query_hash_1234",
    "js": "1",
    "rs": "1422x889",
    "cd": "16",
    "ln": "en",
    "tz": "GMT%20-04%3A00",
    "action": "toc",
    "event": "click",
    "node": "BUTTON",
    "ev": "search",
    "query":"query_terms_here"
  },
  "fields": {
    "@timestamp": [
      1491451222000
    ]
  },
}

What I want to do is be able to find all the records where ev='search' for all the users whose experimentGroup='B'

jrubins
  • 187
  • 13
  • This answer might help: http://stackoverflow.com/questions/36915428/how-to-setup-elasticsearch-index-structure-with-multiple-entity-bindings/36982705#36982705 (hint: denormalize) – Val Apr 06 '17 at 13:13
  • Thanks for the edit and comment @Val. If I'm reading your answer to the linked question correctly, you are suggesting to denormalize on the data side. I can't alter the data structure, which is already reasonably flat. I'll update my question with the elastic data structure. – jrubins Apr 06 '17 at 14:25

1 Answers1

0

So after fighting with this for a while, I decided to just do two queries. By using the elastic plugin for python, I could get the data I needed then issue a terms query which contained a list of all the users.

Get users:

blob=es.search(index='my-index', body={
  "size": 0,
  "query": {
    "query_string": {
      "query": "settings.experimentGroup:\"A\""
    }
  }
})

Then turn that into a list

users_in_group_a = json_normalize(blob['hits']['hits'])
unique_users_in_a=list(set(users_in_group_a['_id']))

then get the results for those users:

blob = es.search(index='my-index', body={
        "size":0,
  'query': {
          'filtered': {
                  'filter': {
                          'bool':{
                                  'must':[
                                          {'terms': { 'userid': [unique_users]}},
                                          {'range':{'@timestamp':{'gte':1489280900154,'lte':1491005332057}}}
                                          ]
                                  }
                          }
            }
})
jrubins
  • 187
  • 13