How to get the occurrence count of a specific field value in Elasticsearch across 650 M documents

Question

I have indexed Twitter data in ES. There are 110 M unique Twitter user profiles and 650 M tweets. They are stored in separate indices: profiles (index: twitter-profiles, type: profiles) and tweets (index: twitter-tweets, type: tweets).

Every tweet carries the user_id_str of the profile that posted it.

I am having trouble getting the occurrence count (the number of tweets) for a specific user. I tried both the terms facet and the terms aggregation, but both fail with a PartialShardFailureException because there is too much data to process. I used the following query:

{
  "aggs": {
    "userCount": {
      "terms": { "field": "user_id_str" }
    }
  }
}

Then I tried another approach.

The second method I used was scan. I fetch the profile ids from the profiles type and then search for each one in the tweets type. This returns results, but each single result takes about 2 seconds. With 110 M users, that means I would have to wait for days.

Please suggest a reasonable solution for this situation.

What is the mapping? Did you use not_analyzed on the mentioned field? How many shards do you use? How many nodes? - Jettro Coenradie, Aug 10 '14 at 08:55
Yes, the field I am looking for is not_analyzed. There are 6 shards and three nodes running on Amazon EC2 servers. - Sohail Ahmed, Aug 11 '14 at 05:32
Scan is used to go through all the data without sorting or scoring. If you want to aggregate over all the users (110 M), there can be a memory problem. More shards and more nodes with more memory could be an option. Maybe try with a more limited dataset first and see the results then. - Jettro Coenradie, Aug 11 '14 at 21:58

score -2 · Answer 1 · answered Aug 24 '14 at 16:30

You could use a cardinality aggregation in combination with a term filter.
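
As a rough sketch of what those two pieces might look like as request bodies (not a tested solution for this cluster): the helper names and the user id "12345" are made up for illustration, and the field name user_id_str is taken from the question. Note that a cardinality aggregation returns an approximate number of distinct values, not a per-value occurrence count. For one specific user, the hit total of a term-filtered search is what gives the number of tweets.

```python
import json


def count_body(user_id):
    """Hypothetical helper: search body matching only one user's tweets.

    The occurrence count is the hit total of the response; "size": 0
    asks Elasticsearch not to return the documents themselves.
    """
    return {
        "size": 0,
        "query": {
            "constant_score": {
                # A term filter does an exact match on the not_analyzed field.
                "filter": {"term": {"user_id_str": user_id}}
            }
        },
    }


def distinct_users_body():
    """Hypothetical helper: approximate count of distinct user ids."""
    return {
        "size": 0,
        "aggs": {
            "distinct_users": {"cardinality": {"field": "user_id_str"}}
        },
    }


# "12345" is a placeholder id, not a value from the question.
print(json.dumps(count_body("12345"), indent=2))
print(json.dumps(distinct_users_body(), indent=2))
```

Either body could be sent to the _search endpoint of the twitter-tweets index. The first one only touches the postings for a single term, so it should avoid loading all 110 M user ids into memory the way the unfiltered terms aggregation does.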

wahhzu
