3

We have 60M documents in an index, hosted on a 4-node cluster.

I want to make sure the configuration is optimised for aggregations on these documents.

This is the sample query:

select * from sources * where (sddocname contains ([{"implicitTransforms": false}]"tweet")) | all(group(n_tA_c) each(output(count() as(count))));

The field n_tA_c contains array of strings. This is the sample document:

        {
            "fields": {
                "add_gsOrd": 63829,
                "documentid": "id:firehose:tweet::815347045032742912",
                "foC": 467,
                "frC": 315,
                "g": 0,
                "ln": "en",
                "m": "ya just wants some fried rice",
                "mTp": 2,
                "n_c_p": [],
                "n_tA_c": [                        
                    "fried",
                    "rice"
                ],
                "n_tA_s": [],
                "n_tA_tC": [],
                "sN": "long_delaney1",
                "sT_dlC": 0,
                "sT_fC": 0,
                "sT_lAT": 0,
                "sT_qC": 0,
                "sT_r": 0.0,
                "sT_rC": 467,
                "sT_rpC": 0,
                "sT_rtC": 0,
                "sT_vC": 0,
                "sddocname": "tweet",
                "t": 1483228858608,
                "u": 377606303,
                "v": "false"
            },
            "id": "id:firehose:tweet::815347045032742912",
            "relevance": 0.0,
            "source": "content-root-cluster"
        }

The n_tA_c field is an attribute with fast-search enabled:

    field n_tA_c type array<string> {
        indexing: summary | attribute
        attribute: fast-search
    }

This simple term aggregation query does not come back within 20s and times out. What additional checklist items should we go through to reduce this latency?

$ curl 'http://localhost:8080/search/?yql=select%20*%20from%20sources%20*%20where%20(sddocname%20contains%20(%5B%7B%22implicitTransforms%22%3A%20false%7D%5D%22tweet%22))%20%7C%20all(group(n_tA_c)%20each(output(count()%20as(count))))%3B' | python -m json.tool
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   270  100   270    0     0     13      0  0:00:20  0:00:20 --:--:--    67
    {
        "root": {
            "children": [
                {
                    "continuation": {
                        "this": ""
                    },
                    "id": "group:root:0",
                    "relevance": 1.0
                }
            ],
            "errors": [
                {
                    "code": 12,
                    "message": "Timeout while waiting for sc0.num0",
                    "source": "content-root-cluster",
                    "summary": "Timed out"
                }
            ],
            "fields": {
                "totalCount": 0
            },
            "id": "toplevel",
            "relevance": 1.0
        }
    }

These nodes are AWS i3.4xlarge boxes (16 cores, 120 GB memory).

I might be missing something silly.

enator
  • You really want to get all unique values and their count? Your grouping expression does not limit number of groups by max() so so you get everything. – Jo Kristian Bergum Oct 26 '17 at 16:44

2 Answers

6

You are asking for every unique value and its count(), since your grouping expression does not contain any max(x) limitation. This is a very CPU- and network-intensive task to compute; limiting the number of groups is much faster, e.g.

all(group(n_tA_c) max(10) each(output(count() as(count))));
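For reference, this is the question's full query with the max(10) limit applied (query text only, not URL-encoded; a sketch of the change, not a request tested against the cluster):

    select * from sources * where (sddocname contains ([{"implicitTransforms": false}]"tweet")) | all(group(n_tA_c) max(10) each(output(count() as(count))));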

General comments: With Vespa, like any other serving engine, it's important to have enough memory and, e.g., swap disabled, so you can index and search data without getting into high memory pressure.

How much memory you'll use per document type depends on several factors, but the number of fields defined with attribute and the number of documents per node are important. Redundancy and the number of searchable copies also play a major role.

Grouping over the entire corpus is memory intensive (memory bandwidth for reading attribute values), CPU intensive, and also network intensive when there is a high fan-out. (See more on the precision parameter, which can limit the number of groups returned per node, at http://docs.vespa.ai/documentation/grouping.html.)
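As a sketch, the same grouping expression with both max() and a per-node precision() hint, following the grouping documentation linked above (the value 100 is an illustrative choice, not a recommendation):

    all(group(n_tA_c) max(10) precision(100) each(output(count() as(count))));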

Jo Kristian Bergum
  • Ah, sorry for missing that part. Tried that, but it still times out, with both max(10) and max(1). Added a gist here: https://gist.github.com/yogin16/7fd57ab33b65fb50e24b3e26529d92ed – enator Oct 26 '17 at 17:06
  • The "ln" field in the sample document represents language (so, much lower cardinality). That is also an attribute. When we try aggregation on it with max(10), it also takes 5 secs on average. @jkb Is this latency expected? This is much slower than ES. – enator Oct 26 '17 at 17:14
  • 1
    To reduce the complexity and make sure we are only considering grouping performance you can add &ranking=unranked and &timeout=30 to the request and also add limit 0 to the yql 'select * from sources * where (sddocname contains ([{"implicitTransforms": false}]"tweet")) limit 0 | all(group(n_tA_c) max(10) each(output(count() as(count))));' – Jo Kristian Bergum Oct 26 '17 at 17:31
  • timeout is 20 in services.xml, and with both unranked and limit 0 we are still getting a timeout! https://gist.github.com/yogin16/7ea39f90bbb6f61ef2b086b114c7c59c – enator Oct 26 '17 at 17:38
  • I see, is it just with grouping that you experience this behaviour? How long does a simple sddocname:tweet query without ranking and with e.g limit 10 take? It could look like you have deeper issues like high memory usage/swap based on these timings. Do content nodes share hosts with the java containers? What is the resource usage (memory especially?) – Jo Kristian Bergum Oct 26 '17 at 19:15
  • Doesn't look like a memory issue. CPU spikes when the agg query is made. This is the detailed report with "htop": https://github.com/yogin16/tweet-vespa-app/blob/master/cluster-detail.md Shared the application at: https://github.com/yogin16/tweet-vespa-app – enator Oct 27 '17 at 07:29
  • 1
    Thanks @enator, will review your app and get back to you on this. – Jo Kristian Bergum Oct 27 '17 at 07:34
  • 1
    Yes, no memory pressure here, just proton chewing away on the data using only one CPU core, which is not great for latency. I had a look at your setup; you can increase parallelism per http://docs.vespa.ai/documentation/content/setup-proton-tuning.html#requestthreads-persearch, e.g. to 4. In addition you should enable the groupingSessionCache http://docs.vespa.ai/documentation/reference/search-api-reference.html#groupingSessionCache described also in http://docs.vespa.ai/documentation/reference/grouping-syntax.html. – Jo Kristian Bergum Oct 27 '17 at 10:20
  • Thanks. I would try that. – enator Oct 28 '17 at 16:50
0

Summarising the checkpoints to take care of when running aggregations, from the conversation in the other answer and further documentation:

<persearch>16</persearch>

Threads per search is 1 by default.
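For context, a sketch of where this setting lives in services.xml, following the proton tuning documentation linked in the comments (element nesting per those docs; the surrounding content-cluster configuration is assumed and elided):

    <content version="1.0" id="content-root-cluster">
        <engine>
            <proton>
                <tuning>
                    <searchnode>
                        <requestthreads>
                            <persearch>16</persearch>
                        </requestthreads>
                    </searchnode>
                </tuning>
            </proton>
        </engine>
        <!-- documents, nodes, redundancy etc. unchanged -->
    </content>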

The above changes ensured that the query returned a result before the timeout. But we learned that Vespa is not designed with aggregations as a primary goal. The latencies for writes and searches are much lower than ES at the same scale on identical hardware, but aggregation (especially over multi-valued string fields) is more CPU intensive and has higher latency than ES for the same aggregation query.
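The groupingSessionCache suggested in the comments is enabled per request via a query parameter (parameter names per the search API reference linked above; the host and the URL-encoded YQL are abbreviated here):

    http://localhost:8080/search/?yql=...&groupingSessionCache=true&ranking=unranked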

enator