
I'm running a performance test against Vespa, and the container seems too slow to keep up with the incoming requests. Looking at vespa.log, there are lots of "GC (Allocation Failure)" entries. However, system resource usage is pretty low (CPU < 30%, mem < 35%). Is there any configuration to optimize?

Btw, it looks like the docprocservice runs on the content node by default; how can I tune jvmargs for the docprocservice?

1523361302.261056        24298   container       stdout  info    [GC (Allocation Failure)  3681916K->319796K(7969216K), 0.0521448 secs]
1523361302.772183        24301   docprocservice  stdout  info    [GC (Allocation Failure)  729622K->100400K(1494272K), 0.0058702 secs]
1523361306.478681        24301   docprocservice  stdout  info    [GC (Allocation Failure)  729648K->99337K(1494272K), 0.0071413 secs]
1523361308.275909        24298   container       stdout  info    [GC (Allocation Failure)  3675316K->325043K(7969216K), 0.0669859 secs]
1523361309.798619        24301   docprocservice  stdout  info    [GC (Allocation Failure)  728585K->100538K(1494272K), 0.0060528 secs]
1523361313.530767        24301   docprocservice  stdout  info    [GC (Allocation Failure)  729786K->100561K(1494272K), 0.0088941 secs]
1523361314.549254        24298   container       stdout  info    [GC (Allocation Failure)  3680563K->330211K(7969216K), 0.0531680 secs]
1523361317.571889        24301   docprocservice  stdout  info    [GC (Allocation Failure)  729809K->100551K(1494272K), 0.0062653 secs]
1523361320.736348        24298   container       stdout  info    [GC (Allocation Failure)  3685729K->316908K(7969216K), 0.0595787 secs]
1523361320.839502        24301   docprocservice  stdout  info    [GC (Allocation Failure)  729799K->99311K(1494272K), 0.0069882 secs]
1523361324.948995        24301   docprocservice  stdout  info    [GC (Allocation Failure)  728559K->99139K(1494272K), 0.0127939 secs]

services.xml:
<container id="container" version="1.0">
    <config name="container.handler.threadpool">
        <maxthreads>10000</maxthreads>
    </config>

    <config name="config.docproc.docproc">
        <numthreads>500</numthreads>
    </config>

    <config name="search.config.qr-start">
        <jvm>
            <heapSizeAsPercentageOfPhysicalMemory>60</heapSizeAsPercentageOfPhysicalMemory>
        </jvm>
    </config>

    <document-api />

    <search>
        <provider id="music" cluster="music" cachesize="64M" type="local" />
    </search>

    <nodes>
        <node hostalias="admin0" />
        <node hostalias="node2" />
    </nodes>
</container>

# free -lh
              total        used        free      shared  buff/cache   available
Mem:           125G         43G         18G        177M         63G         80G
Low:           125G        106G         18G
High:            0B          0B          0B
Swap:            0B          0B          0B                         
user221074

2 Answers


Those GC messages come from the JVM and are normal, not real failures. This is just how the JVM works: it collects the garbage the application creates, and all of those are minor collections in the young generation. If you start seeing Full GC messages, tuning would be required.

The 'docprocservice' is not involved in search serving either, so you can safely ignore it for a serving test. Most likely your bottleneck is the underlying content layer. What is the resource usage like there? Regardless, running with 10K maxthreads seems excessive; the default 500 is more than enough. What kind of benchmarking client are you using?

Jo Kristian Bergum
  • The system utilization on the content node is pretty low (CPU < 30%, mem < 35%). I defined all fields as attributes in the sd file to avoid disk I/O. Btw, I'm seeing docprocservice start with -Xmx1536m; any way to change that? Thanks. – user221074 Apr 11 '18 at 07:42
  • Interesting: I changed all the '1536m' configuration values, but after restarting, docprocservice still shows up with 1536m! Is it hardcoded? – user221074 Apr 11 '18 at 07:56
  • The 'docprocservice' is a hidden service which the Vespa model derives from your configuration. It runs tokenization/indexing, so it's not straightforward to change its JVM settings; you can avoid this by telling Vespa that you want to run the indexing processing chain inside your explicitly configured container. See http://docs.vespa.ai/documentation/reference/services-content.html#document-processing. – Jo Kristian Bergum Apr 11 '18 at 08:00
  • The JVM settings for the hidden docprocservice are hard-coded, yes. – Jo Kristian Bergum Apr 11 '18 at 09:20
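
To illustrate the comments above: moving the indexing chain into the explicitly configured container (so the tunable container JVM settings apply, instead of the hidden docprocservice) could look roughly like the sketch below. This is a sketch only, following the linked services-content reference; the cluster ids and document type ("container", "music") mirror the question's setup and must be adjusted to the actual deployment.

```xml
<!-- Sketch, not a verified config: run document processing in the
     explicit container cluster instead of the hidden docprocservice. -->
<container id="container" version="1.0">
    <document-api />
    <!-- Enables this container cluster to run document processing -->
    <document-processing />
    ...
</container>

<content id="music" version="1.0">
    <documents>
        <document type="music" mode="index" />
        <!-- Point indexing at the explicit container cluster above -->
        <document-processing cluster="container" />
    </documents>
    ...
</content>
```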

Generally it's easier to help if you provide:

  • The setup and HW configuration (e.g. services.xml and document schema)
  • What type of queries/ranking profile is in use, which fields are searched, etc.; the total number of documents; and, if you use a custom ranking profile, how the results compare with the built-in 'unranked' ranking profile
  • The average number of hits returned (the &hits=x parameter) and the average total hits
  • Resource usage (e.g. vmstat/top/network utilization) from the container(s) and content node(s) when latency starts climbing past your targeted latency SLA (bottleneck reached/max throughput)
  • Same as above but with only one client (no concurrency). If you are past your targeted latency SLA/expectation already with no concurrency, you might have to review the features in use (examples would be adding rank: filter to unranked fields, adding fast-search to attributes involved in the query, and so on)
  • The benchmarking client used (e.g. number of connections and parameters used). We usually use the vespa-fbench tool.
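
For the last two points, a minimal run could look like the sketch below. The hostname, port, and query file contents are placeholders, and the vespa-fbench invocation is shown as a comment since it requires a running Vespa endpoint; the flags used (-n clients, -s seconds, -q query file) are the basic vespa-fbench options.

```shell
# vespa-fbench reads a query file with one URL path per line.
cat > queries.txt <<'EOF'
/search/?query=love&hits=10
/search/?query=music&hits=10
EOF

# Hypothetical invocation (vespa-host.example.com:8080 is a placeholder):
#   -n 1  = one client (no concurrency, to establish baseline latency)
#   -s 60 = run for 60 seconds
#   -q    = query file created above
# vespa-fbench -n 1 -s 60 -q queries.txt vespa-host.example.com 8080
#
# Then raise -n step by step until latency passes the SLA to find the
# maximum sustainable throughput.
wc -l < queries.txt
```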

Some general resources on Benchmarking & Profiling Vespa

Jo Kristian Bergum