
Vespa is really cool and useful, but it is hard to grasp. For example, I have containerized it on Kubernetes (k8s) and deployed my application, with the config and search/storage nodes in one pod on one node; the node OS is Google's Container-Optimized OS. For a few minutes I can PUT and query documents via the REST/document API, which is amazing! But after some minutes it simply goes into an error state.

I have to redeploy the app and activate it again to get it working.

Exec'ing into the pod/container and running vespa-logfmt -l all -s fmttime,service,message

prints the log below. I have no idea what is going on here, but something goes into a failure state. Why is it upgrading the config server and redeploying all applications? Is there any other log I can provide to help debug what is happening?

[2018-02-11 23:35:05] jdisc/configserver FrameworkEvent PACKAGES REFRESHED
[2018-02-11 23:35:08] configproxy      Timed out (timeout 15000) getting config name=sentinel,namespace=cloud.config,configId=hosts/vespa-0.vespa.default.svc.cluster.local, will retry
[2018-02-11 23:35:08] configserver     setting up simple metrics gathering. reportPeriodSeconds=60, pointsToKeepPerMetric=100
[2018-02-11 23:35:08] configserver     Using jute max buffer size 52428800
[2018-02-11 23:35:08] configserver     count/1 name=configserver.requests value=0
[2018-02-11 23:35:08] configserver     count/1 name=configserver.failedRequests value=0
[2018-02-11 23:35:08] configserver     count/1 name=procTime value=0
[2018-02-11 23:35:08] configserver     Adding user include dir 'config-models'
[2018-02-11 23:35:09] configserver     Creating all tenants
[2018-02-11 23:35:09] configserver     Using jute max buffer size 10485760
[2018-02-11 23:35:09] configserver     All tenants created
[2018-02-11 23:34:54] jdisc/configserver BundleEvent INSTALLED
[2018-02-11 23:35:09] configserver     Running in an OSGi environment
[2018-02-11 23:35:10] configserver     Configserver upgraded from 0.0.0 to 6.199.0. Redeploying all applications
[2018-02-11 23:35:10] configserver     All applications redeployed
[2018-02-11 23:35:10] configserver     Changing health status code from 'initializing' to 'up'
[2018-02-11 23:35:10] configserver     Rpc server listening on port 19070
[2018-02-11 23:35:10] configserver     Logging initialized @21152ms to org.eclipse.jetty.util.log.Slf4jLog
[2018-02-11 23:35:11] configserver     Creating janitor executor with 1 threads
[2018-02-11 23:35:11] configserver     jetty-9.4.8.v20171121, build timestamp: 2017-11-21T21:27:37Z, git hash: 82b8fb23f757335bb3329d540ce37a2a2615f0a8
[2018-02-11 23:35:12] config-sentinel  Connection to tcp/localhost:19090 failed or timed out
[2018-02-11 23:35:12] config-sentinel  FRT Connection tcp/localhost:19090 suspended until 2018-02-11 23:35:22 GMT
[2018-02-11 23:35:12] config-sentinel  Error response or no response from config server (key: name=sentinel,namespace=cloud.config,configId=hosts/vespa-0.vespa.default.svc.cluster.local) (errcode=103, validresponse:0), trying again in 6000 milliseconds
[2018-02-11 23:35:12] configserver     Initiating Jersey application, version Jersey: 2.23.2 2016-08-08 17:14:55...
[2018-02-11 23:35:13] configserver     Selected ExecutorServiceProvider implementation [org.glassfish.jersey.server.internal.process.ServerProcessingBinder$DefaultManagedAsyncExecutorProvider] to be used f
or injection of executor qualified by [org.glassfish.jersey.server.ManagedAsyncExecutor] annotation.
[2018-02-11 23:35:13] configserver     Selected ScheduledExecutorServiceProvider implementation [org.glassfish.jersey.server.internal.process.ServerProcessingBinder$DefaultBackgroundSchedulerProvider] to b
e used for injection of scheduler qualified by [org.glassfish.jersey.server.BackgroundScheduler] annotation.
[2018-02-11 23:35:13] configserver     Jersey application initialized.\n\nGlobal Reader Interceptors:\n   org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor\nGlobal Writer Interceptor
s:\n   org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor\n   org.glassfish.jersey.server.internal.JsonWithPaddingInterceptor\nMessage Body Readers:\n   org.glassfish.jersey.media.mul
tipart.internal.MultiPartReaderServerSide\n   com.fasterxml.jackson.jaxrs.json.JacksonJaxbJsonProvider\nMessage Body Writers:\n   org.glassfish.jersey.media.multipart.internal.MultiPartWriter\n   com.faste
rxml.jackson.jaxrs.json.JacksonJaxbJsonProvider\n
[2018-02-11 23:35:13] configserver     Initiating Jersey application, version Jersey: 2.23.2 2016-08-08 17:14:55...
[2018-02-11 23:35:13] configserver     Selected ExecutorServiceProvider implementation [org.glassfish.jersey.server.internal.process.ServerProcessingBinder$DefaultManagedAsyncExecutorProvider] to be used f
or injection of executor qualified by [org.glassfish.jersey.server.ManagedAsyncExecutor] annotation.
[2018-02-11 23:35:13] configserver     Selected ScheduledExecutorServiceProvider implementation [org.glassfish.jersey.server.internal.process.ServerProcessingBinder$DefaultBackgroundSchedulerProvider] to b
e used for injection of scheduler qualified by [org.glassfish.jersey.server.BackgroundScheduler] annotation.
[2018-02-11 23:35:13] configserver     Jersey application initialized.\nRoot Resource Classes:\n  com.yahoo.vespa.serviceview.StateResource\nGlobal Reader Interceptors:\n   org.glassfish.jersey.server.inte
rnal.MappableExceptionWrapperInterceptor\nGlobal Writer Interceptors:\n   org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor\n   org.glassfish.jersey.server.internal.JsonWithPaddingIn
terceptor\nMessage Body Readers:\n   org.glassfish.jersey.media.multipart.internal.MultiPartReaderServerSide\n   com.fasterxml.jackson.jaxrs.json.JacksonJaxbJsonProvider\nMessage Body Writers:\n   org.glas
sfish.jersey.media.multipart.internal.MultiPartWriter\n   com.fasterxml.jackson.jaxrs.json.JacksonJaxbJsonProvider\n
[2018-02-11 23:35:13] configserver     Initiating Jersey application, version Jersey: 2.23.2 2016-08-08 17:14:55...
[2018-02-11 23:35:13] configserver     Selected ExecutorServiceProvider implementation [org.glassfish.jersey.server.internal.process.ServerProcessingBinder$DefaultManagedAsyncExecutorProvider] to be used f
or injection of executor qualified by [org.glassfish.jersey.server.ManagedAsyncExecutor] annotation.
[2018-02-11 23:35:13] configserver     Selected ScheduledExecutorServiceProvider implementation [org.glassfish.jersey.server.internal.process.ServerProcessingBinder$DefaultBackgroundSchedulerProvider] to b
e used for injection of scheduler qualified by [org.glassfish.jersey.server.BackgroundScheduler] annotation.
[2018-02-11 23:35:13] configserver     Jersey application initialized.\nRoot Resource Classes:\n  com.yahoo.vespa.orchestrator.resources.InstanceResource\n  com.yahoo.vespa.orchestrator.resources.Applicati
onSuspensionResource\n  com.yahoo.vespa.orchestrator.resources.HostResource\n  com.yahoo.vespa.orchestrator.resources.HostSuspensionResource\nGlobal Reader Interceptors:\n   org.glassfish.jersey.server.int
ernal.MappableExceptionWrapperInterceptor\nGlobal Writer Interceptors:\n   org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor\n   org.glassfish.jersey.server.internal.JsonWithPaddingI
nterceptor\nMessage Body Readers:\n   org.glassfish.jersey.media.multipart.internal.MultiPartReaderServerSide\n   com.fasterxml.jackson.jaxrs.json.JacksonJaxbJsonProvider\nMessage Body Writers:\n   org.gla
ssfish.jersey.media.multipart.internal.MultiPartWriter\n   com.fasterxml.jackson.jaxrs.json.JacksonJaxbJsonProvider\n
[2018-02-11 23:35:13] configserver     The following hints have been detected: HINT: A HTTP GET method, public void com.yahoo.vespa.orchestrator.resources.ApplicationSuspensionResource.getApplication(java.
lang.String), returns a void type. It can be intentional and perfectly fine, but it is a little uncommon that GET method returns always "204 No Content".\n
[2018-02-11 23:35:13] configserver     Started o.e.j.s.ServletContextHandler@10ab976b{/,null,AVAILABLE}
[2018-02-11 23:35:13] configserver     Using channel set by activator: sun.nio.ch.ServerSocketChannelImpl[/0:0:0:0:0:0:0:0:19071]
[2018-02-11 23:35:13] configserver     Started configserver@34d1f40e{HTTP/1.1,[http/1.1]}{0.0.0.0:19071}
[2018-02-11 23:35:13] configserver     Started @24855ms
[2018-02-11 23:35:13] configserver     Switching to the latest deployed set of configurations and components. Application switch number: 0
[2018-02-11 23:35:15] configproxy      Request callback failed: APPLICATION_NOT_LOADED. Connection spec: tcp/localhost:19070, error message: Failed request (No application exists) from Connection { Socket[addr=/127.0.0.1,port=33410,localport=19070] }
[2018-02-11 23:35:36] configproxy      Subscribe for 'name=sentinel,namespace=cloud.config,configId=hosts/vespa-0.vespa.default.svc.cluster.local,0944a8c189a502c0e2fe1930114897b7' failed, closing subscriber
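
A dump like the one above is easier to scan once the startup noise is filtered out. The following is a minimal sketch, assuming the output was produced with -s fmttime,service,message so each entry starts with "[timestamp] service". The function names and the keyword list are hypothetical choices for this illustration, not anything defined by Vespa, and plain substring matching will also catch harmless lines such as the configserver.failedRequests metric.

```python
import re
from collections import Counter

# Matches entries shaped like "[2018-02-11 23:35:08] configproxy      message".
LINE_RE = re.compile(r"^\[(?P<time>[^\]]+)\]\s+(?P<service>\S+)\s+(?P<message>.*)$")

# Substrings treated as suspicious. This list is an assumption made for the
# sketch; adjust it to whatever you are hunting for.
KEYWORDS = ("timed out", "failed", "error", "not_loaded")


def suspicious_lines(lines):
    """Yield (time, service, message) for entries whose message has a keyword."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # continuation of a wrapped entry, or unrelated text
        message = match.group("message")
        if any(keyword in message.lower() for keyword in KEYWORDS):
            yield match.group("time"), match.group("service"), message


def count_by_service(lines):
    """Count suspicious entries per service name."""
    return Counter(service for _, service, _ in suspicious_lines(lines))
```

Feeding it the lines of the dump (for example via sys.stdin) gives a quick per-service tally of where timeouts and failures cluster, which here would point at configproxy and config-sentinel failing to get config from the config server.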
Lundin
  • Could you please post your config files for the services as well? – Arnstein Ressem Feb 12 '18 at 08:11
  • Thanks, do you mean the app's services.xml config file? It is a copy of the blog-search application with a different content id and type. Watching the pod, it is restarted over and over again. I removed the readiness probe endpoint, but something still picks up a failure and restarts the container. Something in the config server fails. I also checked that search.config.qr-start.def is in place and not overwritten. – Lundin Feb 12 '18 at 11:16
  • Sorry for being unclear about this. I meant the configuration files for the k8s services (the .yml files). – Arnstein Ressem Feb 12 '18 at 11:19
  • Alright =) See if this works: https://gist.github.com/lundin/bb05c0dbac9ff55a0b14f288745ed14c Note that I have added a nodeSelector to land the pod on the right node (one with enough memory). Other than that it should be straightforward. – Lundin Feb 12 '18 at 12:11

1 Answer


After I shut down the node and started a new one with more memory (n1-standard-2), it works all the time. Kubernetes itself adds a bit of overhead as well, so a smaller machine type was not enough.
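
I did not capture why the container kept being restarted, so memory pressure is an inference from the fix rather than something the logs show. One way to check next time is to look at the last termination reason Kubernetes records for each container in the pod status (status.containerStatuses[].lastState.terminated.reason in the output of kubectl get pod <name> -o json). A minimal sketch, assuming that JSON has been saved to a file; the function name and the pod.json file name are hypothetical:

```python
import json


def restart_reasons(pod):
    """Map container name to (restart count, last termination reason).

    `pod` is the parsed JSON of a single pod. The reason is None when the
    container has not been terminated before.
    """
    result = {}
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated") or {}
        result[status["name"]] = (
            status.get("restartCount", 0),
            terminated.get("reason"),
        )
    return result


# Usage, assuming the pod JSON was saved first (file name is hypothetical):
#   kubectl get pod vespa-0 -o json > pod.json
#   with open("pod.json") as handle:
#       print(restart_reasons(json.load(handle)))
```

A reason of OOMKilled together with a climbing restart count would support the memory explanation; any other reason would suggest looking elsewhere.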

Vespa.ai is amazing! It deserves way more credit. It is a dream for any application developer to have a "general" DB, search, geospatial, ranking and scalability out of the box, without any coding. Many are struggling to get even one of these right =)

Lundin