
I'm running Selenium Grid 4.1.2 (the most recent version at the time of writing) in a Kubernetes cluster.

In many cases (I'd estimate about half), when I execute a test through the grid, the node fails to kill its processes and never goes back to being idle. The container then keeps one full CPU core busy until I kill it manually.

The container log shows the following:

```
10:51:34.781 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:51:35.680 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
Starting ChromeDriver 98.0.4758.102 (273bf7ac8c909cde36982d27f66f3c70846a3718-refs/branch-heads/4758@{#1151}) on port 39592
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
[1646129123.987][SEVERE]: bind() failed: Cannot assign requested address (99)
11:08:24.970 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "99100300a4e6b4fe2afe5891b50def09","eventTime": 1646129304968456597,"eventName": "No slot matched the requested capabilities. ","attributes"

11:08:44.672 INFO [OsProcess.destroy] - Unable to drain process streams. Ignoring but the exception being swallowed follows.
org.apache.commons.exec.ExecuteException: The stop timeout of 2000 ms was exceeded (Exit value: -559038737)
    at org.apache.commons.exec.PumpStreamHandler.stopThread(PumpStreamHandler.java:295)
    at org.apache.commons.exec.PumpStreamHandler.stop(PumpStreamHandler.java:180)
    at org.openqa.selenium.os.OsProcess.destroy(OsProcess.java:135)
    at org.openqa.selenium.os.CommandLine.destroy(CommandLine.java:152)
    at org.openqa.selenium.remote.service.DriverService.stop(DriverService.java:281)
    at org.openqa.selenium.grid.node.config.DriverServiceSessionFactory.apply(DriverServiceSessionFactory.java:183)
    at org.openqa.selenium.grid.node.config.DriverServiceSessionFactory.apply(DriverServiceSessionFactory.java:65)
    at org.openqa.selenium.grid.node.local.SessionSlot.apply(SessionSlot.java:143)
    at org.openqa.selenium.grid.node.local.LocalNode.newSession(LocalNode.java:314)
    at org.openqa.selenium.grid.node.NewNodeSession.execute(NewNodeSession.java:52)
    at org.openqa.selenium.remote.http.Route$TemplatizedRoute.handle(Route.java:192)
    at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
    at org.openqa.selenium.grid.security.RequiresSecretFilter.lambda$apply$0(RequiresSecretFilter.java:64)
    at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
    at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
    at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
    at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
    at org.openqa.selenium.grid.node.Node.execute(Node.java:240)
    at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
    at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
    at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
    at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
    at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
    at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
    at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
    at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
11:08:44.673 ERROR [OsProcess.destroy] - Unable to kill process Process[pid=75, exitValue=143]
11:08:44.675 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "99100300a4e6b4fe2afe5891b50def09","eventTime": 1646129316638154262,"eventName": "exception","attributes": {"driver.url": "http:\u002f\u002f
```

Here's an excerpt from the Kubernetes manifest:

```yaml
        - name: selenium-node-chrome
          image: selenium/node-chrome:latest
...
          env:
            - name: TZ
              value: Europe/Berlin
            - name: START_XVFB
              value: "false"
            - name: SE_NODE_OVERRIDE_MAX_SESSIONS
              value: "true"
            - name: SE_NODE_MAX_SESSIONS
              value: "1"
          envFrom:
            - configMapRef:
                name: selenium-event-bus-config
...
          volumeMounts:
            - name: dshm
              mountPath: /dev/shm
...
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
```

The `selenium-event-bus-config` ConfigMap contains the following variables:

```yaml
data:
  SE_EVENT_BUS_HOST: selenium-hub
  SE_EVENT_BUS_PUBLISH_PORT: "4442"
  SE_EVENT_BUS_SUBSCRIBE_PORT: "4443"
```

Did I misconfigure anything? Does anyone have an idea how I can fix this?

Max N.
  • Which version of Kubernetes did you use, and how did you set up the cluster? Did you use a bare-metal installation or a cloud provider? This is important for reproducing your problem. Could you attach the full YAML files to your question? – Mikołaj Głodziak Mar 02 '22 at 09:15
  • I'm running the grid on a bare-metal K8s 1.19.8 cluster. I created the grid with the (pretty new) Helm chart here: https://github.com/SeleniumHQ/docker-selenium/tree/trunk/chart/selenium-grid – Max N. Mar 02 '22 at 16:43
  • Apparently the issue resolves when removing the `START_XVFB` parameter. With a node with only the timezone config I did not yet have the problem. – Max N. Mar 02 '22 at 16:44
  • You are using a deprecated version of Kubernetes. Could you update it to at least 1.21? Is your problem resolved now, or do you want to find a solution that keeps all your parameters? – Mikołaj Głodziak Mar 03 '22 at 13:34
  • I will update soon. For now this fixes the issue for me. – Max N. Mar 07 '22 at 09:44

1 Answer


If you don't need to use Xvfb, you can remove it from your code, and your problem will be resolved.

> Apparently the issue resolves when removing the START_XVFB parameter. With a node with only the timezone config I did not yet have the problem.
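Applied to the manifest excerpt from the question, that change amounts to deleting the `START_XVFB` entry so the node falls back to whatever default the image ships with. A minimal sketch of the resulting `env` section, assuming the rest of the container spec stays exactly as in the question:

```yaml
# env section of the selenium-node-chrome container with START_XVFB removed,
# so the image's own default applies.
# Assumption: all other fields of the container spec are unchanged.
env:
  - name: TZ
    value: Europe/Berlin
  - name: SE_NODE_OVERRIDE_MAX_SESSIONS
    value: "true"
  - name: SE_NODE_MAX_SESSIONS
    value: "1"
```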

As a workaround, you can try switching your driver, for example to ChromeDriver. You can read about the differences between them here.

See also this similar problem.

Mikołaj Głodziak
  • Ok. Apparently I got a lot of stuff confused here. I don't know exactly why, but I thought when doing everything headless I do *not* need Xvfb. But of course I need it for exactly that! The failure described in the original question came when I started tests with the headless option against a node with `START_XVFB=false`. So no wonder this leads to issues. Thanks for clarifying. I will mark this as the answer and delete my other one. – Max N. Mar 08 '22 at 09:59
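If you would rather keep the setting visible in the manifest than rely on the image default, the entry can stay with its value flipped. This is a sketch only, assuming the node image interprets the string `"true"` as "start Xvfb":

```yaml
# env section of the selenium-node-chrome container with Xvfb explicitly enabled.
# Assumption: the image treats START_XVFB="true" as "start Xvfb on this node".
env:
  - name: TZ
    value: Europe/Berlin
  # "false" was the value in use when the node hung after headless test runs
  - name: START_XVFB
    value: "true"
  - name: SE_NODE_OVERRIDE_MAX_SESSIONS
    value: "true"
  - name: SE_NODE_MAX_SESSIONS
    value: "1"
```

Making the value explicit means a later change to the image's default cannot silently reintroduce the `START_XVFB=false` combination.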