
I am running OpenGL headless against an X server and calling this API multiple times: https://github.com/RobotLocomotion/drake/blob/74292cacd1c42d6b3e682dc836254cdb834ea2e6/geometry/render/render_engine_vtk.cc#L311

The failure is intermittent, but it shows up on almost every run:

X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  154 (GLX)
  Minor opcode of failed request:  3 (X_GLXCreateContext)
  Value in failed request:  0x0
  Serial number of failed request:  61
  Current serial number in output stream:  62

glxinfo fails as well:

glxinfo
name of display: :0
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 50 requests (50 known processed) with 0 events remaining.

Last lines of /var/log/Xorg.0.log:

[ 47757.261] (EE) Backtrace:
[ 47757.261] (EE) 0: /usr/lib/xorg/Xorg (xorg_backtrace+0x4d) [0x557e48dd2acd]
[ 47757.261] (EE) 1: /usr/lib/xorg/Xorg (0x557e48c1a000+0x1bc869) [0x557e48dd6869]
[ 47757.261] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f4cbddc7000+0x128a0) [0x7f4cbddd98a0]
[ 47757.261] (EE) 3: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (0x7f4cba768000+0x479100) [0x7f4cbabe1100] 
[ 47757.261] (EE) 
[ 47757.262] (EE) Segmentation fault at address 0x8
[ 47757.262] (EE) 
Fatal server error:
[ 47757.262] (EE) Caught signal 11 (Segmentation fault). Server aborting

Machine: Ubuntu 18.04.2

NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2

Can someone please suggest what to debug next?

am83

1 Answer

I also see this in my own CI:

[ 18228.470] (EE) Backtrace:
[ 18228.470] (EE) 0: /usr/lib/xorg/Xorg (xorg_backtrace+0x4d) [0x55e0ca9fcacd]
[ 18228.470] (EE) 1: /usr/lib/xorg/Xorg (0x55e0ca844000+0x1bc869) [0x55e0caa00869]
[ 18228.470] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fce3e7d6000+0x128a0) [0x7fce3e7e88a0]
[ 18228.470] (EE) 3: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (0x7fce3b177000+0x479100) [0x7fce3b5f0100]
[ 18228.470] (EE) 
[ 18228.470] (EE) Segmentation fault at address 0x8

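The load addresses differ because of ASLR, but the offsets within each module are identical (for example nvidia_drv.so+0x479100), so this looks like the same crash inside the NVIDIA X driver.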
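As a rough sketch of how to check that two backtraces point at the same crash site, one can strip the ASLR-randomized base addresses and compare only the per-module offsets. The helper name module_offsets and the regular expression below are my own illustration, not part of any Xorg or NVIDIA tooling, and they assume frames are logged in the `(0xBASE+0xOFFSET)` form seen in the traces here.

```python
import re

# Matches Xorg backtrace frames logged as:
#   N: /path/to/module (0xBASE+0xOFFSET) [0xADDRESS]
# Frames given as (symbol+0xOFFSET), such as xorg_backtrace, are skipped.
FRAME_RE = re.compile(r"\d+: (\S+) \(0x[0-9a-f]+\+(0x[0-9a-f]+)\)")


def module_offsets(log_text):
    """Return (module path, offset) pairs, ignoring the ASLR load base."""
    return FRAME_RE.findall(log_text)


# Frames copied from the question's Xorg.0.log.
question_trace = """
[ 47757.261] (EE) 1: /usr/lib/xorg/Xorg (0x557e48c1a000+0x1bc869) [0x557e48dd6869]
[ 47757.261] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f4cbddc7000+0x128a0) [0x7f4cbddd98a0]
[ 47757.261] (EE) 3: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (0x7f4cba768000+0x479100) [0x7f4cbabe1100]
"""

# Frames copied from the CI log in this answer.
answer_trace = """
[ 18228.470] (EE) 1: /usr/lib/xorg/Xorg (0x55e0ca844000+0x1bc869) [0x55e0caa00869]
[ 18228.470] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fce3e7d6000+0x128a0) [0x7fce3e7e88a0]
[ 18228.470] (EE) 3: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (0x7fce3b177000+0x479100) [0x7fce3b5f0100]
"""

# True when both traces hit the same offsets in the same modules.
same_crash_site = module_offsets(question_trace) == module_offsets(answer_trace)
print("same crash site:", same_crash_site)
```

Matching offsets only suggest the same code path when both machines run the same binaries, which is plausible here since both report driver 440.100.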

I'm using xorg-server 2:1.19.6-1ubuntu4.4.

[ 17925.887] (II) Module nvidia: vendor="NVIDIA Corporation"
[ 17925.887]    compiled for 1.6.99.901, module version = 1.0.0
[ 17925.887]    Module class: X.Org Video Driver
[ 17925.887] (II) NVIDIA dlloader X Driver  440.100  Fri May 29 08:21:27 UTC 2020

I have not yet been able to debug this, unfortunately.

My impression (not yet backed by data) is that this became much more frequent when Ubuntu force-upgraded everyone from nvidia 430 to nvidia 440 a few months ago.

jwnimmer-tri