
I reconfigured the Spark infrastructure in K8s (as part of the MLRun/Iguazio platform), and after that I got a lot of issues at the service level:

  • Spark service (status: Failed)
  • All Jupyter notebooks (status: Failed dependencies)

as well as this general error message:

Some services have not been successfully deployed. Check the services status as shown below.

[Screenshot: services status page]

I changed only the amount of RAM (1-30 GB), vCPU (1-14), and Replicas (3).

Has anyone run into a similar issue, and how can this situation be avoided?

JIST
  • 1,139
  • 2
  • 8
  • 30


It was a human mistake, and the solution was easy. The key problem was in the Spark service configuration: I had set extremely small vCPU values, which caused timeouts in the Spark service.

  • I set vCPU in the range 1-14, but left the default unit, millicpu, instead of cpu (so 1-14 millicpu means only 0.001-0.014 of a core).
  • After switching the unit to cpu and restarting the Spark service, everything was fine.
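To see why the unit matters so much, here is a minimal sketch that converts Kubernetes-style CPU quantities into cores. In Kubernetes, the suffix "m" means millicpu (1000m = 1 CPU). It assumes the platform's vCPU field ends up as this kind of quantity; the helper names `cpu_to_cores` and `looks_too_small`, and the 0.1-core threshold, are hypothetical and only for illustration.

```python
def cpu_to_cores(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity string to cores.

    Only plain numbers ("2", "0.5") and the millicpu suffix "m" ("500m")
    are handled; this is a simplified sketch, not a full quantity parser.
    """
    q = quantity.strip()
    if q.endswith("m"):
        # "m" = millicpu: 1000m equals one full CPU core
        return float(q[:-1]) / 1000
    return float(q)


def looks_too_small(quantity: str, min_cores: float = 0.1) -> bool:
    """Hypothetical sanity check: flag CPU settings below min_cores."""
    return cpu_to_cores(quantity) < min_cores


# The same numbers mean very different things depending on the unit
for setting in ["1m", "14m", "1", "14"]:
    flag = "  <-- suspiciously small" if looks_too_small(setting) else ""
    print(f"{setting:>4} -> {cpu_to_cores(setting)} cores{flag}")
```

With millicpu, the upper end of the range (14m) is still only 0.014 of a core, which is plausibly far too little for Spark to start within its timeouts.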

[Screenshot: wrong setting]

[Screenshot: correct setting]
