I'm trying to create a session in Apache Spark using the Livy REST API. It fails with the following error: "User capacity has reached its maximum limit."
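For context, this is roughly how the session is being created; the Livy host/port are placeholders and the payload is only a minimal pyspark session request, not my exact code:

import json
import requests

# Placeholder Livy endpoint; the real host and port differ in our cluster.
LIVY_URL = 'http://livy-host:8998'

# Minimal session request: no explicit Spark conf is passed, so the job
# falls back to the cluster/queue default values.
payload = {'kind': 'pyspark'}

resp = requests.post(
    LIVY_URL + '/sessions',
    data=json.dumps(payload),
    headers={'Content-Type': 'application/json'},
)
print(resp.status_code, resp.json())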
The same user is running another Spark job. I don't understand which capacity has reached its maximum, or how to fix it by adjusting Spark configuration parameters. Here is the log output I think is relevant (I reformatted it to make it clearer):
22/05/30 19:18:51 INFO Client: Submitting application application_1653913029140_0247 to ResourceManager
22/05/30 19:18:51 INFO YarnClientImpl: Submitted application application_1653913029140_0247
22/05/30 19:18:51 INFO Client: Application report for application_1653913029140_0247 (state: ACCEPTED)
22/05/30 19:18:51 INFO Client:
client token: N/A
diagnostics: [Mon May 30 19:18:51 -0300 2022]
Application is Activated, waiting for resources to be assigned for AM. User capacity has reached its maximum limit.
Details : AM Partition = <DEFAULT_PARTITION> ;
Partition Resource = <memory:2662400, vCores:234> ;
Queue's Absolute capacity = 32.0 % ;
Queue's Absolute used capacity = 40.76923 % ;
Queue's Absolute max capacity = 100.0 % ;
Queue's capacity (absolute resource) = <memory:851967, vCores:74> ;
Queue's used capacity (absolute resource) = <memory:1085440, vCores:106> ;
Queue's max capacity (absolute resource) = <memory:2662400, vCores:234> ;
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1653949131433
final status: UNDEFINED
tracking URL: http://vrt1557.bndes.net:8088/proxy/application_1653913029140_0247/
user: s-dtl-p01
22/05/30 19:18:51 INFO ShutdownHookManager: Shutdown hook called
The job that is already running has configured some Spark parameters for high performance:
conf = {
    'spark.yarn.appMasterEnv.PYSPARK_PYTHON': 'python3',
    'spark.cores.max': 50,
    'spark.executor.memory': '10g',
    'spark.executor.instances': 100,
    'spark.driver.memory': '10g',
}
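My back-of-the-envelope check of what that job asks YARN for, assuming the default Spark-on-YARN overhead (spark.executor.memoryOverhead = max(384 MB, 10% of executor memory)) and ignoring the driver and container-size rounding:

executor_memory_mb = 10 * 1024                            # 'spark.executor.memory': '10g'
overhead_mb = max(384, int(0.10 * executor_memory_mb))    # default overhead formula (assumed)
num_executors = 100                                       # 'spark.executor.instances': 100

requested_mb = num_executors * (executor_memory_mb + overhead_mb)

queue_capacity_mb = 851967    # "Queue's capacity (absolute resource)" from the log
queue_used_mb = 1085440       # "Queue's used capacity (absolute resource)" from the log

print(requested_mb, queue_capacity_mb, queue_used_mb)     # 1126400 851967 1085440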
The job that failed to start didn't configure any Spark parameters and is using the cluster's default values.
Of course I can tweak the Spark parameters of the running job so that it doesn't prevent the new job from getting resources, but I'd like to understand what is happening. The queue configuration also has many parameters that presumably interact with the application.
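For reference, this is a sketch of how I've been inspecting the queue numbers, assuming the ResourceManager address from the tracking URL above and the standard /ws/v1/cluster/scheduler REST endpoint:

import requests

# ResourceManager web address taken from the tracking URL in the log above.
RM_URL = 'http://vrt1557.bndes.net:8088'

# CapacityScheduler info: per-queue configured, used and max capacities.
scheduler = requests.get(RM_URL + '/ws/v1/cluster/scheduler').json()
queues = scheduler['scheduler']['schedulerInfo']['queues']['queue']

for q in queues:
    print(q['queueName'], q['absoluteCapacity'],
          q['absoluteUsedCapacity'], q.get('resourcesUsed'))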
Which resource is exhausted? How can I figure that out from the log above?