
I have an Azure Synapse Spark cluster with 3 nodes of 4 vCores and 32 GB memory each. I am trying to submit a Spark job using the Azure Synapse Livy batch API. The request looks like this:

curl --location --request POST 'https://<synapse-workspace>.dev.azuresynapse.net/livyApi/versions/2019-11-01-preview/sparkPools/<pool-name>/batches?detailed=true' `
--header 'cache-control: no-cache' `
--header 'Authorization: Bearer <Token>' `
--header 'Content-Type: application/json' `
--data-raw '{
    "name": "T1",
    "file": "folder/file.py",
    "driverMemory": "1g",
    "driverCores": 1,
    "executorMemory": "1g",
    "executorCores":1,
    "numExecutors": 3
}'

The response I get is this:

{
    "TraceId": "<some-guid>",
    "Message": "Your Spark job requested 16 vcores. However, the pool has a 12 core limit. Try reducing the numbers of vcores requested or increasing your pool size."
}

I cannot figure out why it is asking for 16 cores. Shouldn't it ask for 4 (3 * 1 + 1) cores?

Update: I tried changing the pool to 3 nodes of 8 vCores and 64 GB memory each. With this configuration,

{
    "name": "T1",
    "file": "folder/file.py",
    "driverMemory": "1g",
    "driverCores": 1,
    "executorMemory": "1g",
    "executorCores": 1,
    "numExecutors": 6
}

it requests 28 cores (even for executorCores 2, 3, or 4). And if I change executorCores to 5, 6, 7, or 8, it requests 56 cores.
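
These numbers would be consistent with Synapse rounding every container (the driver and each executor) up to a fixed container size instead of honoring the exact core request. A minimal sketch of that arithmetic in Java; the 4- and 8-vCore container sizes and the max-of-driver-and-executor rule are my assumptions, inferred only from the numbers above, not from any documented behavior:

// Hypothetical model of Synapse's vCore accounting, inferred purely from
// the observed numbers above -- not from any documented behavior.
public class VcoreEstimate {

    // Assumption: all containers share one size, the smallest of 4 or 8
    // vCores that fits the larger of the driver and executor core requests.
    static int containerSize(int driverCores, int executorCores) {
        return Math.max(driverCores, executorCores) <= 4 ? 4 : 8;
    }

    // One container for the driver plus one per executor.
    static int totalVcores(int driverCores, int executorCores, int numExecutors) {
        return containerSize(driverCores, executorCores) * (numExecutors + 1);
    }

    public static void main(String[] args) {
        System.out.println(totalVcores(1, 1, 3)); // 16 -- the original error message
        System.out.println(totalVcores(1, 4, 6)); // 28 -- executorCores 1 through 4
        System.out.println(totalVcores(1, 8, 6)); // 56 -- executorCores 5 through 8
    }
}

Under this model, 1 driver plus 3 executors at 1 core each becomes 4 containers of 4 vCores, i.e. the 16 vCores in the error message.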

aman_kumar
  • Did you try submitting the Spark job via the Synapse portal? Does it work that way? – Jatin Feb 11 '22 at 06:07
  • I hadn't tried that. But now that I have looked at it, many things are clear. I will post that as an answer. – aman_kumar Feb 11 '22 at 10:06
  • When I tried submitting the job via the portal on 3 nodes of 8 vCores and 56 GB memory each, the UI tells me I have only two options for executor configurations: https://i.stack.imgur.com/mENdU.png It seems like a problem on the Synapse side. So to conclude: no matter how few resources I request, it will always ask for resources in fixed increments (4, 8, 16... cores). – aman_kumar Feb 11 '22 at 12:32

1 Answer


From the portal there is no way to do what you are trying to do.

But you can still submit a Spark job by specifying the driver (cores and memory) and executor (cores and memory) yourself, for example with something like this: Submit Spark job in Azure Synapse from Java

Using the above code, I am able to submit 9 concurrent jobs (each with 1 driver and 1 executor, both consuming a single core) on 3 Medium nodes (8 cores each, though only 7 are available for use since 1 is reserved for the Hadoop daemon).
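
For reference, a minimal self-contained sketch of such a submission using only java.net.http (Java 15+ for the text block). This is not the code from the linked answer; the workspace, pool, and token values are the placeholders from the question, and the payload mirrors the question's with a single 1-core executor:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: POST a Livy batch job to the Synapse endpoint from the question.
public class SubmitSparkBatch {
    public static void main(String[] args) throws Exception {
        String workspace = "<synapse-workspace>"; // placeholder
        String pool = "<pool-name>";              // placeholder
        String token = "<Token>";                 // placeholder

        // Same shape as the question's payload: 1-core driver, one 1-core executor.
        String payload = """
                {
                    "name": "T1",
                    "file": "folder/file.py",
                    "driverMemory": "1g",
                    "driverCores": 1,
                    "executorMemory": "1g",
                    "executorCores": 1,
                    "numExecutors": 1
                }""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://" + workspace + ".dev.azuresynapse.net"
                        + "/livyApi/versions/2019-11-01-preview/sparkPools/"
                        + pool + "/batches?detailed=true"))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}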

Jatin
  • When you say you were able to submit, do you mean you received the response, or did you actually check the Synapse workspace's monitor tab and see all 9 jobs running concurrently? In my case, I get the response, but all jobs except one are queued. – aman_kumar Feb 11 '22 at 12:30
  • Yes. I checked my Synapse monitor tab and verified it. – Jatin Feb 11 '22 at 12:53