For (1):
Add the following dependencies (note that azure-identity has no <version> here, so either add one or manage it through the com.azure:azure-sdk-bom):
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-analytics-synapse-spark</artifactId>
    <version>1.0.0-beta.4</version>
</dependency>
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-identity</artifactId>
</dependency>
Then use the following sample code:
import com.azure.analytics.synapse.spark.SparkBatchClient;
import com.azure.analytics.synapse.spark.SparkClientBuilder;
import com.azure.analytics.synapse.spark.models.SparkBatchJob;
import com.azure.analytics.synapse.spark.models.SparkBatchJobOptions;
import com.azure.identity.DefaultAzureCredentialBuilder;

import java.util.List;

public class SynapseService {

    private final SparkBatchClient batchClient;

    public SynapseService() {
        batchClient = new SparkClientBuilder()
                .endpoint("https://xxxx.dev.azuresynapse.net/")
                .sparkPoolName("TestPool")
                .credential(new DefaultAzureCredentialBuilder().build())
                .buildSparkBatchClient();
    }

    public SparkBatchJob submitSparkJob(String name, String mainFile, String mainClass, List<String> arguments, List<String> jars) {
        SparkBatchJobOptions options = new SparkBatchJobOptions()
                .setName(name)
                .setFile(mainFile)
                .setClassName(mainClass)
                .setArguments(arguments)
                .setJars(jars)
                .setExecutorCount(3)
                .setExecutorCores(4)
                .setDriverCores(4)
                .setDriverMemory("6G")
                .setExecutorMemory("6G");
        return batchClient.createSparkBatchJob(options);
    }
    /**
     * Gets the job; its state field holds the current Livy state.
     * All possible Livy states: https://learn.microsoft.com/en-us/rest/api/synapse/data-plane/spark-batch/get-spark-batch-jobs#livystates
     * Some of the values: busy, dead, error, idle, killed, not_started, recovering, running, shutting_down, starting, success
     * (see the polling sketch after the submission example below).
     *
     * @param id       id of the Spark batch job
     * @param detailed whether to return detailed information about the job
     * @return the Spark batch job, including its current state
     */
    public SparkBatchJob getSparkJob(int id, boolean detailed) {
        return batchClient.getSparkBatchJob(id, detailed);
    }

    /**
     * Cancels the ongoing Synapse Spark job.
     *
     * @param jobId id of the Synapse job
     */
    public void cancelSparkJob(int jobId) {
        batchClient.cancelSparkBatchJob(jobId);
    }
}
And finally, submit the Spark job:
SynapseService synapse = new SynapseService();
synapse.submitSparkJob("TestJob",
        "abfss://builds@xxxx.dfs.core.windows.net/core/jars/main-module_2.12-1.0.jar",
        "com.xx.Main",
        Collections.emptyList(),
        Arrays.asList("abfss://builds@xxxx.dfs.core.windows.net/core/jars/*"));
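To wait for the result, you can poll getSparkJob until the Livy state is terminal, and stop a stuck job with cancelSparkJob if needed. Below is a minimal sketch of such a helper (it could live inside SynapseService); the terminal-state list and the 30-second interval are assumptions you should adapt:
// Minimal polling helper (could be added to SynapseService).
// Assumption: String.valueOf(job.getState()) yields one of the Livy states listed above,
// whether getState() returns a String or an enum-like type in this SDK version.
public SparkBatchJob waitForCompletion(int jobId) throws InterruptedException {
    java.util.Set<String> terminalStates = java.util.Set.of("dead", "error", "killed", "success");
    while (true) {
        SparkBatchJob job = getSparkJob(jobId, true);
        if (terminalStates.contains(String.valueOf(job.getState()))) {
            return job;
        }
        Thread.sleep(30_000); // poll every 30 seconds; tune to your job length
    }
}
The id to poll can be taken from the SparkBatchJob returned by submitSparkJob.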
Finally, you will need to grant the caller the necessary role:
- Open Synapse Analytics Studio
- Manage -> Access Control
- Assign the Synapse Compute Operator role to the caller
To answer question 2:
When jobs are submitted to Synapse via jars, they are equivalent to spark-submit. So all the jobs are agnostic of each other and do not share each other's dependencies.
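In practice that means every submission must carry everything it needs through the file/jars arguments; nothing staged for one batch job is visible to another. A hypothetical illustration (paths and class names are placeholders):
// Two independent batch jobs: each lists its own jars, since batch jobs
// do not share classpaths or previously uploaded dependencies.
synapse.submitSparkJob("JobA",
        "abfss://builds@xxxx.dfs.core.windows.net/core/jars/job-a.jar",
        "com.xx.JobA",
        Collections.emptyList(),
        Arrays.asList("abfss://builds@xxxx.dfs.core.windows.net/core/jars/job-a-deps/*"));
synapse.submitSparkJob("JobB",
        "abfss://builds@xxxx.dfs.core.windows.net/core/jars/job-b.jar",
        "com.xx.JobB",
        Collections.emptyList(),
        Arrays.asList("abfss://builds@xxxx.dfs.core.windows.net/core/jars/job-b-deps/*"));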