Questions tagged [google-spark-operator]

Use this tag for questions related to the Google Spark Operator.

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

References:

https://github.com/GoogleCloudPlatform/spark-on-k8s-operator

20 questions
2
votes
1 answer

Airflow SparkKubernetesOperator logging

I am using KubernetesExecutor as the executor in Airflow. My DAG code: from datetime import datetime, timedelta from airflow import DAG from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator from…
Manju N
2
votes
1 answer

Helm install spark-operator tries to download a version that does not exist and cannot be forced to use the correct one

I am trying to install ANY working version of spark-operator https://console.cloud.google.com/gcr/images/spark-operator/GLOBAL/spark-operator?tag=v1beta2-1.3.1-3.1.1 on my local Kubernetes cluster. However, the Spark pod is stuck on ImagePullBackOff trying to…
2
votes
0 answers

spark-submit on OpenShift to use specific worker nodes

I am trying to get spark-submit on OpenShift to use specific worker nodes. Below is my command: ./spark/bin/spark-submit \ --master xx:6443 \ --deploy-mode cluster \ --name \ --class com.xxx \ --conf spark.executor.instances=2 \ --conf…
2
votes
1 answer

Dockerfile for Spark/Java application to execute via Spark Operator

I am trying to run a Spark/Java application on Kubernetes (via minikube) using spark-operator. I am a bit confused about what I should put in the Dockerfile so that it can be built into an image and executed via spark-operator. Sample…
Praveenks
1
vote
1 answer

java.io.FileNotFoundException error in Apache Spark even though my file exists

I'm new to Spark and doing a POC to download a file and then read it. However, I am facing an issue where the file reportedly doesn't exist. java.io.FileNotFoundException: File file:/app/data-Feb-19-2023_131049.json does not exist But when I printed the path…
1
vote
0 answers

JMX exporter & spark-on-k8s-operator

I'm trying to submit a Spark application using the Spark operator and to expose metrics using the JMX exporter. I'm using Spark 3.1.1 and Spark operator v1beta2-1.3.3-3.1.1. Here is a snippet from the configuration: monitoring: exposeDriverMetrics: true …
1
vote
1 answer

How to specify job timeout in Spark?

I have a Spark job running on Kubernetes using the spark-on-k8s-operator. This job usually takes less than 5 minutes to complete, but sometimes it gets stuck because of lost executors, which I'm still investigating. How can I…
1
vote
0 answers

Failed to connect to spark-master:7077

I am trying to deploy my spark application on Kubernetes. I followed the below steps: Installed spark-kubernetes-operator: helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator helm install gcp-spark-operator…
1
vote
0 answers

Spark driver/executor doesn't allow logging of the Prometheus JMX exporter's fine logs that it should output as a JMX agent

The java process is called as: /usr/local/openjdk-8/bin/java -XX:+UseG1GC -Dlog4j.debug -Dlog4j.configuration=log4j.properties -Djava.util.logging.config.file=/etc/metrics/conf/logging.properties…
1
vote
0 answers

Spark on K8s: UnknownHostException when Spark app is trying to resolve DNS of another pod in a different namespace on the same cluster

I am able to execute SparkPi in k8s, and it is deployed (in GKE) as well. But when I try to broadcast the PI value to my microservice, which is at toys-broadcast-svc.toys.svc.cluster.local, I am unable to resolve DNS (getting UnknownHostException). Can…
0
votes
0 answers

Retrieving Password from Secrets in Kubernetes Spark Application through Spark Operator

I have a question related to the Kubernetes Spark operator. I am trying to pass the Spark configurations through spec->sparkConf. I am not able to find a way to get the password spark.cassandra.auth.password from secrets. Here is my sparkapplication…
0
votes
1 answer

Multiple spark-submit using spark operator on k8s

Is it possible to submit multiple spark-submit jobs using a single Spark operator on k8s? Or is a dedicated spark-operator required for each spark-submit?
0
votes
0 answers

PySpark performance is slow when reading a large fixed-width file with long lines to convert to a structured format

I am trying to convert a fairly large 34GB fixed-width file into a structured format using PySpark, but my job is taking too long to complete (almost 10+ hours). The file has long lines of almost 50K characters, which I am trying to split using substring into…
0
votes
1 answer

How can I set RestartPolicy on the Spark driver pod on Kubernetes using spark-submit, not spark-operator?

I want to use the restart policy Always. When my Spark streaming app fails, it should restart automatically. I have tried setting the policy in podTemplate but it is not working. apiVersion: v1 Kind: Pod metadata: labels: my-label:…
0
votes
1 answer

spark-submit fails when submitting multiple spark applications at once using spark-on-k8s-operator

I'm trying to submit around 20 Spark applications at once. This causes most of them to fail. How do I stop this from happening? The spark-operator pods are not running out of memory. The CPU usage does increase, but only for a very short period. The…
Pradyumna