I'm trying to submit a .NET for Apache Spark job to Dataproc.
The command line looks like:
gcloud dataproc jobs submit spark \
--cluster=<cluster> \
--region=<region> \
--class=org.apache.spark.deploy.dotnet.DotnetRunner \
--jars=gs://bucket/microsoft-spark-2.4.x-0.11.0.jar \
--archives=gs://bucket/dotnet-build-output.zip \
-- find
This command should run find to list the files in the job's current working directory.
But I see only 2 files:
././microsoft-spark-2.4.x-0.11.0.jar
././microsoft-spark-2.4.x-0.11.0.jar.crc
So it seems that Dataproc does not unpack the archive from Cloud Storage that I specified via --archives. The archive definitely exists, and its path was copied from the GCP UI. I also tried to run the assembly file from inside the archive directly (it is present in the zip), but that, reasonably enough, fails with File does not exist.