
I have a Spark notebook which I run from a pipeline. The notebook runs fine manually, but in the pipeline it fails with a file-location error. In the code I load the file into a DataFrame. The file location in the code is abfss://storage_name/folder_name/*, but in the pipeline it is taking abfss://storage_name/filename.parquet
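
For context, a minimal PySpark sketch of the two reads involved, using the placeholder paths from the post (assuming an active spark session):

# what the notebook does when run manually: read every file under the folder
df_manual = spark.read.format("parquet").load("abfss://storage_name/folder_name/*")

# what the pipeline run effectively resolves to, per the error below
df_pipeline = spark.read.format("parquet").load("abfss://storage_name/filename.parquet")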

This is the error:

{
  "errorCode": "6002",
  "message": "org.apache.spark.sql.AnalysisException: Path does not exist: abfss://storage_name/filename.parquet
    at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4(DataSource.scala:806)
    at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4$adapted(DataSource.scala:803)
    at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:372)
    at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
    at scala.util.Success.$anonfun$map$1(Try.scala:255)
    at scala.util.Success.map(Try.scala:213)
    at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
    at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
    at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
    at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)",
  "failureType": "UserError",
  "target": "notebook_name",
  "details": []
}

CHEEKATLAPRADEEP
darkstar
  • Can you please include more information about the code you are using and images of the pipeline? – Saideep Arikontham Jul 17 '22 at 12:36
  • In the code we are reading a file stored in ADLS Gen2: var_df = spark.read.format("parquet").load("file_path.parquet"). In the pipeline I have selected the notebook in which this code exists. – darkstar Jul 18 '22 at 13:54

2 Answers


The above error mainly happens because of a permissions issue: the Synapse workspace lacks the permissions required to access the storage account, so you need to grant it the Storage Blob Data Contributor role.

To add the Storage Blob Data Contributor role to your workspace, refer to this Microsoft documentation.

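You can also check from the notebook whether the workspace identity can reach the storage path at all. This is a rough sketch using the Synapse mssparkutils helper; the path below is a placeholder, not a real location:

from notebookutils import mssparkutils

# Listing the folder exercises the same credentials Spark uses.
# A 403/permission error points to a missing role assignment;
# a "path not found" error points to a wrong path instead.
files = mssparkutils.fs.ls("abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/folder_name")
for f in files:
    print(f.name, f.size)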

Also, make sure you are following the proper ADLS Gen2 path syntax:

abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>

Sample code:

df = spark.read.load('abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/samplefile.parquet', format='parquet')
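
Since the question's manual run reads a whole folder through a wildcard rather than a single file, the equivalent read with the full syntax would look like this (same placeholders as above):

# read every parquet file under the folder, as the manual notebook run does
df = spark.read.load('abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/folder_name/*', format='parquet')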

For more detailed information, refer to this link.

B. B. Naga Sai Vamsi
  • The workspace already has storage account contributor access, and the file path is also correct as you mentioned. Also, the notebook runs fine manually. – darkstar Jul 19 '22 at 05:52
  • hi @avg, please edit and provide detailed information about the code and the error when running the pipeline. – B. B. Naga Sai Vamsi Jul 19 '22 at 06:22

I added my Synapse workspace under the required access, and it worked.

darkstar