
I'm in a Synapse notebook, using PySpark to move a file with mssparkutils.fs.mv(src, dest, True).

Link to ms doc: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/microsoft-spark-utilities?pivots=programming-language-python#move-file

Code:

filepath = "abfss://raw@xxxxdev001.blob.core.windows.net/SASDatFiles/test_sep22.sas7bdat "
movepath = "abfss://raw@xxxxdev001.blob.core.windows.net/SASDatFiles/Processed/test_sep22.sas7bdat"
mssparkutils.fs.mv(filepath,movepath, True)

Error:

Py4JJavaError: An error occurred while calling z:mssparkutils.fs.mv.
: Operation failed: "An HTTP header that's mandatory for this request is not specified.", 400, PUT, https://xxxxdev001.blob.core.windows.net/raw/SASDatFiles/Processed/test_sep22.sas7bdat?timeout=90, , ""
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:199)
    at org.apache.hadoop.fs.azurebfs.services.AbfsClient.renamePath(AbfsClient.java:337)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.rename(AzureBlobFileSystemStore.java:774)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.rename(AzureBlobFileSystem.java:354)
    at com.microsoft.spark.notebook.msutils.impl.MSFsUtilsImpl.mvWithinFileSystem(MSFsUtilsImpl.scala:128)
    at com.microsoft.spark.notebook.msutils.impl.MSFsUtilsImpl.mv(MSFsUtilsImpl.scala:259)
    at mssparkutils.fs$.mv(fs.scala:22)
    at mssparkutils.fs.mv(fs.scala)
    at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)

The notebook runs under my credentials, and I have the Owner / Storage Blob Data Contributor roles assigned on the Azure Data Lake Gen2 account. I can move files with my credentials in Storage Explorer with no issues.
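Before looking at permissions, two details in the paths above are worth double-checking: the source path carries a trailing space inside the quotes, and both URIs pair the abfss:// scheme with the blob endpoint (.blob.core.windows.net) rather than the Data Lake Gen2 endpoint (.dfs.core.windows.net) that the ABFS driver expects. A small, hypothetical validation helper (plain Python, no Spark needed) can catch both:

```python
def check_abfss_path(path: str) -> list:
    """Flag common mistakes in an abfss:// URI (hypothetical helper)."""
    issues = []
    # Stray whitespace inside the quoted string silently changes the path.
    if path != path.strip():
        issues.append("leading/trailing whitespace in path")
    # abfss:// should point at the Data Lake Gen2 (dfs) endpoint.
    if path.startswith("abfss://") and ".blob.core.windows.net" in path:
        issues.append("abfss:// used with the blob endpoint; "
                      "the ABFS driver expects .dfs.core.windows.net")
    return issues

filepath = "abfss://raw@xxxxdev001.blob.core.windows.net/SASDatFiles/test_sep22.sas7bdat "
print(check_abfss_path(filepath))  # flags both issues for this path
```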

Any clues about this error?

Sreedhar
  • How is the notebook connecting to the data lake? When you run the notebook, it doesn't inherit your access to the data lake. – ARCrow Jan 24 '23 at 04:38
  • The notebook is part of a Synapse pipeline, and Synapse has 'Storage Blob Data Contributor' on the data lake – Sreedhar Jan 24 '23 at 11:11

1 Answer


I tried to reproduce the same thing in my environment and got the output below.

Configure your storage account key as per the syntax below (for abfss:// paths the ABFS driver reads the key from the dfs endpoint property):

spark.conf.set("fs.azure.account.key.<storage_account_name>.dfs.core.windows.net","<Access_key>") 


mssparkutils.fs.mv("abfss://<container_name>@<storage_account>.dfs.core.windows.net/vamsiba.sas7bdat","abfss://<container_name>@<storage_account>.dfs.core.windows.net/<folder>")


Or

If you want to copy data instead, use mssparkutils.fs.cp as shown in the code below:

mssparkutils.fs.cp("abfss://<container_name>@<storage_account>.dfs.core.windows.net/vamsiba.sas7bdat","abfss://<container_name>@<storage_account>.dfs.core.windows.net/<folder>")


Note:

Source location: abfss://<container_name>@<storage_account>.dfs.core.windows.net/vamsiba.sas7bdat

Destination location:

abfss://<container_name>@<storage_account>.dfs.core.windows.net/<folder>

Before running the code (i.e., moving the file), make sure the destination folder does not already contain a file named vamsiba.sas7bdat; otherwise you will get an error.
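To guard against that collision, one can build the full destination path from the source file name and check it before moving. A minimal sketch, where the path-joining part is plain Python and the guarded move (commented out) assumes the Synapse-provided mssparkutils:

```python
def build_dest(src: str, dest_folder: str) -> str:
    """Join the source file name onto the destination folder (hypothetical helper)."""
    name = src.rstrip("/").rsplit("/", 1)[-1]
    return dest_folder.rstrip("/") + "/" + name

src = "abfss://<container_name>@<storage_account>.dfs.core.windows.net/vamsiba.sas7bdat"
folder = "abfss://<container_name>@<storage_account>.dfs.core.windows.net/<folder>"
dest = build_dest(src, folder)

# In the notebook one would then guard the move, e.g.:
# existing = [f.name for f in mssparkutils.fs.ls(folder)]
# if "vamsiba.sas7bdat" not in existing:
#     mssparkutils.fs.mv(src, dest, True)
```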

B. B. Naga Sai Vamsi