I used the following Python code to read a parquet file from a datastore:

from azureml.core import Dataset, Datastore, Workspace

subscription_id = 'xyz'
resource_group = 'abc'
workspace_name = 'pqr'

workspace = Workspace(subscription_id, resource_group, workspace_name)
datastore = Datastore.get(workspace, 'workspaceblobstore')

tabular_dataset_3 = Dataset.Tabular.from_parquet_files(path=(datastore,'/UI/09-17-2022_125003_UTC/userdata1.parquet'))

df = tabular_dataset_3.to_pandas_dataframe()

I have looked here but have not found any documentation on reading a parquet file from a datastore.

Since I am using an Azure ML notebook with the R kernel, can anyone please help me write the equivalent R code?

Any help would be appreciated.

ankit

1 Answer

I'm completely new to R, but I've been able to read parquet files in our storage account using the AzureStor R package along with the Apache Arrow "arrow" package:

install.packages("AzureStor")
install.packages("arrow")

library(AzureStor)
library(arrow)

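# Authenticate with Azure AD as a service principal (client ID and secret)
token <- AzureRMR::get_azure_token("https://storage.azure.com", tenant="<tenant-id>", app="<client-id>", password="<client-secret>")
# Connect to the storage account's dfs endpoint and open the container
ad_endp_tok2 <- storage_endpoint("https://<mystorageaccount>.dfs.core.windows.net", token=token)
container <- storage_container(ad_endp_tok2, "<mycontainername>")
# dest=NULL downloads the file into memory as a raw vector, which read_parquet() accepts
rawdata <- storage_download(container, src="<somefile>.parquet", dest=NULL)
parq_df <- read_parquet(rawdata)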
Kyle M
  • Hi, thank you for the answer. I found the documentation at https://cran.r-project.org/web/packages/AzureStor/vignettes/intro.html and read through it. I am using Azure Machine Learning Studio as provided by the company, and I have a datastore of type `Azure Blob Storage` whose authentication type is `Account Key`. I can't see the key's value because it is masked with dots. Since the datastore was registered by someone else, is there any way to access the account key? – ankit Oct 15 '22 at 09:06
  • Ah, this solution isn't applicable to AML studio and datastores, which I'm unfamiliar with. Perhaps this package might help: https://github.com/RevolutionAnalytics/AzureML/wiki/Bug-bash-instructions#work-with-azureml-datasets – Kyle M Oct 17 '22 at 15:31
  • I think this solution would work. I tried the above code without `token`, and `storage_endpoint()` and `storage_container()` ran without any error. However, `storage_download()` fails because it tries to read a file from a datastore that is registered with an account key, which I don't have permission to use. I think the link you just mentioned also uses a token. Is there any way to write the code without an authentication key, just like the Python code in my question? – ankit Oct 17 '22 at 16:04