
I have multiple XLSX files in an Azure Blob container that I need to load into a Snowflake table.

Note: I don't want to convert the XLSX files to CSV first, since that conversion takes extra time. I also tried the ADF Copy Activity for loading, but it is slow to load the data.

So is there any way I can use Snowflake Snowpark to load the XLSX files into a Snowflake table?

Additionally, I welcome any other feasible alternative that would let me load the XLSX data into Snowflake quickly.

  • As an alternative, you can use PySpark to load the Excel file from Blob as a dataframe and write it to a Snowflake table. – Rakesh Govindula Feb 20 '23 at 10:39
  • You can use [pandas.read_excel](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html) to create a DataFrame which you can then save to Snowflake. – Sergiu Feb 20 '23 at 10:54
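Putting the two comments together, here is a minimal sketch of the Snowpark route the question asks about. All names are placeholders, and `session` is assumed to be an already-created `snowflake.snowpark.Session`, so the sketch itself only imports pandas:

```python
import pandas as pd

def load_xlsx_to_snowflake(session, xlsx_path: str, table_name: str) -> None:
    """Sketch: read one workbook with pandas, then load it via Snowpark.

    `session` is an existing snowflake.snowpark.Session; `xlsx_path`
    and `table_name` are placeholders for your own file and table.
    """
    # pandas uses the openpyxl engine for .xlsx files
    df = pd.read_excel(xlsx_path, engine="openpyxl")
    # write_pandas loads the frame directly into a table; no CSV step
    session.write_pandas(df, table_name, auto_create_table=True, overwrite=True)
```

Snowpark's `write_pandas` stages the data and runs `COPY INTO` behind the scenes, so there is no manual XLSX-to-CSV conversion on your side.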

1 Answer


Additionally, I welcome any other feasible alternative that would let me load the XLSX data into Snowflake quickly.

You can try Databricks with a pandas dataframe here. First, mount the Blob Storage container to Databricks and read the Excel file with pandas (using the openpyxl engine). Then write the pandas dataframe to Snowflake as suggested by @Sergiu.

import pandas as pd

# Placeholders: fill in your own storage account, container, and mount point
storage_account_name = "storage account name"
container_name = "container name"
mount_point = "/mnt/mount point"

# Mount the blob container into DBFS so its files can be read by path
dbutils.fs.mount(
  source=f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net",
  mount_point=mount_point,
  extra_configs={
    f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net": "storage account access key"
  }
)

Install openpyxl if you don't already have it: `!pip install openpyxl`

Reading the Excel file after mounting (note that `mount_point` already starts with a slash, so it is appended to `/dbfs` directly):

file_path = f"/dbfs{mount_point}/myexcel.xlsx"
df = pd.read_excel(file_path, engine="openpyxl")
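Since the question mentions multiple XLSX files, you can also read every workbook under the mount and stack them into one dataframe before writing, so Snowflake gets a single load instead of one per file. This is a sketch under the assumption that all workbooks share the same column layout; the `reader` parameter defaults to `pd.read_excel` and exists only to make the logic easy to test:

```python
from pathlib import Path
import pandas as pd

def combine_workbooks(folder: str, reader=pd.read_excel) -> pd.DataFrame:
    """Read every .xlsx file in `folder` and stack them into one frame.

    Assumes all workbooks share the same columns. `reader` defaults to
    pd.read_excel (openpyxl under the hood) and is parameterized only
    for testability.
    """
    # sorted() makes the row order deterministic across runs
    paths = sorted(Path(folder).glob("*.xlsx"))
    frames = [reader(p) for p in paths]
    # ignore_index renumbers rows so the combined frame has a clean index
    return pd.concat(frames, ignore_index=True)
```

On Databricks you would call this as `combine_workbooks(f"/dbfs{mount_point}")` (using the hypothetical `mount_point` from the mount step above) and then write the combined dataframe once.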


After reading the file into a pandas dataframe, follow this procedure by @stephenallwright to write the dataframe to a Snowflake table.
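That procedure boils down to the Snowflake connector's `write_pandas` helper. Here is a hedged sketch: the connection parameters are placeholders, and the imports are deferred inside the function so the snippet can be read without `snowflake-connector-python` installed:

```python
def dataframe_to_snowflake(df, table_name: str, conn_params: dict) -> int:
    """Sketch: push a pandas dataframe into a Snowflake table.

    `conn_params` holds your account/user/password/etc. placeholders.
    Returns the number of rows loaded, or 0 on failure.
    """
    # Deferred imports so this sketch loads without the connector package
    import snowflake.connector
    from snowflake.connector.pandas_tools import write_pandas

    conn = snowflake.connector.connect(**conn_params)
    try:
        # auto_create_table derives column types from the dataframe
        success, _num_chunks, num_rows, _output = write_pandas(
            conn, df, table_name, auto_create_table=True, overwrite=True
        )
        return num_rows if success else 0
    finally:
        conn.close()
```

Because `write_pandas` bulk-loads via an internal stage, this avoids the slow row-by-row inserts that make the ADF route feel sluggish.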

Naveen Sharma