I am using Azure ML Studio and trying to upload an Excel file as a dataset. However, I don't see an option for it. Am I missing something?
- You can easily convert an Excel file to CSV; is that not an option for you? – Thomas Aug 02 '19 at 20:02
- @Thomas Yes, that is a workaround and I am aware of it. But how could I do it with an XLS file without converting it? – Manas Kumar Aug 03 '19 at 07:16
- I think you will find that the simplest option, converting it to CSV, will be far and away the easiest. – TomC Nov 11 '19 at 05:53
1 Answer
It sounds like you want to read an Excel file in an Execute Python Script module of an experiment in Azure Machine Learning Studio. According to the official document [Execute Python machine learning scripts in Azure Machine Learning Studio][1], there are two ways to do that, as follows.
1. Upload the Excel file to Azure Blob Storage, then follow the section "Accessing Azure Storage Blobs" to read it by using the Azure Blob Storage SDK for Python.
2. Refer to the section "Importing existing Python script modules" to package the Excel file with the other required Python packages as a zip file, then read it from the directory named `Script Bundle`, into which Azure ML Studio automatically extracts the zip file.
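For the first option, a rough sketch could look like the following. It assumes the legacy `azure-storage` SDK (the one that provides `BlockBlobService`) and `xlrd` are both available to the script, for example because they were bundled in a zip file. The function name `sheet_names_from_blob` and all of its arguments are placeholders of mine, and the default `protocol='http'` is an assumption that you may need to change for your environment.

```python
def sheet_names_from_blob(account_name, account_key, container, blob_name,
                          protocol='http'):
    """Download an Excel blob into memory and return its sheet names.

    Hypothetical helper: every argument is a placeholder for your own values.
    """
    # Imported inside the function because both packages are assumed to be
    # bundled with the script rather than preinstalled.
    from azure.storage.blob import BlockBlobService  # legacy azure-storage SDK
    import xlrd

    service = BlockBlobService(account_name=account_name,
                               account_key=account_key,
                               protocol=protocol)  # assumption: adjust as needed

    # get_blob_to_bytes returns a Blob object whose .content holds the raw bytes.
    blob = service.get_blob_to_bytes(container, blob_name)

    # xlrd can open a workbook straight from bytes, so no temp file is needed.
    workbook = xlrd.open_workbook(file_contents=blob.content)
    return workbook.sheet_names()
```

Inside `azureml_main` you would call it with your own storage account name, key, container, and blob name.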
For reference, here are the detailed steps for the second solution.
1. I prepared an Excel file named `test.xlsx`.
2. I downloaded the `xlrd` package file `xlrd-1.2.0-py2.py3-none-any.whl` from its PyPI.org page, extracted its contents to a directory `test`, and compressed them together with `test.xlsx` into a zip file `test.zip`.
3. I uploaded the zip file `test.zip` as a dataset to Azure ML Studio and connected it to an `Execute Python Script` module.

Here is my sample code. I used `os.getcwd()`, `os.listdir()`, and `os.listdir('Script Bundle')` with logging to find the correct path for reading a file from the zip file.

    import pandas as pd

    def azureml_main(dataframe1 = None, dataframe2 = None):
        import os
        # Log the working directory and its contents to locate the extracted zip.
        print(os.getcwd())
        print(os.listdir())
        print(os.listdir('Script Bundle'))

        # xlrd is importable because it was packaged in the same zip file.
        import xlrd
        file = 'Script Bundle/test.xlsx'
        data = xlrd.open_workbook(file)
        # Print the sheet names to confirm the workbook was read.
        print([sheet.name for sheet in data.sheets()])

        print('Input pandas.DataFrame #1:\r\n\r\n{0}'.format(dataframe1))
        # The module expects a sequence of DataFrames, hence the trailing comma.
        return dataframe1,
It works with Anaconda 4.0/Python 3.5, as the logs confirmed.
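If you would rather script the packaging step (unpacking the `xlrd` wheel and zipping its contents together with `test.xlsx`) than do it by hand, here is a minimal sketch using only the standard library. It relies on the fact that a `.whl` file is an ordinary zip archive. The function name `build_script_bundle` is my own and not part of any Azure tooling.

```python
import zipfile
from pathlib import Path

def build_script_bundle(wheel_path, excel_path, bundle_path):
    """Repack a wheel's files plus one Excel file into a single zip.

    Hypothetical helper: the wheel contents and the Excel file both end up
    at the root of the zip.
    """
    wheel_path = Path(wheel_path)
    excel_path = Path(excel_path)
    bundle_path = Path(bundle_path)

    with zipfile.ZipFile(wheel_path) as wheel, \
         zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        # Copy every file from the wheel, keeping its relative path.
        for member in wheel.infolist():
            if member.is_dir():
                continue
            bundle.writestr(member.filename, wheel.read(member))
        # Add the Excel file next to the package directories.
        bundle.write(excel_path, arcname=excel_path.name)

    return bundle_path

if __name__ == "__main__":
    build_script_bundle("xlrd-1.2.0-py2.py3-none-any.whl", "test.xlsx", "test.zip")
```

The resulting `test.zip` is what you then upload as a dataset.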
Hope it helps.
