Does Azure ML Studio support importing an Excel file as a Dataset?

Question

I am using Azure ML Studio and trying to upload an Excel file as a Dataset. However, I do not see an option for it. Am I missing something?

You can convert the Excel file to CSV easily; is that not an option for you? — Thomas, Aug 02 '19 at 20:02
@Thomas Yes, that is a workaround and I am aware of it. But how could I do it with an XLS file without converting it? — Manas Kumar, Aug 03 '19 at 07:16
I think you will find that the simplest option, converting it to CSV, is far and away the easiest. — TomC, Nov 11 '19 at 05:53

score 0 · Answer 1 · answered Aug 05 '19 at 07:36

It sounds like you want to read an Excel file in an Execute Python Script module of an Azure Machine Learning Studio experiment. According to the official document [Execute Python machine learning scripts in Azure Machine Learning Studio][1], there are two ways to do that.

1. Upload the Excel file to Azure Blob Storage, then follow the section Accessing Azure Storage Blobs to read it with the Azure Blob Storage SDK for Python.
2. Follow the section Importing existing Python script modules to package the Excel file together with any required Python packages as a zip file. Azure ML Studio extracts the zip file automatically into a directory named Script Bundle, and the script can read the Excel file from there.

For reference, here are the detailed steps for the second solution.

1. I prepared an Excel file named test.xlsx.
2. I downloaded the xlrd package file xlrd-1.2.0-py2.py3-none-any.whl from its PyPI.org page, extracted its contents to a directory named test, and compressed them together with test.xlsx into a zip file named test.zip.
3. I uploaded the zip file test.zip as a dataset to Azure ML Studio and connected it to an Execute Python Script module.
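The packaging in the steps above could also be scripted instead of done by hand. The following is only a minimal sketch: the helper name build_script_bundle is hypothetical, and it assumes the wheel file and test.xlsx are already on the local disk. It relies on the fact that a .whl file is an ordinary zip archive, so its members can be copied into the new zip next to the workbook.

```python
import os
import zipfile


def build_script_bundle(wheel_path, excel_path, bundle_path):
    """Repack the contents of a wheel plus an Excel file into one zip.

    Hypothetical helper for illustration; not part of Azure ML Studio.
    """
    with zipfile.ZipFile(wheel_path) as wheel, \
            zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        # A .whl file is a zip archive, so copy every member across as-is
        for name in wheel.namelist():
            bundle.writestr(name, wheel.read(name))
        # Place the workbook at the top level of the new zip
        bundle.write(excel_path, arcname=os.path.basename(excel_path))
    return bundle_path
```

For example, build_script_bundle('xlrd-1.2.0-py2.py3-none-any.whl', 'test.xlsx', 'test.zip') would produce a test.zip with the xlrd package and test.xlsx side by side at the top level, which matches the 'Script Bundle/test.xlsx' path used in the script.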

Here is my sample code. I used os.getcwd(), os.listdir(), and os.listdir('Script Bundle') to log the directory contents and find the correct path for reading a file from the zip file.

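import pandas as pd

def azureml_main(dataframe1 = None, dataframe2 = None):
    # Log the working directory and its contents to locate the extracted zip
    import os
    print(os.getcwd())
    print(os.listdir())
    print(os.listdir('Script Bundle'))

    # xlrd and test.xlsx both come from the uploaded zip file
    import xlrd
    file = 'Script Bundle/test.xlsx'
    data = xlrd.open_workbook(file)
    # Print the sheet names to confirm the workbook was opened
    print([sheet.name for sheet in data.sheets()])

    print('Input pandas.DataFrame #1:\r\n\r\n{0}'.format(dataframe1))

    # The trailing comma returns a one-element tuple of DataFrames
    return dataframe1,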

It works with the Anaconda 4.0/Python 3.5 runtime.
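The sample above only prints the sheet names. To actually use the workbook contents as data, the rows of a sheet could be turned into records, roughly as sketched below. The helper name sheet_to_records is hypothetical, and the sketch assumes the first row of the sheet holds the column names. It only uses the nrows attribute and the row_values() method of an xlrd sheet.

```python
def sheet_to_records(sheet):
    """Turn a worksheet into a list of dicts keyed by the header row.

    Hypothetical helper; assumes row 0 contains the column names.
    """
    if sheet.nrows == 0:
        return []
    # Use the first row as column names
    header = [str(cell) for cell in sheet.row_values(0)]
    # Pair each remaining row with the header
    return [
        dict(zip(header, sheet.row_values(row_index)))
        for row_index in range(1, sheet.nrows)
    ]
```

Inside azureml_main, something like pd.DataFrame(sheet_to_records(data.sheet_by_index(0))) could then be returned in place of dataframe1, so that the Excel content flows to the next module.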

Hope it helps.
