
I am using Azure ML Studio and trying to upload an Excel file as a dataset. However, I don't see an option for it. Am I missing something?


Manas Kumar

1 Answer


It sounds like you want to read an Excel file in an Execute Python Script module of an Azure Machine Learning Studio experiment. According to the official document [Execute Python machine learning scripts in Azure Machine Learning Studio][1], there are two ways to do that:

  1. Upload the Excel file to Azure Blob Storage, then follow the section Accessing Azure Storage Blobs to read it with the Azure Blob Storage SDK for Python.

  2. Follow the section Importing existing Python script modules: package the Excel file together with any required Python packages into a zip file, then read it from the directory named Script Bundle, into which Azure ML Studio automatically extracts the zip file.

For reference, here are the detailed steps for the second option.

  1. I prepared an Excel file named test.xlsx with the content shown below.

    [Screenshot: contents of test.xlsx]

  2. Download the xlrd wheel file xlrd-1.2.0-py2.py3-none-any.whl from its PyPI page, extract its contents into a directory named test, and compress them together with test.xlsx into a zip file named test.zip, as shown below.

    [Screenshot: contents of test.zip]

  3. I uploaded test.zip as a dataset to Azure ML Studio and connected it to an Execute Python Script module.

    [Screenshot: test.zip connected to the Execute Python Script module]

  4. Here is my sample code. I logged the output of os.getcwd(), os.listdir(), and os.listdir('Script Bundle') to find the correct path for reading a file from the zip file.

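    import pandas as pd
    
    def azureml_main(dataframe1 = None, dataframe2 = None):
        import os
        # Log the working directory and its contents to locate the extracted zip.
        print(os.getcwd())
        print(os.listdir())
        print(os.listdir('Script Bundle'))
    
        # xlrd comes from the zip file; version 1.2.0 can still read .xlsx
        # (xlrd 2.0 and later only read .xls).
        import xlrd
        file = 'Script Bundle/test.xlsx'
        data = xlrd.open_workbook(file)
        # Print the sheet names to confirm the workbook was opened.
        print([sheet.name for sheet in data.sheets()])
    
        print('Input pandas.DataFrame #1:\r\n\r\n{0}'.format(dataframe1))
    
        # The trailing comma returns a one-element tuple of DataFrames.
        return dataframe1,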
    

This works with Anaconda 4.0/Python 3.5; the logs are shown below.

[Screenshot: output logs of the Execute Python Script module]
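
The path exploration in step 4 can be generalized a little. The following is only a sketch: list_bundle_files and find_by_extension are hypothetical helper names (not part of Azure ML Studio or xlrd), and it assumes the zip file was extracted into a directory such as Script Bundle under the working directory, as the logs above suggest.

```python
import os


def list_bundle_files(root="Script Bundle"):
    """Return the paths of all files under root, sorted alphabetically.

    Hypothetical helper: walks the directory tree so nested packages
    (for example xlrd/) are listed along with top-level files.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


def find_by_extension(root, extension):
    """Return the files under root whose names end with extension (case-insensitive)."""
    wanted = extension.lower()
    return [path for path in list_bundle_files(root) if path.lower().endswith(wanted)]
```

Inside azureml_main, printing find_by_extension('Script Bundle', '.xlsx') should show where the Excel file ended up, which avoids guessing the path by hand.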

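The packaging in step 2 can also be scripted instead of done by hand. This is a minimal sketch, not the exact procedure I used: build_bundle is a hypothetical helper name, and it assumes a flat archive layout (package directories and the Excel file at the root of the zip), which matches the path 'Script Bundle/test.xlsx' used in the sample code. It relies only on the fact that a .whl file is itself a zip archive.

```python
import os
import zipfile


def build_bundle(wheel_path, excel_path, bundle_path):
    """Copy the contents of a wheel plus one Excel file into a new zip file.

    Hypothetical helper: the resulting archive is flat, so the wheel's
    packages and the Excel file sit side by side at the archive root.
    """
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        # A wheel is a zip archive, so its members can be copied one by one.
        with zipfile.ZipFile(wheel_path) as wheel:
            for member in wheel.infolist():
                if member.is_dir():
                    continue  # directory entries are implied by file paths
                bundle.writestr(member.filename, wheel.read(member.filename))
        # Store the Excel file under its base name at the archive root.
        bundle.write(excel_path, arcname=os.path.basename(excel_path))
    return bundle_path
```

For the files in this answer, the call would be build_bundle('xlrd-1.2.0-py2.py3-none-any.whl', 'test.xlsx', 'test.zip'), run locally before uploading test.zip as a dataset.
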
Hope it helps.

Peter Pan