0

I am an absolute beginner when it comes to working with REST APIs with python. We have received a share-point URL which has multiple folders and multiples files inside those folders in the 'document' section. I have been provided an 'app_id' and a 'secret_token'.

I am trying to access the .csv file and read them as a dataframe and perform operations. The code for operation is ready after I downloaded the .csv and did it locally but I need help in terms of how to connect share-point using python so that I don't have to download such heavy files ever again.

I know there had been multiple queries already on this over stack-overflow but none helped to get to where I want.

I did the following and I am unsure of what to do next:

import json
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext
from office365.runtime.http.request_options import RequestOptions

site_url = "https://<company-name>.sharepoint.com"
ctx = ClientContext(site_url).with_credentials(UserCredential("{app_id}", "{secret_token}"))

Above for site_url, should I use the whole URL or is it fine till ####.com?

This is what I have so far, next I want to read files from respective folders and convert them into a dataframe? The files will always be in .csv format

The example hierarchy of the folders are as follows:

Documents --> Folder A, Folder B

Folder A --> a1.csv, a2.csv

Folder B --> b1.csv, b2.csv

I should be able to move to whichever folder I want and read the files based on my requirement.

Thanks for the help.

trojan horse
  • 347
  • 2
  • 10

1 Answers1

1

This works for me, using a Sharepoint App Identity with an associated client Id and client Secret.

First, I demonstrate authenticating and reading a specific file, then getting a list of files from a folder and reading the first one.

import pandas as pd
import json
import io

from office365.sharepoint.client_context import ClientCredential
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File

#Authentication (shown for a 'modern teams site', but I think should work for a company.sharepoint.com site:
site="https://<myteams.companyname.com>/sites/<site name>/<sub-site name>"
#Read credentials from a json configuration file:
spo_conf = json.load(open(r"conf\spo.conf", "r")) 

client_credentials = ClientCredential(spo_conf["RMAppID"]["clientId"],spo_conf["RMAppID"]["clientSecret"])
ctx = ClientContext(site).with_credentials(client_credentials)


#Read a specific CSV file into a dataframe:
folder_relative_url = "/sites/<site name>/<sub site>/<Library Name>/<Folder Name>"
filename = "MyFileName.csv"
response = File.open_binary(ctx, "/".join([folder_relative_url, filename]))
df = pd.read_csv(io.BytesIO(response.content))


#Get a list of file objects from a folder and read one into a DataFrame:

def getFolderContents(relativeUrl):
    contents = []
    library = ctx.web.get_list(relativeUrl)
    all_items = library.items.filter("FSObjType eq 0").expand(["File"]).get().execute_query()
    for item in all_items:  # type: ListItem
        cur_file = item.file
        contents.append(cur_file)
    return contents

fldrContents = getFolderContents('/sites/<site name>/<sub site>/<Library Name>')
response2 = File.open_binary(ctx, fldrContents[0].serverRelativeUrl)
df2 = pd.read_csv(io.BytesIO(response2.content))

Some References:

Related SO thread.

Office365 library github site.

Getting a list of contents in a doc library folder.


Additional notes following up on comments:

The site path doesn't not include the full url for the site home page (ending in .aspx) - it just ends with the name for the site (or sub-site, if relevant to your case).

You don't need to use a configuration file to store your authentication credentials for the Sharepoint application identity - you could just replace spo_conf["RMAppID"]["clientId"] with the value for the Sharepoint-generated client Id and do similarly for the client Secret. But this is a simple example of what the text of a JSON file could look like:

{
    "MyAppName":{
        "clientId": "my-client-id",
        "clientSecret": "my-client-secret",
        "title":"name_for_application"
    }
} 
BioData41
  • 862
  • 9
  • 15
  • Hi - May I know how to set up json configuration file in this case. Also, for site_url do I take it all the way until the ".aspx", i.e. the complete URL? – trojan horse Dec 30 '21 at 05:18
  • Thanks @BioData41 for he response, I tried now and it is not throwing any error but when I print response it gives me 403 code which is to say forbidden error. Even when I try to read the data frame later it gives the same 403 forbidden. – trojan horse Dec 30 '21 at 22:53
  • @trojanhorse, are you sure that the person who created the Application Identity set up the correct permissions for it to be able to read from the sites web collection? – BioData41 Dec 30 '21 at 23:08