
Could anyone suggest how to fetch the top n rows from a file iteratively?

The requirement is to download content from a SharePoint site. The content is XML and must be converted to JSON data files. On each iteration the script should fetch the first 1000 rows and write them to one JSON file, then fetch the next 1000 rows and write them to a second JSON file, and so on until the entire content of the page has been written out across n files.
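As a general pattern (this is a sketch, not the final script), OData paging works by following the continuation URL the server returns with each page; the question's code reads it from the `odata.nextLink` key of the JSON response. The link-extraction step can be isolated into a small, testable helper (the sample page contents below are made up for illustration):

```python
def next_page_url(page):
    """Return the continuation URL for the next page, or None when done.

    SharePoint's OData JSON responses carry the continuation URL under
    the 'odata.nextLink' key while more rows remain; the key is absent
    on the final page.
    """
    return page.get('odata.nextLink')

# A page with a continuation link: more rows remain.
mid_page = {"value": [], "odata.nextLink": "Projects?$skiptoken=Id'42'"}
assert next_page_url(mid_page) == "Projects?$skiptoken=Id'42'"

# The final page has no link, so paging stops.
last_page = {"value": [{"ProjectId": "123"}]}
assert next_page_url(last_page) is None
```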

Here is the PowerShell script for reference; it needs to be rewritten in Python.

elseif($fileName -eq "ProjectDataURLs")
{
    Write-Host "Generating JSON files for Project data PWAEng" -ForegroundColor DarkYellow
    foreach($ProjectData in $importProjectDataFile)
    {
        $fullURL=$ProjectData.ProjectData
        $location = $ProjectData.ProjectData
        
        $date=(Get-Date).ToString("yyyyMMdd")
        
        [int]$count=0
    do{
           
            $fileName="file_"+$location+"_"+$count+"_"+$date
            # output file
            [string]$output = $FilePath+"\"+$fileName+".json"
            #Write-Host $output
          
            $task=$fullURL
            # load client assemblies
            Add-Type -Path ([System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint.Client").location)
            Add-Type -Path ([System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint.Client.runtime").location)
            $spoCred = New-Object Microsoft.SharePoint.Client.SharePointOnlineCredentials($userName,$Password)
            
            $url = 'https://myorg.sharepoint.com/sites/pwaeng/_api/projectdata/'+$task
            $request = [System.Net.WebRequest]::Create($url)
            $request.Credentials = $spoCred
            $request.Timeout=60000
            
            $request.Accept="application/json;charset=utf-8"
            
            $request.Headers.Add("X-FORMS_BASED_AUTH_ACCEPTED", "f")
            $response = $request.GetResponse()
            $reader = New-Object System.IO.StreamReader $response.GetResponseStream()
            $data = $reader.ReadToEnd()
            $data > $output
            $getParas=$data |ConvertFrom-Json
            $getSkipToken=$getParas.'odata.nextLink'
            #Write-Host $getSkipToken
            $topR='&?$top=1000'
            
            $fullURL=$getSkipToken+$topR
            $count=$count+1
       }
       while($getSkipToken -ne $null)
    }
 }
  • This PowerShell script is hard to read. If you can try to write the Python code yourself, or carefully write down the steps which you would like to code, I can help you. – thenarfer Jul 26 '21 at 19:30
  • Thanks @thenarfer - Here is the link to one of the code block that i have already implemented to download the files from Sharepoint URL. We need to fetch the n number of rows from web URL/list file in each iteration and continue to fetch until eof and convert them to json odata readable format. The above powershell script that i have copied is the second block of code which is same but it downloads different files. https://stackoverflow.com/questions/68488020/conversion-of-atom-or-odata-xml-file-to-odata-json-file-using-python – Jeevan_G1 Jul 27 '21 at 09:53
  • The PowerShell script above downloads the 1000 rows from SharePoint URL list file in each iteration using $top and @odata.nextLink properties. How can we achieve the same results using Python scripting.. – Jeevan_G1 Jul 28 '21 at 10:15

1 Answer


It finally worked in Python after multiple attempts :) Below is the complete Python code. It connects to the web URL and fetches 1000 rows per iteration: it downloads the first file, then reads that file looking for odata.nextLink. If found, it appends that link to the base web URL, fetches the next 1000 rows, and repeats until the previously generated output file no longer contains odata.nextLink.

Here is the working Python implementation of that logic.

from datetime import datetime
import os
import Credentials
import requests

from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.runtime.http.request_options import RequestOptions
import json

userName = Credentials.UserName
Password = Credentials.Password

Current_Date = datetime.today()
new_dir = Current_Date.strftime("%Y%m%d")
parent_dir = "/opt/app/Informatica/pc/server/infa_shared/TgtFiles/Migrations/Output_Files"
Filepath = os.path.join(parent_dir, new_dir)

try:
   os.mkdir(Filepath)
   print("Directory '%s' has been created" %new_dir)
except OSError as error:
   print(error)

importFileType = parent_dir + '/FileType.csv'
importListFile = parent_dir + '/ListsFile.csv'
importProjectDataFile = parent_dir + '/ESPJsonFiles.csv'

with open(importFileType, 'r') as files:
   next(files)
   for fileName in files:
      if fileName.strip() == "ProjectDataURLs":  # compare the value itself; a list literal is always truthy
         with open(importProjectDataFile, 'r') as ProjectData:
            next(ProjectData)
            for line in ProjectData:
               #List, ListFolder = line.split(',')
               fullURL = line
               location = line
               date = datetime.today().strftime("%Y%m%d")

               count = 0

               while True:  # loop until no odata.nextLink is found (see the break below)
                  fileName = "file_" + location.strip() + "_" + str(count) + "_" + date
                  xml_output = Filepath + "/" + fileName + ".xml"
                  json_output = Filepath + "/" + fileName + ".json"
                  task = fullURL.strip()

                  webAbsoluteURL = "https://myorg.sharepoint.com/sites/pwaeng"
                  url = webAbsoluteURL + "/_api/projectdata/" + task + "?&$format=json"

                  ctx_auth = AuthenticationContext(webAbsoluteURL)
                  token = ctx_auth.acquire_token_for_user(userName, Password)
                  options = RequestOptions(webAbsoluteURL)
                  ctx_auth.authenticate_request(options)

                  # requests takes the timeout in seconds (not milliseconds as in .NET)
                  response = requests.get(url, headers=options.headers, allow_redirects=True, timeout=60)

                  with open(json_output, 'wb') as file_save:
                     file_save.write(response.content)

                  py_obj = json.loads(response.text)
                  if 'odata.nextLink' in py_obj.keys():
                     next_iter = py_obj['odata.nextLink']
                     topR = '&?$top=1000'
                     fullURL = str(next_iter) + topR
                  else:
                     break
                  count = count + 1
            break
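For reuse, the download-and-follow-nextLink loop above can be factored into a small generator. This is just a sketch: the `fetch` callable stands in for the authenticated `requests.get` call in the script, so the paging logic can be exercised without a live SharePoint site, and the stub page contents below are made up for illustration.

```python
def paginate(first_url, fetch, top=1000):
    """Yield one parsed JSON page per request, following 'odata.nextLink'.

    fetch: a callable taking a URL and returning a parsed JSON dict
    (in the script above this would wrap the authenticated requests.get).
    """
    url = first_url
    while url:
        page = fetch(url)
        yield page
        next_link = page.get('odata.nextLink')
        # Append a $top page-size hint to the continuation URL, if any.
        url = next_link + '&$top=' + str(top) if next_link else None

# Exercise the generator with a stub that serves two pages.
pages = {
    'Projects': {'value': [1, 2], 'odata.nextLink': 'Projects?$skiptoken=2'},
    'Projects?$skiptoken=2&$top=1000': {'value': [3]},
}
result = list(paginate('Projects', pages.__getitem__))
assert len(result) == 2
assert result[-1]['value'] == [3]
```

Separating the traversal from the HTTP call also makes it easy to change the page size or the output format without touching the paging logic.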