
I have a working Python script for consolidating multiple xlsx files that I want to move to a Watson Studio project. My current code uses a path variable which is passed to glob...

path = '/Users/Me/My_Path/*.xlsx'
files = glob.glob(path)

Since credentials in Watson Studio are specific to individual files, how do I get a list of all files in my IBM COS storage bucket? I'm also wondering how to create folders to separate the files in my storage bucket?

J Scott

3 Answers


The credentials in IBM Cloud Object Storage (COS) are at the COS instance level, not at the individual file level. Each COS instance can have any number of buckets, with each bucket containing files. You can get the credentials for the COS instance from the Bluemix console:

https://console.bluemix.net/docs/services/cloud-object-storage/iam/service-credentials.html#service-credentials

You can use the boto3 Python package to access the files: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html

import boto3

# Create a client pointed at your COS endpoint with your HMAC credentials
s3c = boto3.client('s3',
                   endpoint_url='XXXXXXXXX',
                   aws_access_key_id='XXXXXXXXXXX',
                   aws_secret_access_key='XXXXXXXXXX')

# List objects in the bucket (optionally filtered by a key prefix)
s3c.list_objects(Bucket=bucket_name, Prefix=file_path)

# Download and upload individual files
s3c.download_file(Filename=filename, Bucket=bucket, Key=objectname)
s3c.upload_file(Filename=filename, Bucket=bucket, Key=objectname)
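To turn the `list_objects` response into a plain list of file names (the equivalent of what the original `glob` call produced), read the `Key` field of each entry under `Contents`. A minimal sketch; the `response` dict below is an illustrative stand-in for what `s3c.list_objects(Bucket=bucket_name)` returns, not real bucket contents:

```python
# Stand-in for: response = s3c.list_objects(Bucket=bucket_name)
response = {
    'Contents': [
        {'Key': 'sales_jan.xlsx'},
        {'Key': 'sales_feb.xlsx'},
        {'Key': 'notes.txt'},
    ]
}

# Extract just the object names, keeping only .xlsx files
xlsx_files = [obj['Key'] for obj in response.get('Contents', [])
              if obj['Key'].endswith('.xlsx')]
print(xlsx_files)  # ['sales_jan.xlsx', 'sales_feb.xlsx']
```

Using `response.get('Contents', [])` rather than `response['Contents']` avoids a `KeyError` when the bucket (or prefix) is empty, since `Contents` is omitted from the response in that case.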
Manoj Singh
  • Since this is about COS and a Watson Studio project, the pre-installed `ibm_boto3` from [ibm-cos-sdk](https://github.com/IBM/ibm-cos-sdk-python) would be better suited. – Roland Weber Jan 16 '19 at 06:13

Watson Studio cloud provides a helper library named project-lib for working with objects in your Cloud Object Storage instance. Take a look at this documentation for using the package in Python: https://dataplatform.cloud.ibm.com/docs/content/analyze-data/project-lib-python.html

For your specific question, get_files() should do what you need. It returns a list of all the files in your bucket; you can then do pattern matching to keep only the ones you want. Based on this filtered list, iterate and call get_file(file_name) for each file_name in your list.
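The glob-style matching from the original script can be reproduced with `fnmatch` from the standard library. A minimal sketch; the sample list below stands in for what `project.get_files()` would return (a list of dicts, each with a `'name'` key), and the file names are made up for illustration:

```python
from fnmatch import fnmatch

# Stand-in for: all_files = project.get_files()
all_files = [
    {'name': 'report_2018.xlsx'},
    {'name': 'report_2019.xlsx'},
    {'name': 'readme.md'},
]

# Keep only the names matching the same pattern the glob call used
pattern = '*.xlsx'
matches = [f['name'] for f in all_files if fnmatch(f['name'], pattern)]
print(matches)  # ['report_2018.xlsx', 'report_2019.xlsx']
```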

COS has no real folder hierarchy, so to create a "folder" in your bucket you follow a naming convention for object names that creates a "pseudo folder". For example, if you want a "data" folder of assets, prefix the names of objects belonging to that folder with data/.

Greg Filla

There's probably a more Pythonic way to write this, but here is the code I wrote using project-lib, per the answer provided by @Greg Filla:

files = []  # list to hold matching data file names

# Get the names of all files in the storage bucket
all_files = project.get_files()  # returns a list of dictionaries

# Keep only the files whose names start with the chosen prefix
# (DataFile_Prefix is defined elsewhere in the notebook)
for f in all_files:
    if f['name'].startswith(DataFile_Prefix):
        files.append(f['name'])

print("There are " + str(len(files)) + " data files in the storage bucket.")
J Scott