I have a folder with a bunch of subfolders and files that I am fetching from a server and assigning to a variable. The folder structure is as follows:


└── main_folder
    ├── folder
    │   ├── folder
    │   │   ├── folder
    │   │   │   └── a.json
    │   │   ├── folder
    │   │   │   ├── folder
    │   │   │   │   └── b.json
    │   │   │   ├── folder
    │   │   │   │   └── c.json
    │   │   │   └── folder
    │   │   │       └── d.json
    │   │   └── folder
    │   │       └── e.json
    │   ├── folder
    │   │   └── f.json
    │   └── folder
    │       └── i.json

Now I want to upload this main_folder to an S3 bucket, preserving the same structure, using boto3. boto3 has no built-in way to upload a whole folder to S3.

I have seen the solutions at the links below, but there the files are fetched from the local machine, whereas I am fetching the data from a server and assigning it to a variable:

Uploading a folder full of files to a specific folder in Amazon S3

upload a directory to s3 with boto

https://gist.github.com/feelinc/d1f541af4f31d09a2ec3

Has anybody faced the same type of issue?
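In case it clarifies what I am after, this is roughly the direction I have in mind: walk the structure held in the variable and write each file with put_object, so no temporary files are needed (the bucket name and the shape of data below are made up).

import json
import boto3

s3 = boto3.client("s3")

# `data` stands in for the nested structure fetched from the server
data = {
    "main_folder": {
        "folder": {
            "a.json": {"example": "value"},
        },
    },
}

def upload_tree(node, prefix=""):
    """Recursively walk the in-memory tree and upload each JSON leaf."""
    for name, value in node.items():
        key = f"{prefix}{name}"
        if name.endswith(".json"):
            # leaf file: serialize the in-memory object and upload it directly
            s3.put_object(Bucket="my-bucket", Key=key,
                          Body=json.dumps(value).encode("utf-8"))
        else:
            # subfolder: recurse, extending the key prefix
            upload_tree(value, prefix=f"{key}/")

upload_tree(data)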

ImPurshu
  • Do you specifically want to code it yourself, or would you be willing to use the [AWS Command-Line Interface (CLI)](http://aws.amazon.com/cli/)? It can do it with one command. – John Rotenstein Jun 03 '19 at 12:17
  • I want to do via code only @JohnRotenstein – ImPurshu Jun 03 '19 at 12:23
  • It seems that you have data on "a server" and you want to put it in an Amazon S3 bucket. You could either run code on the "server" to send it to S3, or you could run code on another computer to retrieve it from the server and then upload it to S3. So, what precisely is your question? Can you tell us what problem you are facing? – John Rotenstein Jun 03 '19 at 21:05
  • Do you want something like https://stackoverflow.com/q/56428313/3220113 ? – Walter A Jun 05 '19 at 14:12
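For reference, the single CLI command mentioned in the comments above would be along the lines of aws s3 sync main_folder s3://my-bucket/main_folder (bucket name hypothetical), which recursively copies the folder while preserving its structure.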

2 Answers

Below is code that works for me, pure Python 3.

""" upload one directory from the current working directory to aws """
from pathlib import Path
import os
import glob
import boto3

def upload_dir(localDir, awsInitDir, bucketName, tag, prefix='/'):
    """
    From the current working directory, upload 'localDir' with all of its
    contents (files and subdirectories) to an AWS bucket.

    Parameters
    ----------
    localDir :   local directory to upload, relative to the current working directory
    awsInitDir : prefix 'directory' in AWS
    bucketName : bucket in AWS
    tag :        pattern to select files, like *png
                 NOTE: for argparse the tag must be quoted, e.g. --tag '*txt'
    prefix :     leading '/' to strip from file names

    Returns
    -------
    None
    """
    s3 = boto3.resource('s3')
    cwd = str(Path.cwd())
    p = Path(cwd) / localDir
    # walk every directory under localDir, including localDir itself
    for mydir in p.glob('**'):
        fileNames = glob.glob(os.path.join(mydir, tag))
        fileNames = [f for f in fileNames if not Path(f).is_dir()]
        for fileName in fileNames:
            # make the path relative to the current working directory
            fileName = str(fileName).replace(cwd, '')
            if fileName.startswith(prefix):  # only modify the text if it starts with the prefix
                fileName = fileName.replace(prefix, "", 1)  # remove one instance of prefix
            print(f"fileName {fileName}")

            # S3 keys always use forward slashes, regardless of OS
            awsPath = f"{awsInitDir}/{fileName}"
            s3.meta.client.upload_file(fileName, bucketName, awsPath)

if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--localDir", help="which dir to upload to aws")
    parser.add_argument("--bucketName", help="to which bucket to upload in aws")
    parser.add_argument("--awsInitDir", help="to which 'directory' in aws")
    parser.add_argument("--tag", help="some tag to select files, like *png", default='*')
    args = parser.parse_args()

    # cd whatever is above your dir, then run it
    # (below assuming this script is in ~/git/hu-libraries/netRoutines/uploadDir2Aws.py )
    # in the example below you have directory structure ~/Downloads/IO
    # you copy full directory of ~/Downloads/IO to aws bucket markus1 to 'directory' 2020/IO
    # NOTE: if you use tag it must be given like --tag '*txt', in some quotation marks...

    # cd ~/Downloads
    # python ~/git/hu-libraries/netRoutines/uploadDir2Aws.py --localDir IO --bucketName markus1 --awsInitDir 2020
    upload_dir(localDir=args.localDir, bucketName=args.bucketName,
               awsInitDir=args.awsInitDir, tag=args.tag)
Markus Kaukonen

I had to solve this problem myself, so thought I would include a snippet of my code here.

I also had the requirement to filter for specific file types, and to upload only the directory contents (rather than the directory itself).

import logging
from pathlib import Path
from typing import Union

import boto3


log = logging.getLogger(__name__)


# Note: this is a method on a small S3 wrapper class; `self.upload_file`
# is assumed to upload a single file and return True on success.
def upload_dir(
    self,
    local_dir: Union[str, Path],
    s3_path: str = "/",
    file_type: str = "",
    contents_only: bool = False,
) -> dict:
    """
    Upload the content of a local directory to a bucket path.

    Args:
        local_dir (Union[str, Path]): Directory to upload files from.
        s3_path (str, optional): The path within the bucket to upload to.
            If omitted, the bucket root is used.
        file_type (str, optional): Upload files with extension only, e.g. txt.
        contents_only (bool): Used to copy only the directory contents to the
            specified path, not the directory itself.

    Returns:
        dict: key:value pair of file_name:upload_status.
            upload_status True if uploaded, False if failed.
    """
    # In the full class this resource is created once (e.g. in __init__) and
    # used by `upload_file`; credentials are redacted here.
    resource = boto3.resource(
        "s3",
        aws_access_key_id="xxx",
        aws_secret_access_key="xxx",
        endpoint_url="xxx",
        region_name="xxx",
    )

    status_dict = {}

    local_dir_path = Path(local_dir).resolve()
    log.debug(f"Directory to upload: {local_dir_path}")

    all_subdirs = local_dir_path.glob("**")

    for dir_path in all_subdirs:

        log.debug(f"Searching for files in directory: {dir_path}")
        file_names = dir_path.glob(f"*{('.' + file_type) if file_type else ''}")

        # Only keep regular files (skip subdirectories)
        file_names = [f for f in file_names if f.is_file()]
        log.debug(f"Files found: {file_names}")

        for file_name in file_names:
            s3_key = str(Path(s3_path) / file_name.relative_to(
                local_dir_path if contents_only else local_dir_path.parent
            ))
            log.debug(f"S3 key to upload: {s3_key}")
            status_dict[str(file_name)] = self.upload_file(s3_key, file_name)

    return status_dict
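
A hypothetical usage sketch (the class and names below are my own, not part of the answer): the method above is assumed to live on a small wrapper class whose upload_file helper performs the single-file upload. This reuses the imports and log from the snippet above.

class S3Bucket:
    def __init__(self, bucket_name: str):
        self.bucket_name = bucket_name
        self.client = boto3.client("s3")

    def upload_file(self, s3_key: str, file_name: Path) -> bool:
        """Upload one file, returning True on success, False on failure."""
        try:
            self.client.upload_file(str(file_name), self.bucket_name, s3_key)
            return True
        except Exception:
            log.exception(f"Upload failed: {file_name}")
            return False

# attach the answer's function as a method of the wrapper class
S3Bucket.upload_dir = upload_dir

bucket = S3Bucket("my-bucket")  # bucket name hypothetical
statuses = bucket.upload_dir("main_folder", s3_path="data", file_type="json")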