
In an AWS Glue job, I'm using ftplib to download files and store them in S3, with the following code:

from ftplib import FTP
ftp = FTP()
ftp.connect("ftp.ser.ver", 21)
ftp.login("user", "password")
remotefile='filename.txt'
download='s3://bucket/folder/filename.txt'
with open(download,'wb') as file:
    ftp.retrbinary('RETR %s' % remotefile, file.write)

And I got the following error:

FileNotFoundError: [Errno 2] No such file or directory

Running the same code locally, with the download path changed to a local path, works. I'm fairly new to S3 and Glue and not sure where to look in the documentation. Any insight or suggestion is greatly appreciated.

xiexieni9527

1 Answer


You can't download an FTP file and save it directly to S3 with `open()` — S3 URIs are not filesystem paths. You will have to buffer it in a memory-based or file-based stream in the Glue environment before you upload it to S3.

from ftplib import FTP
import boto3

ftp = FTP()
ftp.connect("ftp.ser.ver", 21)
ftp.login("user", "password")

# Download to local temporary storage first
with open("/tmp/filename.txt", 'wb') as file:
    ftp.retrbinary("RETR filename.txt", file.write)

# Then upload the local file to S3
s3 = boto3.client('s3')
with open("/tmp/filename.txt", "rb") as f:
    s3.upload_fileobj(f, "BUCKET_NAME", "OBJECT_NAME")
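The memory-based variant of the same idea buffers the download in a `BytesIO` instead of `/tmp`. A minimal sketch, assuming a connected `ftplib.FTP` instance and a boto3 S3 client; the bucket and key names are placeholders:

```python
import io

def ftp_to_s3_in_memory(ftp, s3, remote_name, bucket, key):
    """Buffer the whole FTP download in RAM, then upload it to S3.

    ftp: a connected ftplib.FTP instance
    s3:  a boto3 S3 client, i.e. boto3.client("s3")
    Only suitable for files that fit comfortably in the worker's memory.
    """
    buffer = io.BytesIO()
    # retrbinary invokes the callback with each chunk it receives
    ftp.retrbinary("RETR %s" % remote_name, buffer.write)
    buffer.seek(0)  # rewind so upload_fileobj reads from the start
    s3.upload_fileobj(buffer, bucket, key)

# Usage (placeholder host, credentials, and bucket):
# from ftplib import FTP
# import boto3
# ftp = FTP()
# ftp.connect("ftp.ser.ver", 21)
# ftp.login("user", "password")
# ftp_to_s3_in_memory(ftp, boto3.client("s3"),
#                     "filename.txt", "bucket", "folder/filename.txt")
```

This avoids writing to local disk, at the cost of holding the whole file in memory.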
Allan Chua
    For memory-based solution, see also https://stackoverflow.com/q/69221348/850848 + With s3fs, you can stream it even directly, without having to keep whole file in memory: https://stackoverflow.com/q/41171784/850848 – Martin Prikryl Feb 11 '22 at 18:52
  • Thank you both so much, your answer and comment have unblocked my 3-day struggle. – xiexieni9527 Feb 15 '22 at 01:49
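Following the streaming approach from the comment above: because `retrbinary` just calls a write callback per chunk, the destination can be any writable file-like object, including one opened with s3fs. A sketch with hypothetical bucket/path names:

```python
from ftplib import FTP

def stream_ftp_to_file_obj(ftp, remote_name, dest_file):
    """Stream an FTP download chunk-by-chunk into any writable file object."""
    ftp.retrbinary("RETR %s" % remote_name, dest_file.write)

# With s3fs (placeholder bucket and credentials via the usual AWS config),
# the destination can be an S3 object opened for writing, so the file is
# never held fully in memory or stored on local disk:
# import s3fs
# fs = s3fs.S3FileSystem()
# with fs.open("bucket/folder/filename.txt", "wb") as s3file:
#     stream_ftp_to_file_obj(ftp, "filename.txt", s3file)
```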