
I am trying to create a job that connects from AWS to an external SFTP server and brings files into S3 storage. It will be an automated job that runs every night and loads the data into S3. I have seen documentation about how to connect to AWS and import data into S3 manually, but I have found nothing about connecting to an external SFTP server to bring data into S3. Is this doable?

ac_sql
  • S3 won't accept a direct connection, at least as far as I know. Since there is no way to generate an SSH key pair, it is not possible to log in using SFTP or SSH. Besides, it is probably a cluster, not a server. I believe the only way to interact with S3 buckets programmatically is using their SDK. – Edelmar Ziegler Oct 17 '16 at 20:20
  • Using a simple C# console application, you can easily read files from an FTP server (FTPS, SFTP) and upload them to S3. – Mahdi Oct 17 '16 at 20:34
  • I didn't mention the content of the files. They are all zip files, and probably the only way to automate this process is using the SDK libraries. Is there a way to schedule a job within AWS that runs my Java or C# code regularly? Eventually these files will be loaded into HDFS. – ac_sql Oct 17 '16 at 21:32
  • If you deploy your code on a Windows machine, you can use Task Scheduler. – Mahdi Oct 17 '16 at 21:40
  • Hey @Mahdi, thanks for the advice. For now we use SQL Server Integration Services packages to do this. However, my goal is to move this process fully to the cloud with minimum interaction; worst case, I will use SSIS to extract data from SFTP, unzip the files to a local file server, and then upload those files to S3. – ac_sql Oct 17 '16 at 21:46

2 Answers


You can now use the managed SFTP service from AWS (AWS Transfer for SFTP). It provides a fully managed SFTP server that is easy to set up and is reliable, scalable, and durable. It uses S3 as the backend for storing files.
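As a rough sketch of how that can be set up with the AWS CLI (the server ID, role ARN, bucket path, user name, and key below are placeholders; an IAM role that grants access to the target bucket is assumed to exist already):

# Create a managed, S3-backed SFTP endpoint
$ aws transfer create-server --identity-provider-type SERVICE_MANAGED

# Add an SFTP user whose home directory maps to a path in an S3 bucket
$ aws transfer create-user \
      --server-id s-1234567890abcdef0 \
      --user-name nightly-loader \
      --role arn:aws:iam::123456789012:role/transfer-s3-access \
      --home-directory /my-bucket/incoming \
      --ssh-public-key-body "ssh-rsa AAAAB3... example-key"

Anything uploaded through this endpoint lands directly in the bucket.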

user3575337

Use S3FS to configure an SFTP connection directly to S3.

All you need to do is install S3FS: https://github.com/s3fs-fuse/s3fs-fuse/wiki/Installation-Notes

  1. Install dependencies for fuse and s3cmd.

    CentOS/RHEL Users:

# yum install gcc libstdc++-devel gcc-c++ curl-devel libxml2-devel openssl-devel mailcap

    Ubuntu Users:

$ sudo apt-get install build-essential libcurl4-openssl-dev libxml2-dev mime-support
  2. Download and compile the latest fuse

# cd /usr/src/
# wget https://github.com/libfuse/libfuse/releases/download/fuse-2.9.7/fuse-2.9.7.tar.gz
# tar xzf fuse-2.9.7.tar.gz
# cd fuse-2.9.7
# ./configure --prefix=/usr/local
# make && make install
# export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
# ldconfig
# modprobe fuse
  3. Download and compile the latest S3FS

https://code.google.com/archive/p/s3fs/downloads

# cd /usr/src/
# wget https://s3fs.googlecode.com/files/s3fs-1.74.tar.gz
# tar xzf s3fs-1.74.tar.gz
# cd s3fs-1.74
# ./configure --prefix=/usr/local
# make && make install

  4. Set up access keys

# echo AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY > ~/.passwd-s3fs
# chmod 600 ~/.passwd-s3fs
  5. Mount the S3 bucket

    # mkdir /tmp/cache

    # mkdir /s3mnt

    # chmod 777 /tmp/cache /s3mnt

    # s3fs -o use_cache=/tmp/cache mydbbackup /s3mnt

Make the mount point the SFTP user's home directory; files transferred over SFTP will then go straight into S3.
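A minimal sketch of that step, assuming a dedicated account (the user name sftpuser is just an example) on the host where the bucket is mounted:

# Create a user whose home directory is the S3 mount point
# useradd -d /s3mnt sftpuser
# passwd sftpuser

To restrict the account to SFTP only, a Match User block with ForceCommand internal-sftp can be added to /etc/ssh/sshd_config.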

NOTE: Do not forget to add permissions to your S3 bucket to allow authenticated AWS users.

Ali
  • S3FS is not SFTP. And since S3 is not a traditional file system, S3FS is rarely a good idea. – jzonthemtn Oct 18 '16 at 15:26
  • Well, I guess S3 is not a good place to perform this task. What if I just connect to the external file server from one of the EC2 instances? – ac_sql Oct 18 '16 at 17:06
  • You can do that, but then it is just synchronizing data between two SFTP servers; you won't be able to maintain the data as comfortably and securely as when storing it in S3 or Glacier. – Ali Oct 19 '16 at 06:05
  • I agree with your point @jbird, **S3FS is not SFTP**, but for transferring files to S3 over FTP it is our only option for now; of course you would face latency issues. @ac_sql, why don't you try installing the **S3 CLI/SDK** on your SFTP server and running a cron job that syncs all your data (see the sketch after these comments)? This is a better option if you are **not transferring live data** to S3 over SFTP, and it also avoids the latency issues. – Ali Oct 19 '16 at 06:12
  • @Ali Entertainment The SFTP server belongs to the customer; we don't have any admin-level access to install anything on it. All we do is bring data into our current environment via an ETL job. However, when we move our system to AWS, we want direct access to the customer's file server instead of bringing their data into our environment. – ac_sql Oct 19 '16 at 15:40
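A minimal sketch of the cron-based sync suggested in the comments above, assuming the AWS CLI is installed and configured on the machine that holds the files (the local path and bucket name are placeholders):

# Crontab entry: every night at 01:00, push new and changed zip files to S3
0 1 * * * aws s3 sync /data/exports s3://my-bucket/incoming/ --exclude "*" --include "*.zip"

aws s3 sync only copies files that are new or changed since the last run, so re-running it nightly is safe.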