
I am trying to move some log files, which are located on an external web server, to an Amazon S3 bucket. This should happen every 7 days without manual intervention. Additionally, I'd like it to be "failsafe", so it would probably be best if the copy operation were done in the Amazon cloud. I have already read a bit about AWS Data Pipeline, but I couldn't find anything on how to get it to work with an external (that means not hosted by Amazon) data source, let alone on downloading a file from a web server and then processing it. Does anybody have experience with a similar problem, and any advice on where to start?

Thank you!

Biffy
  • How many servers are you talking about? If it's just one, you should be able to upload it with a scheduled task or cron. – datasage Nov 06 '13 at 20:52
  • It is just one server. The web server is an external data source to which I only have limited access (I can only copy the files; normally this is done manually with a browser). Is there a possibility to schedule this task so the copying is done by Amazon? – Biffy Nov 07 '13 at 08:04

1 Answer


I don't believe any of the existing components will do what you want out of the box, but you can always run a script as part of a data pipeline. I've used it that way to run a script that grabs files from an external FTP and then loads them into an S3 bucket every hour.
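A minimal sketch of what such a script might look like, assuming it is run by a ShellCommandActivity on an EC2 instance that has the AWS CLI installed and an IAM role allowed to write to the bucket. The URL, bucket, and file names below are placeholders:

    #!/bin/bash
    # Hypothetical example: fetch a log file from an external web server
    # and copy it into an S3 bucket. The URL and bucket are placeholders.
    set -euo pipefail

    SOURCE_URL="https://logs.example.com/access.log"   # external web server
    DEST_BUCKET="s3://my-log-bucket/weekly/"           # target S3 prefix

    WORKDIR=$(mktemp -d)
    curl -fsS -o "$WORKDIR/access.log" "$SOURCE_URL"

    # The instance's IAM role (or configured credentials) must allow s3:PutObject
    aws s3 cp "$WORKDIR/access.log" "$DEST_BUCKET"

    rm -rf "$WORKDIR"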

Gordon Seidoh Worley
  • Could you please go into more detail on this solution? Currently I am writing a script which uses "staging" and copies the files to the output bucket using the environment variable ${OUTPUT1_STAGING_DIR} in the bash script. Sadly, this does not work, as I get this error message: "Staging local files to S3 failed. The request signature we calculated does not match the signature you provided. Check your key and signing method." Thank you very much! – Biffy Nov 20 '13 at 15:30
  • @Biffy, I'm not sure what the problem is there, but I'd post it as a separate question here on SO so that you get the best chance of people seeing it and answering it. – Gordon Seidoh Worley Nov 20 '13 at 16:23
  • What did you write the script in? Did you just add it as a ShellCommandActivity script that pulled from FTP and copied to S3? Any examples? – MonkeyBonkey Aug 12 '14 at 22:26
  • I don't know of any existing Data Pipeline component that lets you run a script on an external server either. You can use a ShellCommandActivity on an EC2 instance created inside the pipeline; how the script reaches your external server is something you have to design yourself (e.g. through FTP). To back up to S3 you can install the s3cmd tool on the EC2 instance. – piggybox Oct 29 '14 at 23:49
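To tie this back to the original every-7-days requirement: a pipeline definition along the lines of the sketch below could run such a ShellCommandActivity on a transient EC2 instance on a weekly schedule. The object names, URL, bucket, roles, and instance type are placeholders, and the exact fields should be verified against the Data Pipeline documentation:

    {
      "objects": [
        {
          "id": "Default",
          "scheduleType": "cron",
          "failureAndRerunMode": "CASCADE",
          "role": "DataPipelineDefaultRole",
          "resourceRole": "DataPipelineDefaultResourceRole"
        },
        {
          "id": "WeeklySchedule",
          "type": "Schedule",
          "period": "7 days",
          "startAt": "FIRST_ACTIVATION_DATE_TIME"
        },
        {
          "id": "Ec2Instance",
          "type": "Ec2Resource",
          "instanceType": "t1.micro",
          "terminateAfter": "30 Minutes",
          "schedule": { "ref": "WeeklySchedule" }
        },
        {
          "id": "FetchLogsToS3",
          "type": "ShellCommandActivity",
          "runsOn": { "ref": "Ec2Instance" },
          "schedule": { "ref": "WeeklySchedule" },
          "command": "curl -fsS https://logs.example.com/access.log -o /tmp/access.log && aws s3 cp /tmp/access.log s3://my-log-bucket/weekly/"
        }
      ]
    }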