I found it faster to use `aws s3 cp` than boto to download files from public S3 repositories.

For some reason if I use boto my job executes fine.

If I use os.system('aws s3 cp s3://1000genomes/XXX /home/admin/data/XXX'), the instance downloads the files to the local drive but then fails to continue with the script.

I don't see any errors in the /var/log/cloud-init.out file.

Here's the cloud-init-output file:

Cloud-init v. 0.7.6 running 'modules:config' at Thu, 10 Dec 2015 15:50:39 +0000. Up 91.85 seconds.
Generating locales (this might take a while)...
  en_US.UTF-8... done
Generation complete.
2015-12-10 15:50:44,535 - util.py[WARNING]: Running apt-configure (<module 'cloudinit.config.cc_apt_configure' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_apt_configure.pyc'>) failed
Cloud-init v. 0.7.6 running 'modules:final' at Thu, 10 Dec 2015 15:50:44 +0000. Up 97.16 seconds.
chr20 test [('HG00097', 's3://1000genomes/phase3/data/HG00097/alignment/HG00097.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam', 's3://1000genomes/phase3/data/HG00097/alignment/HG00097.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam.bai'), ('HG00096', 's3://1000genomes/phase3/data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam', 's3://1000genomes/phase3/data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.bai')] #Output from the script. This works.
Cloud-init v. 0.7.6 finished at Thu, 10 Dec 2015 15:51:02 +0000. Datasource DataSourceEc2.  Up 115.24 seconds

If I ssh into the instance and run the same user data commands it works!

#!/bin/bash
mount /dev/xvdb /home/admin/data
chmod 777 /home/admin/data
rm -rf /home/admin/data/*
python /home/admin/src/chr20_test.py 0

Why does the job fail to produce the output file I want when I submit it with user data (even though it manages to download the files from S3), yet work when I ssh into the instance and run the same commands?

  • What exactly do you mean by "fails to continue with the script"? Are you able to share the commands that are not working? – Jason Dec 10 '15 at 17:03
  • It's a streaming program. The point where it stops executing the script is here: `os.system('sudo aws s3 cp s3://1000genomes/XXX /home/admin/data/XXX') do_stuff('/home/admin/data/XXX')` `do_stuff(fh): print fh open('/home/admin/out','w')` When I run the instance, cloud-init-output shows that it does not print fh in the function, but it is able to download the S3 files. – DolphinGenomePyramids Dec 10 '15 at 19:13
  • I would avoid using os.system() to call commands from Python. The favored method is to use the subprocess module. I suspect that your os.system() call is forking out and cloud-init does not return. May I suggest leveraging runcmd: to run your Python script, and separating the aws cp from the Python script if you can. – cgseller Mar 15 '16 at 01:37
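
The suggestion in the last comment, replacing os.system() with the subprocess module, might look roughly like the sketch below. It is only an illustration, not a confirmed fix: the helper names `build_cp_command`, `run_command` and `download` are hypothetical, and it assumes the `aws` executable is on the PATH of whatever user runs the user data script. It waits for the child process to exit, captures its output, and raises if the exit status is non-zero, so a failure would show up in the cloud-init output instead of passing silently.

```python
import subprocess
import sys


def build_cp_command(s3_uri, dest, aws_bin="aws"):
    """Return the argument list for `aws s3 cp <s3_uri> <dest>`.

    aws_bin is an assumption: pass an absolute path if `aws` is not on PATH.
    """
    return [aws_bin, "s3", "cp", s3_uri, dest]


def run_command(args):
    """Run args without a shell and return (exit_code, combined_output)."""
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into stdout
        universal_newlines=True,    # return text rather than bytes
    )
    output, _ = proc.communicate()  # blocks until the child has exited
    return proc.returncode, output


def download(s3_uri, dest):
    """Download one object; raise if the CLI reports a failure."""
    code, output = run_command(build_cp_command(s3_uri, dest))
    sys.stdout.write(output)        # make the CLI output visible in the log
    sys.stdout.flush()
    if code != 0:
        raise RuntimeError("aws s3 cp exited with status %d" % code)
    return dest
```

In the script from the question this would be called as `download('s3://1000genomes/XXX', '/home/admin/data/XXX')` before `do_stuff(...)`. If `aws` cannot be found in the user data environment, `subprocess.Popen` raises an error right away, which is one way to tell an environment problem apart from a problem in the script itself.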
