I found it faster to use aws s3 cp than boto to download files from public S3 repositories.
If I use boto, my job executes fine.
If I use os.system('aws s3 cp s3://1000genomes/XXX /home/admin/data/XXX'), the instance downloads the files to the local drive, but the script then fails to continue.
I don't see any errors in the cloud-init output log (/var/log/cloud-init-output.log).
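Since os.system discards the exit code unless it is checked explicitly, one way to make a silent failure visible might be to run the command through subprocess and print everything it returns. This is only a sketch: run_logged is a hypothetical helper name, and it assumes that whatever the script prints to stdout ends up in the cloud-init output log, as the script's own output does here.

```python
from __future__ import print_function

import subprocess
import sys


def run_logged(cmd):
    """Run cmd (a list of arguments) and return (exit_code, stdout, stderr).

    run_logged is a hypothetical helper, not part of boto or the AWS CLI.
    """
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
    except OSError as exc:
        # Raised when the executable cannot be started, for example when
        # it is not on the PATH of the process running the script.
        print("could not start %r: %s" % (cmd, exc))
        sys.stdout.flush()
        return 127, "", str(exc)

    out, err = proc.communicate()
    out = out.decode("utf-8", "replace")
    err = err.decode("utf-8", "replace")

    # Print everything so the result is visible wherever stdout is logged.
    print("command: %r" % (cmd,))
    print("exit code: %d" % proc.returncode)
    print("stdout: %s" % out)
    print("stderr: %s" % err)
    sys.stdout.flush()
    return proc.returncode, out, err


# Possible usage inside the script (XXX is the placeholder from above):
# code, out, err = run_logged(
#     ["aws", "s3", "cp", "s3://1000genomes/XXX", "/home/admin/data/XXX"])
# if code != 0:
#     sys.exit("aws s3 cp failed with exit code %d" % code)
```

Passing the command as a list avoids shell quoting issues, and a non-zero exit code or a message on stderr would then show up next to the script's other output.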
Here's the cloud-init output file:
Cloud-init v. 0.7.6 running 'modules:config' at Thu, 10 Dec 2015 15:50:39 +0000. Up 91.85 seconds.
Generating locales (this might take a while)...
en_US.UTF-8... done
Generation complete.
2015-12-10 15:50:44,535 - util.py[WARNING]: Running apt-configure (<module 'cloudinit.config.cc_apt_configure' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_apt_configure.pyc'>) failed
Cloud-init v. 0.7.6 running 'modules:final' at Thu, 10 Dec 2015 15:50:44 +0000. Up 97.16 seconds.
chr20 test [('HG00097', 's3://1000genomes/phase3/data/HG00097/alignment/HG00097.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam', 's3://1000genomes/phase3/data/HG00097/alignment/HG00097.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam.bai'), ('HG00096', 's3://1000genomes/phase3/data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam', 's3://1000genomes/phase3/data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.bai')] #Output from the script. This works.
Cloud-init v. 0.7.6 finished at Thu, 10 Dec 2015 15:51:02 +0000. Datasource DataSourceEc2. Up 115.24 seconds
If I SSH into the instance and run the same user data commands, it works:
#!/bin/bash
mount /dev/xvdb /home/admin/data
chmod 777 /home/admin/data
rm -rf /home/admin/data/*
python /home/admin/src/chr20_test.py 0
Why does the job fail to produce the output file I want when I submit it with user data (even though it manages to download the files from S3), yet work when I SSH into the instance and run the same commands?
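Because the same commands behave differently at boot and over SSH, it may help to compare the environment the script sees in each case. The sketch below is a diagnostic only, not a confirmed explanation: environment_snapshot, print_snapshot, and find_on_path are hypothetical helper names, and the choice of settings to record (user id, working directory, HOME, PATH, and where aws resolves) is an assumption about what is likely to differ.

```python
from __future__ import print_function

import os


def find_on_path(name, path=None):
    """Return the first executable called name on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def environment_snapshot():
    """Collect settings that could differ between a boot-time run and SSH."""
    get_uid = getattr(os, "getuid", lambda: None)  # not available everywhere
    return {
        "uid": get_uid(),
        "cwd": os.getcwd(),
        "HOME": os.environ.get("HOME"),
        "PATH": os.environ.get("PATH"),
        "aws": find_on_path("aws"),
    }


def print_snapshot():
    """Print the snapshot one key per line so two runs are easy to diff."""
    for key, value in sorted(environment_snapshot().items()):
        print("%s=%s" % (key, value))
```

Calling print_snapshot() at the top of chr20_test.py, once via user data and once over SSH, would give two listings to compare line by line.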