I know s3distcp can copy a file (or set of files) from one S3 location to another, but is it possible, using MapReduce or any other Hadoop/EMR functionality, to write a random sample (or every nth line) of the file(s) to a new S3 location? The catch is to avoid downloading the data to a local machine and uploading it back to S3.
Here's the slow code I'm trying to optimize:
aws s3 cp s3://... localLocation
cat localLocation | awk '{if(NR%10==0) print $0}' > samp.txt
aws s3 cp samp.txt s3://..anotherLocation
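For context, a minimal sketch of the sampling step itself, plus a streaming variant I've considered (the bucket/key names are placeholders; note that `aws s3 cp` accepts `-` for stdin/stdout, so this avoids the local disk, though the data still flows through the machine running the command):

```shell
# Hypothetical one-pass variant, no local file:
#   aws s3 cp s3://bucket/key - | awk 'NR%10==0' | aws s3 cp - s3://bucket/sample.txt
# The awk pattern 'NR%10==0' keeps every 10th line (print is the default action).
# Demonstrating the sampling step on a generated stream:
seq 1 30 | awk 'NR%10==0'
```

This prints 10, 20, 30. It is still a single-machine pipe, not a distributed Hadoop/EMR job, which is what I'm asking about.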