Instead of copying everything over to HDFS, is it possible to just get an array of the objects in an S3 bucket and process them directly in EMR?
I've tried this two ways, and both fail. Just doing new Path("s3n://...") keeps giving me security warnings about not having credentials, even after I add them to the configs. When I use the AWS SDK to access my bucket instead, running the jar tells me the AWS SDK is missing.