
Note: This is NOT a duplicate of "Can't read data in Presto - can in Hive"


In an attempt to make my PySpark application (which uses boto3) work, I had to do the following several times:

  • re-install pip
  • re-install aws-sdk (boto3, botocore, aws-cli)

While I managed to get my application working, I ended up breaking the communication between Presto and S3: Presto can no longer read data from Hive EXTERNAL tables stored on S3 (while Hive still can).


Upon running a simple query like SELECT COUNT(*) FROM my_db.my_table in Presto, the /var/log/presto/server.log file reports the following stack trace:

2018-12-04T12:29:54.433+0530    WARN    hive-hive-63    com.facebook.presto.hive.util.ResumableTasks    ResumableTask completed exceptionally
java.lang.NoClassDefFoundError: Could not initialize class com.amazon.ws.emr.hadoop.fs.util.EmrFsUtils
    at com.amazon.ws.emr.hadoop.fs.s3n.S3Credentials.initialize(S3Credentials.java:45)
    at com.amazon.ws.emr.hadoop.fs.HadoopConfigurationAWSCredentialsProvider.<init>(HadoopConfigurationAWSCredentialsProvider.java:26)
    at com.amazon.ws.emr.hadoop.fs.guice.DefaultAWSCredentialsProviderFactory.getAwsCredentialsProviderChain(DefaultAWSCredentialsProviderFactory.java:44)
    at com.amazon.ws.emr.hadoop.fs.guice.DefaultAWSCredentialsProviderFactory.getAwsCredentialsProvider(DefaultAWSCredentialsProviderFactory.java:28)
    at com.amazon.ws.emr.hadoop.fs.guice.EmrFSProdModule.getAwsCredentialsProvider(EmrFSProdModule.java:65)
    ...

see complete stacktrace here
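A "Could not initialize class" NoClassDefFoundError is what the JVM raises on later attempts to use a class whose static initialization already failed once; the original cause is usually logged earlier as a java.lang.ExceptionInInitializerError. The following is a minimal sketch, not part of the original setup, for locating both kinds of entries in a log. The function names `find_init_failures` and `scan_log` are hypothetical helpers introduced here for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Matches e.g. "java.lang.NoClassDefFoundError: Could not initialize class a.b.C"
_NO_INIT = re.compile(
    r"java\.lang\.NoClassDefFoundError: Could not initialize class (\S+)"
)
_INIT_ERROR = "java.lang.ExceptionInInitializerError"


def find_init_failures(
    lines: Iterable[str],
) -> Tuple[List[int], List[Tuple[int, str]]]:
    """Scan log lines for class-initialization failures.

    Returns a pair:
      - 1-based line numbers that mention ExceptionInInitializerError
      - (line number, class name) for each "Could not initialize class" entry
    """
    initializer_errors: List[int] = []
    uninitialized: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        if _INIT_ERROR in line:
            initializer_errors.append(number)
        match = _NO_INIT.search(line)
        if match:
            uninitialized.append((number, match.group(1)))
    return initializer_errors, uninitialized


def scan_log(path: str) -> Tuple[List[int], List[Tuple[int, str]]]:
    """Hypothetical convenience wrapper: run the scan over a log file on disk."""
    with open(path, errors="replace") as log:
        return find_init_failures(log)
```

Calling `scan_log("/var/log/presto/server.log")` would list where the class first failed to initialize, if that earlier entry is still in the log; the stack trace at that line is likely to be more informative than the one above.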


I'd like to clarify that

  • Only Presto seems to be affected; Hive, aws-cli, Spark etc. are able to read data as usual
  • My EC2 instances have an attached IAM Role that permits reading data from all S3 buckets in my account (and writing to some specific buckets)
  • Earlier, Presto had no trouble reading from S3; the problem arose only after I fiddled with the environment
  • Things run smoothly if I set the location of my Hive external table to HDFS

I've been through some related links to no avail


Environment / Frameworks

y2k-shubham
  • The log shows a problem with loading EMRFS. Try disabling it (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-presto-considerations.html) and see if it helps. – Piotr Findeisen Dec 04 '18 at 10:44
  • For now, a simple `sudo restart presto-server` seems to have fixed the issue (the mysterious world of *BigData*..) – y2k-shubham Dec 04 '18 at 10:49

0 Answers