How can I list the names of all Parquet files in an Amazon S3 directory?

I found this way:

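    import com.amazonaws.services.s3.AmazonS3ClientBuilder

    // Build a client from the default credential and region provider chain
    val s3 = AmazonS3ClientBuilder.standard.build()
    // First page of objects under the given key prefix
    var objs = s3.listObjects("bucketname", "directory")
    val summaries = objs.getObjectSummaries()
    // Keep fetching pages while the listing is truncated
    while (objs.isTruncated()) {
      objs = s3.listNextBatchOfObjects(objs)
      summaries.addAll(objs.getObjectSummaries())
    }
    val listOfFiles = summaries.toArray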

But it throws this error:

    java.lang.NoSuchMethodError: org.apache.http.conn.ssl.SSLConnectionSocketFactory

I added the dependency for httpclient 4.5.2 as indicated in many answers, but I still get the same error.

I also tried excluding commons-httpclient from the Spark artifacts. These are my library dependencies:

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % sparkVersion exclude("commons-httpclient", "commons-httpclient"),
      "org.apache.spark" %% "spark-mllib" % sparkVersion exclude("commons-httpclient", "commons-httpclient"),
      "org.sedis" %% "sedis" % "1.2.2",
      "org.scalactic" %% "scalactic" % "3.0.0",
      "org.scalatest" %% "scalatest" % "3.0.0" % "test",
      "com.github.nscala-time" %% "nscala-time" % "2.14.0",
      "com.amazonaws" % "aws-java-sdk-s3" % "1.11.53",
      "org.apache.httpcomponents" % "httpclient" % "4.5.2",
      "net.java.dev.jets3t" % "jets3t" % "0.9.3",
      "org.apache.hadoop" % "hadoop-aws" % "2.6.0",
      "com.github.scopt" %% "scopt" % "3.3.0"
    )
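
To make the goal concrete: once the listing works, only the keys that name Parquet files are wanted. A minimal sketch of that filtering step, assuming the files can be recognized by a `.parquet` suffix; the helper name `parquetKeys` and the sample keys are hypothetical and only for illustration:

```scala
// Hypothetical helper: keep only object keys that look like Parquet files.
// Assumes Parquet objects are identifiable by a ".parquet" suffix (case-insensitive).
def parquetKeys(keys: Seq[String]): Seq[String] =
  keys.filter(_.toLowerCase.endsWith(".parquet"))

// Made-up sample keys standing in for the keys returned by the S3 listing.
val sample = Seq(
  "directory/part-00000.parquet",
  "directory/_SUCCESS",
  "directory/part-00001.PARQUET"
)
val result = parquetKeys(sample)
```

With the real listing, the input would be the key of each object summary, obtained with `getKey` on each `S3ObjectSummary`.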
Markus