When running my Spark job on AWS EMR, I get the error below when trying to read Avro files from an S3 bucket. It happens with these versions:

  • emr - 5.5.0
  • emr - 5.9.0

This is the code:

val files  = 0 until numOfDaysToFetch map { i =>
  s"s3n://bravos/clicks/${fromDate.minusDays(i)}/*"
}
spark.read.format("com.databricks.spark.avro").load(files: _*)

The exception:

java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: 1037330823653531755-2017-10-16T03:06:00.avro
    at org.apache.hadoop.fs.Path.initialize(Path.java:205)
    at org.apache.hadoop.fs.Path.<init>(Path.java:171)
    at org.apache.hadoop.fs.Path.<init>(Path.java:93)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:241)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1732)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1713)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.globStatus(EmrFileSystem.java:362)
    at org.apache.spark.deploy.SparkHadoopUtil.globPath(SparkHadoopUtil.scala:237)
    at org.apache.spark.deploy.SparkHadoopUtil.globPathIfNecessary(SparkHadoopUtil.scala:243)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:374)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:344)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)


LeonBam

2 Answers

Hadoop's Path doesn't support colons in file names. It interprets "1037330823653531755-2017-10-16T03" (everything before the first ":") as a URI scheme and then rejects what follows because it is not an absolute path. Even if it got past that, it would fail with something like "No FileSystem for scheme: 1037330823653531755-2017-10-16T03".

Fix: don't use ":" in filenames.
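
To make the failure concrete, here is a rough sketch that uses only java.net.URI, with no Hadoop on the classpath. The object ColonInFileName and its helpers splitScheme, tryBuildUri and sanitize are hypothetical names invented for this illustration, and splitScheme is a simplified imitation of how a path string gets split into scheme and path, not Hadoop's actual code. Replacing ":" with "-" is just one possible renaming choice.

```scala
import java.net.{URI, URISyntaxException}

object ColonInFileName {
  // Simplified imitation (an assumption, not Hadoop's real implementation):
  // a colon that appears before any slash is taken as the end of a URI scheme.
  def splitScheme(name: String): (Option[String], String) = {
    val colon = name.indexOf(':')
    val slash = name.indexOf('/')
    if (colon != -1 && (slash == -1 || colon < slash))
      (Some(name.substring(0, colon)), name.substring(colon + 1))
    else
      (None, name)
  }

  // Builds a URI from the split parts; returns the error message on failure.
  def tryBuildUri(name: String): Either[String, URI] = {
    val (scheme, path) = splitScheme(name)
    try Right(new URI(scheme.orNull, null, path, null, null))
    catch { case e: URISyntaxException => Left(e.getMessage) }
  }

  // One possible fix when writing the files: avoid ':' in the name.
  def sanitize(name: String): String = name.replace(':', '-')
}
```

Calling ColonInFileName.tryBuildUri("1037330823653531755-2017-10-16T03:06:00.avro") should give a Left whose message starts with "Relative path in absolute URI", while the sanitized name builds without complaint.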

stevel

I removed the trailing * from the /* at the end of each path, and it just worked.
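
For reference, a sketch of the path list from the question with the trailing * dropped. ClickPaths and dayPaths are hypothetical names, and I am assuming here that fromDate is a java.time.LocalDate (the question does not say which date type it uses). My guess is that this works because a path without glob characters skips the globbing step that shows up in the stack trace, but I have not verified that.

```scala
import java.time.LocalDate

object ClickPaths {
  // One directory path per day, counting back from fromDate,
  // with no trailing glob character.
  def dayPaths(fromDate: LocalDate, numOfDaysToFetch: Int): Seq[String] =
    (0 until numOfDaysToFetch).map { i =>
      s"s3n://bravos/clicks/${fromDate.minusDays(i)}/"
    }
}

// Usage with Spark, as in the question (needs a SparkSession named spark):
// spark.read.format("com.databricks.spark.avro")
//   .load(ClickPaths.dayPaths(fromDate, numOfDaysToFetch): _*)
```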

LeonBam