I want to read a stream of Parquet files:

    val df: DataFrame = spark
      .readStream
      .option("maxFilesPerTrigger", 1)
      .schema(schema)
      .parquet("D:/Programming/Scala/ScalaMaven/parquet1")

but I'm getting this error:

java.lang.IllegalArgumentException: Wrong basePath D:/path-to-the-folder-whith-parquet-files/parquet1 for the root path: file:/D:/another-path/parquet/part-00013-f3846a4a-2177-4a24-a7e1-3e20a898b4a6-c000.snappy.parquet

I also noticed something interesting in the logs:

22/02/14 09:37:21 WARN HadoopFSUtils: The directory file:/D:/another-path/parquet/part-00013-f3846a4a-2177-4a24-a7e1-3e20a898b4a6-c000.snappy.parquet was not found. Was it deleted very recently?

Please help, I'm stuck.

Liubchyk

1 Answer

Try passing the path as an explicit `file:///` URI:

    val df: DataFrame = spark
      .readStream
      .option("maxFilesPerTrigger", 1)
      .schema(schema)
      .parquet("file:///d:/Programming/Scala/ScalaMaven/parquet1")

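
For completeness, here is a minimal self-contained sketch of the same idea, assuming a local Spark session. The object name `ParquetStreamSketch`, the helper `localFileUri`, and the two-column schema are hypothetical and only there for illustration; the question does not show the real schema, so replace it with your own. The console sink is also just an assumption to make the stream start.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

object ParquetStreamSketch {
  // Hypothetical helper: turn a local Windows-style path into an explicit file URI.
  def localFileUri(path: String): String =
    "file:///" + path.replace('\\', '/').stripPrefix("/")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-stream-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical schema; replace with the real schema of the Parquet files.
    val schema = StructType(Seq(
      StructField("id", LongType, nullable = true),
      StructField("value", StringType, nullable = true)
    ))

    val df: DataFrame = spark
      .readStream
      .option("maxFilesPerTrigger", 1) // pick up at most one new file per micro-batch
      .schema(schema)                  // streaming file sources need a schema by default
      .parquet(localFileUri("D:/Programming/Scala/ScalaMaven/parquet1"))

    // Print each micro-batch to the console so the stream actually runs.
    val query = df.writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

The helper only normalizes separators and prepends the scheme, so `localFileUri("D:/Programming/Scala/ScalaMaven/parquet1")` yields `file:///D:/Programming/Scala/ScalaMaven/parquet1`.
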
Warren Zhu
  • That didn't help. The same exception appears. I just don't understand what the root path is or how I can change it. There isn't any path like that in the code... – Liubchyk Feb 14 '22 at 07:34
  • Sorry, the `:` was missing. This should work now. You can refer to https://stackoverflow.com/questions/27299923/how-to-load-local-file-in-sc-textfile-instead-of-hdfs – Warren Zhu Feb 14 '22 at 15:14