I am running a Spark job on a Google Dataproc cluster (image version 1.4, Spark 2.4.5). The job reads a file from a GS bucket using a regular expression in the path, and it fails with the error below.
Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: gs://<gs_path>/<file_name>_\d*.dat;
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:552)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)
The same job runs fine on a Dataproc 1.2 cluster with Spark 2.2.3 and reads the file from that path without any problem.
Has the way regular expressions in paths should be written changed in Spark 2.4.5, or has something changed in the Google API of the Dataproc 1.4 cluster that requires me to build these paths differently?
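
For illustration only, here is a minimal sketch of how the same pattern behaves when it is read as a regular expression and when it is read as a glob. The stack trace goes through checkAndGlobPathIfNecessary, so the path is being treated as a glob at that point. The sketch uses Python's standard library as a stand-in, which is an assumption: the glob rules applied on the cluster may differ in detail from Python's fnmatch. The file name and the helper functions are hypothetical and are not taken from the actual job.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical file name standing in for <file_name>_<digits>.dat
name = "data_20200101.dat"

regex_style = r"data_\d*.dat"   # regex token \d, as in the failing path
glob_style = "data_[0-9]*.dat"  # glob character class followed by a wildcard


def matches_as_regex(pattern: str, candidate: str) -> bool:
    """Interpret the pattern as a regular expression over the whole name."""
    return re.fullmatch(pattern, candidate) is not None


def matches_as_glob(pattern: str, candidate: str) -> bool:
    """Interpret the pattern as a shell-style glob (case-sensitive)."""
    return fnmatchcase(candidate, pattern)


print(matches_as_regex(regex_style, name))  # True: \d* consumes the digits
print(matches_as_glob(regex_style, name))   # False: fnmatch takes \d literally
print(matches_as_glob(glob_style, name))    # True: [0-9] then * match the digits
```

Note that in glob syntax `[0-9]*` means one digit followed by any characters, so it is not an exact equivalent of the regex `\d*`. It is only the closest glob-style spelling used in this sketch.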