Environment(s):
- Azure Blob Storage and local file system
- Scala 2.12.10/Spark 3.0.1
With a file existing at C:\path\to\any\file-with-[brackets].csv,
spark.read.csv("C:\\path\\to\\any\\file-with-[brackets].csv")
results in
org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:/path/to/any/file-with-[brackets].csv;
at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:621)
...
If I remove the brackets from both the filename and the path string, there is no problem:
spark.read.csv("C:\\path\\to\\any\\file-without-brackets.csv")
results in
csv: org.apache.spark.sql.DataFrame = [_c0: string]
(I can't believe this hasn't been seen before; I find no mention of it on Google or Stack Overflow.)
How do you reference a file with brackets in the filename in Spark?
UPDATE! I discovered that brackets are part of the glob syntax Spark uses for file paths (you can use wildcards in Spark file paths, and [1234] matches any one of the characters 1, 2, 3, or 4 at that position in the path, like a regex character class)... but I cannot figure out how to ESCAPE a bracket in this context.
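To illustrate the behavior (with hypothetical sibling files data-1.csv, data-2.csv, and data-3.csv; the names are made up for this example):

// [123] acts as a glob character class, so this single call reads
// data-1.csv, data-2.csv, and data-3.csv into one DataFrame:
spark.read.csv("C:\\path\\to\\any\\data-[123].csv")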
How do you escape Spark's wildcard/glob handling of square brackets when referencing a single file with literal square brackets in its path via Spark's DataFrameReader?
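For concreteness, here is the kind of escaping I am looking for. The two variants below are guesses based on Hadoop's glob conventions (its GlobPattern supports \ as an escape character and ? as a single-character wildcard), not verified solutions:

// Guess 1: backslash-escape the brackets. Hadoop's glob treats "\" as an
// escape character; "\\[" in Scala source yields "\[" in the runtime string.
// Forward slashes are used as separators so the escapes aren't ambiguous:
spark.read.csv("C:/path/to/any/file-with-\\[brackets\\].csv")

// Guess 2: match each bracket with the single-character wildcard "?".
// This avoids escaping entirely, but would also match any other character
// in those two positions:
spark.read.csv("C:/path/to/any/file-with-?brackets?.csv")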