I am trying to use the COPY INTO statement in Databricks to ingest CSV files from Azure Data Lake Storage (abfss).
The problem is that the folder name contains a space (/AP Posted/), and when I reference that path, execution raises the error below. Note that index 70 in the message points exactly at the space in the path:
Error in SQL statement: URISyntaxException: Illegal character in path at index 70: abfss://gpdi-files@hgbsprodgbsflastorage01.dfs.core.windows.net/RAW/AP Posted/
I googled the error and found articles advising to replace the space with %20, but that did not work either. Does anyone know how to solve this, or is the only real solution to avoid spaces in folder names?
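For reference, this is the %20 variant I tried, with only the FROM path changed (the rest of the statement is identical to my full code below); it also fails:

COPY INTO prod_gbs_gpdi.bronze_data.my_table
FROM 'abfss://gpdi-files@hgbsprodgbsflastorage01.dfs.core.windows.net/RAW/AP%20Posted/'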
This is my full, current Databricks SQL code:
COPY INTO prod_gbs_gpdi.bronze_data.my_table
FROM 'abfss://gpdi-files@hgbsprodgbsflastorage01.dfs.core.windows.net/RAW/AP Posted/'
FILEFORMAT = CSV
VALIDATE 500 ROWS
PATTERN = 'AP_SAPEX_KPI_001 - Posted Invoices in 2021_3.CSV'
FORMAT_OPTIONS (
  'header' = 'true',
  'delimiter' = ';',
  'skipRows' = '8',
  'mergeSchema' = 'true',              -- whether to infer the schema across multiple files and to merge the schema of each file
  'encoding' = 'UTF-8',
  'enforceSchema' = 'true',            -- whether to forcibly apply the specified or inferred schema to the CSV files
  'ignoreLeadingWhiteSpace' = 'true',
  'ignoreTrailingWhiteSpace' = 'true',
  'mode' = 'PERMISSIVE'                -- parser mode for handling malformed records
)
COPY_OPTIONS (
  'force' = 'true',                    -- if set to true, idempotency is disabled and files are loaded regardless of whether they have been loaded before
  'mergeSchema' = 'true'               -- if set to true, the schema can be evolved according to the incoming data
)