Is it possible to get the header of a staged CSV file on Snowflake into an array?
I need to loop over all fields to insert data into our Data Vault model, so I really need to get these column names into an array.
Actually it was solved by using the following query over a staged file in a JavaScript stored procedure:
// FILE_FULL_PATH and ONE_COLUMN_FORMAT_FILE are JavaScript variables (e.g. procedure
// arguments) holding the stage path of the file and the name of the file format below.
var get_length_and_columns_array = "select array_size(split($1,',')) as NO_OF_COL, " +
    "split($1,',') as COLUMNS_ARRAY from " + FILE_FULL_PATH + " " +
    "(file_format=>" + ONE_COLUMN_FORMAT_FILE + ") limit 1";
The ONE_COLUMN_FORMAT_FILE argument references a file format that puts all fields into a single column so that this query works:
CREATE FILE FORMAT ONE_COLUMN_FILE_FORMAT
  TYPE = 'CSV' COMPRESSION = 'AUTO'
  -- a field delimiter that does not occur in the data, so the whole line lands in $1
  FIELD_DELIMITER = '|' RECORD_DELIMITER = '\n'
  SKIP_HEADER = 0
  FIELD_OPTIONALLY_ENCLOSED_BY = 'NONE'
  TRIM_SPACE = FALSE ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
  ESCAPE = 'NONE' ESCAPE_UNENCLOSED_FIELD = '\134'
  DATE_FORMAT = 'AUTO' TIMESTAMP_FORMAT = 'AUTO' NULL_IF = ('\\N');
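For reference, the same query can also be run directly in SQL outside the stored procedure. A minimal sketch, assuming a hypothetical stage @my_stage containing myfile.csv and the file format defined above:
-- @my_stage and myfile.csv are hypothetical names; ONE_COLUMN_FILE_FORMAT is defined above.
-- Because the format's delimiter never occurs in the data, $1 holds the full header line.
select array_size(split($1, ',')) as NO_OF_COL,
       split($1, ',') as COLUMNS_ARRAY
from @my_stage/myfile.csv (file_format => 'ONE_COLUMN_FILE_FORMAT')
limit 1;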
Yes, you can query the following metadata of your staged files: METADATA$FILENAME and METADATA$FILE_ROW_NUMBER.
So the metadata alone is not enough information to get the column names. But: there is the parameter SKIP_HEADER
that can be used in your COPY INTO command. So my suggestion for a workaround is: query the header line separately
(SKIP_HEADER = 0, LIMIT 1), then load the actual data with SKIP_HEADER = 1.
More info about the parameter is in the COPY INTO command documentation: https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
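A minimal sketch of that workaround, with hypothetical names @my_stage, myfile.csv, and my_table:
-- Read the header line plus file metadata (the one-column format keeps the header).
select metadata$filename, metadata$file_row_number, split($1, ',') as COLUMNS_ARRAY
from @my_stage/myfile.csv (file_format => 'ONE_COLUMN_FILE_FORMAT')
limit 1;

-- Then load the data itself, skipping that header line.
copy into my_table
from @my_stage/myfile.csv
file_format = (type = 'CSV' skip_header = 1);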
Dynamically generating the column list from a CSV file is currently not available in Snowflake, or in most other platforms AFAIK.
CSV is not the ideal format for this kind of schema-on-read operation.
If you are able to work with your input files, I would suggest converting the CSV to JSON. If you use JSON instead, you can then use Snowflake to process the file.
Here is some context:
Load CSV files with dynamic column headers Not Supported
Example: Converting CSV to JSON with pandas
import pandas as pd

# Hypothetical input path; adjust to your environment.
filepath = '/home/username/data/sales.csv'
jsonfilepath = filepath.replace('.csv', '.json')

df = pd.read_csv(filepath)

# orient="records" writes one JSON object per row, which suits Snowflake's
# semi-structured loading; orient="table" would embed a schema instead.
# df.to_json(jsonfilepath, orient="table", date_format="iso", index=False)
df.to_json(jsonfilepath, orient="records", date_format="iso")

print("Input File: {}\r\nOutput File: {}".format(filepath, jsonfilepath))
Example: Converting CSV to JSON with csvkit
csvjson -i 4 '/home/username/data/sales.csv' > '/home/username/data/sales.csvkit.json'
Querying Semi-Structured Data in Snowflake
Loading JSON Data into Snowflake
/* Create a target relational table for the JSON data. The table is temporary, meaning it persists only for */
/* the duration of the user session and is not visible to other users. */
create or replace temporary table home_sales (
city string,
zip string,
state string,
type string default 'Residential',
sale_date timestamp_ntz,
price string
);
/* Create a named file format with the file delimiter set as none and the record delimiter set as the new */
/* line character. */
/* */
/* When loading semi-structured data (e.g. JSON), you should set CSV as the file format type (default value). */
/* You could use the JSON file format, but any error in the transformation would stop the COPY operation, */
/* even if you set the ON_ERROR option to continue or skip the file. */
create or replace file format sf_tut_csv_format
field_delimiter = none
record_delimiter = '\\n';
/* Create a temporary internal stage that references the file format object. */
/* Similar to temporary tables, temporary stages are automatically dropped at the end of the session. */
create or replace temporary stage sf_tut_stage
file_format = sf_tut_csv_format;
/* Stage the data file. */
/* */
/* Note that the example PUT statement references the macOS or Linux location of the data file. */
/* If you are using Windows, execute the following statement instead: */
-- PUT file://%TEMP%/sales.json @sf_tut_stage;
put file:///tmp/sales.json @sf_tut_stage;
/* Load the JSON data into the relational table. */
/* */
/* A SELECT query in the COPY statement identifies a numbered set of columns in the data files you are */
/* loading from. Note that all JSON data is stored in a single column ($1). */
copy into home_sales(city, state, zip, sale_date, price)
from (select substr(parse_json($1):location.state_city,4), substr(parse_json($1):location.state_city,1,2),
parse_json($1):location.zip, to_timestamp_ntz(parse_json($1):sale_date), parse_json($1):price
from @sf_tut_stage/sales.json.gz t)
on_error = 'continue';
/* Query the relational table */
select * from home_sales;
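With the data staged as JSON, the original goal of getting the column names into an array becomes simple. A minimal sketch against the tutorial's stage, using Snowflake's OBJECT_KEYS and FLATTEN functions:
/* Get the field names of the first JSON record as an array. */
select object_keys(parse_json($1)) as columns_array
from @sf_tut_stage/sales.json.gz
limit 1;

/* Or expand them one key per row, which is convenient for looping. */
select f.key as column_name
from (select parse_json($1) as rec from @sf_tut_stage/sales.json.gz limit 1) t,
     lateral flatten(input => t.rec) f;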