7

Suppose mt table name is Table_1

Structure of the table is something like:

EMP_id, Emp_Name, File_Name_Input_Date (columns)

I am loading Table_1 from a CSV file. The name of this CSV file is daily_01122014.csv

I am using vertical copy command to load this table:

ex copy Table_1 from '/location/daily_01122014.csv'delimiter ',';

It will load the table, but column name File_Name_Input_Date will be left blank, and I have to hard code it.

What I want is to write a shell script which will dynamically read file name and load it into column File_Name, and also extract date from file name.

i.e: If file name is daily_01122014.csv, then while loading Table_1 column File_Name should be daily_01122014.csv and Input_Date should be 2014-12-01. Since I have 50-60 files, I can not do it manually.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Hdk
  • 197
  • 1
  • 1
  • 8

2 Answers2

2

What you can do is adding this string to the file before you load the data to Vertica See here an example:

[dbadmin@DCXLG001 tmp]$ cat file001.csv
1,Adrian
[dbadmin@DCXLG001 tmp]$ ls -1 file*.csv|xargs -I% sed -i 's/$/,%/' %
[dbadmin@DCXLG001 tmp]$ cat file001.csv
1,Adrian,file001.csv

- where i have a file called file*.csv and i add the name of the file to the end of each line in the actual file. You can do this to any more then one file. I would use the operation system resources to do this transformation then trying to create a filler using Vetica's copy command

Up_One
  • 5,213
  • 3
  • 33
  • 65
2

You can just use CURRENT_LOAD_SOURCE() inside your COPY:

COPY t (c1, c2, c3 AS CURRENT_LOAD_SOURCE()) 
FROM '/home/load_file_1' ON exampledb_node02, '/home/load_file_2' ON exampledb_node03 
DELIMITER ',';

Alternatively, you can use a shell script:

#!/bin/bash
for INPUT_FILE in /path/to/file*.dat; do
vsql -c "copy test_file_name_insert
(
input_file_name as '$INPUT_FILE',
id,
name
)
from '/home/test_input_file.txt'
delimiter '|' ;"
done
Kermit
  • 33,827
  • 13
  • 85
  • 121
  • Sounds like he also wants to parse out the date. Maybe something like `TO_DATE(SPLIT_PART(SPLIT_PART(CURRENT_LOAD_SOURCE(),'_',2),'.',1),'DDMMYYYY'))` would work. Not sure, he references one column in part of his question and then two later. – woot May 21 '15 at 12:11
  • @woot I think you're out of line sir. – Kermit May 21 '15 at 12:23
  • It's not easy being green, is it, @Kermit ? – woot May 21 '15 at 13:07
  • @woot not if you're shopping on woot all day – Kermit May 21 '15 at 19:56