
I have to copy an input text file (text_file.txt) into a table (table_a). I also need to include the input file's name in the table.

My code is:

\set t_pwd `pwd`
\set input_file '\'':t_pwd'/text_file.txt\'' 

copy table_a
( column1
 ,column2
 ,column3
 ,FileName :input_file
)
from :input_file

The last line does not copy the input text file's name into the table. How can I copy the input text file's name into the table (without typing the file name manually)?

user3435213

2 Answers


Solution 1
This might not be the perfect solution for your job, but I think it will do the job:

You can get the file name, store it in a TBL variable, and then append that variable to the end of each line in the CSV file you are about to load into Vertica.
Depending on your CSV file's size, this can be quite time and CPU consuming.

export TBL=$(ls -1 *.txt)
sed -i -e "s/$/,$TBL/" "$TBL"

Example:

[dbadmin@bih001 ~]$ cat load_data1
1|2|3|4|5|6|7|8|9|10

[dbadmin@bih001 ~]$ export TBL=$(ls -1 | grep '^load')
[dbadmin@bih001 ~]$ sed -i -e "s/$/|$TBL/" "$TBL"

[dbadmin@bih001 ~]$ cat load_data1
1|2|3|4|5|6|7|8|9|10|load_data1
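A minimal, self-contained sketch of the same idea, splitting the variable assignment and the sed edit into two separate commands (sample data and the scratch directory are for illustration only; assumes GNU sed for `-i`):

```shell
cd "$(mktemp -d)"                # scratch directory so ls only sees our file

# Create a sample file mirroring the example above.
printf '1|2|3|4|5|6|7|8|9|10\n' > load_data1

# Capture the file name, then append "|<file name>" to every line in place.
TBL=$(ls -1 | grep '^load')      # -> load_data1
sed -i -e "s/$/|$TBL/" "$TBL"

cat load_data1                   # -> 1|2|3|4|5|6|7|8|9|10|load_data1
```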

Solution 2
You can use a DEFAULT CONSTRAINT, see example:
1. Create your table with a DEFAULT CONSTRAINT

[dbadmin@bih001 ~]$ vsql
Password:
Welcome to vsql, the Vertica Analytic Database interactive terminal.

Type:  \h or \? for help with vsql commands
       \g or terminate with semicolon to execute query
       \q to quit

dbadmin=> create table TBL (id int ,CSV_FILE_NAME varchar(200) default 'TBL');
CREATE TABLE
dbadmin=> \dt
              List of tables
 Schema | Name | Kind  |  Owner  | Comment
--------+------+-------+---------+---------
 public | TBL  | table | dbadmin |
(1 row)

See that the DEFAULT CONSTRAINT has the 'TBL' default value:

dbadmin=> \d TBL
                                       List of Fields by Tables
 Schema | Table |    Column     |     Type     | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+---------------+--------------+------+---------+----------+-------------+-------------
 public | TBL   | id            | int          |    8 |         | f        | f           |
 public | TBL   | CSV_FILE_NAME | varchar(200) |  200 | 'TBL'   | f        | f           |
(2 rows)

2. Now set up your COPY variables
- Insert some data and alter the DEFAULT CONSTRAINT value to your current :input_file value.

dbadmin=> \set t_pwd `pwd`
dbadmin=> \set CSV_FILE `ls -1 | grep load*`
dbadmin=> \set input_file '\'':t_pwd'/':CSV_FILE'\''
dbadmin=>
dbadmin=>
dbadmin=> insert into TBL values(1);
 OUTPUT
--------
      1
(1 row)

dbadmin=> select * from TBL;
 id | CSV_FILE_NAME
----+---------------
  1 | TBL
(1 row)

dbadmin=> ALTER TABLE TBL  ALTER COLUMN CSV_FILE_NAME SET DEFAULT :input_file;
ALTER TABLE
dbadmin=> \dt TBL;
              List of tables
 Schema | Name | Kind  |  Owner  | Comment
--------+------+-------+---------+---------
 public | TBL  | table | dbadmin |
(1 row)

dbadmin=> \d TBL;
                                                 List of Fields by Tables
 Schema | Table |    Column     |     Type     | Size |          Default           | Not Null | Primary Key | Foreign Key
--------+-------+---------------+--------------+------+----------------------------+----------+-------------+-------------
 public | TBL   | id            | int          |    8 |                            | f        | f           |
 public | TBL   | CSV_FILE_NAME | varchar(200) |  200 | '/home/dbadmin/load_data1' | f        | f           |
(2 rows)

dbadmin=> insert into TBL values(2);
 OUTPUT
--------
      1
(1 row)

dbadmin=> select * from TBL;
 id |      CSV_FILE_NAME
----+--------------------------
  1 | TBL
  2 | /home/dbadmin/load_data1
(2 rows)

Now you can implement this in your copy script.
Example:

\set t_pwd `pwd`
\set CSV_FILE `ls -1 | grep '^load'`
\set input_file '\'':t_pwd'/':CSV_FILE'\''
ALTER TABLE TBL ALTER COLUMN CSV_FILE_NAME SET DEFAULT :input_file;
copy TBL from :input_file DELIMITER '|' DIRECT;
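If several files must be loaded, the same trick can be scripted: regenerate the ALTER ... SET DEFAULT statement before each COPY. A hedged sketch that only prints the SQL it would run (the file names and the final `vsql -f batch.sql` step are hypothetical):

```shell
# Emit one ALTER + COPY pair per file; review batch.sql, then feed it to vsql.
for f in load_data1 load_data2; do
  printf "ALTER TABLE TBL ALTER COLUMN CSV_FILE_NAME SET DEFAULT '%s';\n" "$f"
  printf "COPY TBL FROM '%s' DELIMITER '|' DIRECT;\n" "$f"
done > batch.sql
cat batch.sql
# vsql -f batch.sql   # run against the database once the SQL looks right
```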

Solution 3

Use the LOAD_STREAMS table
Example:
When loading a table, give it a stream name; this way you can identify the file name / stream name:

COPY mytable FROM myfile DELIMITER '|' DIRECT STREAM NAME 'My stream name';

Here is how you can query your load_streams table:

=> SELECT stream_name, table_name, load_start, accepted_row_count,
   rejected_row_count, read_bytes, input_file_size_bytes,
   parse_complete_percent, unsorted_row_count, sorted_row_count,
   sort_complete_percent FROM load_streams;
-[ RECORD 1 ]----------+---------------------------
stream_name            | fact-13
table_name             | fact
load_start             | 2010-12-28 15:07:41.132053
accepted_row_count     | 900
rejected_row_count     | 100
read_bytes             | 11975
input_file_size_bytes  | 0
parse_complete_percent | 0
unsorted_row_count     | 3600
sorted_row_count       | 3600
sort_complete_percent  | 100

Makes sense? Hope this helped!

Up_One

If you do not need to do it purely from inside vsql, it might be possible to cheat a bit and move the logic outside Vertica, into bash for example:

    FILE=text_file.txt
    (
      while IFS= read -r LINE; do
        echo "$LINE|$FILE"
      done < "$FILE"
    ) | vsql -c 'COPY table_a (...) FROM STDIN'

That way you basically COPY FROM STDIN, adding the file name to each line before it even reaches Vertica.
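The read loop above can be slow for very large files; an equivalent sketch using a single awk pass does the same tagging (sample data is illustrative, and the vsql step is shown as a comment since it needs a live database):

```shell
cd "$(mktemp -d)"                         # scratch directory with sample data
FILE=text_file.txt
printf '1|2|3\n4|5|6\n' > "$FILE"

# Append "|<file name>" to every line, in one awk pass instead of a shell loop.
awk -v f="$FILE" '{ print $0 "|" f }' "$FILE" > tagged.txt
cat tagged.txt                            # -> 1|2|3|text_file.txt, etc.

# In practice, stream the tagged output straight into Vertica:
# awk -v f="$FILE" '{ print $0 "|" f }' "$FILE" | vsql -c "COPY table_a FROM STDIN DELIMITER '|'"
```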

woot
Guillaume
  • Liked your idea, but just a question: would this be less resource-consuming than my first solution? – Up_One Mar 20 '14 at 20:50
  • Your first solution updates the file on disk, so the file is read once, then written, then re-read again by Vertica. I would thus think that my solution is lighter, as the read occurs only once; the rest is small in-memory streamed processing. – Guillaume Mar 21 '14 at 13:22
  • I am more prone to spend more CPU when working with Vertica than to use my memory to store STDIN; as you well know, Vertica is a memory hog! By the way, this would be a great discussion. – Up_One Mar 21 '14 at 13:33