-1

On Linux, I have files on a remote system that are too big to hold on my local system so I am using ssh to stream the files to the local system and using Vertica's copy stdin to enter the data. The problem is, occasionally the streaming fails for some reason and I have incomplete data committed into Vertica.

For demonstration purposes, I prepared the following:

cat ./test.dat | falseCat |
vsql -d SRVVERTICA -c "copy thisTable (a,b,c)
FROM local stdin
delimiter '|'
trailing nullcols
abort on error commit;"

This batches data from my data file, through my falseCat program that passes its input and always returns an error into Vertica. It is hard to say if this is exactly what I am seeing. Most recently, I got the error from earlier in the pipe:

  awk: fatal: print to "standard output" failed (Connection reset by peer)

Please note, this is not a Vertica problem. It is an upstream problem that I am trying to catch in Vertica before it commits. For example, if I receive only 30 million records when I am supposed to receive 50 million, I want to rollback before committing the incomplete data. It also would be "helpful" to know if I got incomplete data, something I don't know now without studying the logs.

I just wish I had room to stream the data into a file locally and load the file into Vertica but I can't because of the size of the data.

Thank you for any input.

user1683793
  • 1,213
  • 13
  • 18
  • I wish if someone felt strongly enough about the question to mark it down, they could bother to comment what they don't like. I thought this was a valid question, off the beaten path, perhaps but still valid. – user1683793 Sep 02 '18 at 21:11
  • What do you mean rollback? Is the copy inserting any line? or just handle the exception and try again? – arkhein Aug 31 '18 at 09:20
  • I want to insert, say, 50 million records but the pipe fails in the middle so Vertica only inserts 30 million. If the pipe fails, I want to roll back so I don't get an incomplete load. – user1683793 Aug 31 '18 at 16:00
  • How I know after the session is finished you can't roll back. Which probably I would do in this case: load in smaller parts, and get the epoch before you start to load. If fall partly you can roll back to the epoch (obviously it can be problematic if multiple processes loading parallel) – arkhein Sep 03 '18 at 11:12

1 Answers1

0

I have something here that works in small cases but I have not tried with big cases, as yet. Basically what I am doing is creating a named pipe and sending the data to the pipe in one process while reading that data into Vertica in another. When the data is done being written, I save either "commit" or "rollback" to a file that I will batch into Vertica when it is done. After Vertica completes the copy, I am running sleep for a few seconds to make sure the other process is done before running the finalVert script. I were on the ball, I would cause it to wait until the file exists but this is a start.

set -o pipefail
[[ -p dataToVert ]] || mkfifo dataToVert
(
if cat ./test.dat | falseCat > dataToVert
then
    echo "commit;" > finalVert.sql
else
    echo "rollback;" > finalVert.sql"
fi
) &
vsql -d SRVVERTICA -At << EOF
copy thisTable (a,b,c)
FROM local 'dataToVert'
delimiter '|'
trailing nullcols
abort on error no commit;
\o | xargs sleep
select 5;
\o
\i finalVert.sql
\q
EOF

I am transitioning to Vertica from Oracle. If somebody has a better way of doing this, I would love to hear it.

I should add that for testing purposes, I have been using a script called falseCat:

#!/bin/ksh
cat -
exit 141

This echoes the stdin to stdout and produces the same 141 error code I am getting from ssh.

user1683793
  • 1,213
  • 13
  • 18