On Linux, I have files on a remote system that are too big to hold on my local system, so I am using ssh to stream the files to the local system and Vertica's COPY FROM STDIN to load the data. The problem is that the streaming occasionally fails for some reason, leaving incomplete data committed in Vertica.
For demonstration purposes, I prepared the following:
cat ./test.dat | falseCat |
vsql -d SRVVERTICA -c "copy thisTable (a,b,c)
FROM local stdin
delimiter '|'
trailing nullcols
abort on error commit;"
This pipes data from my data file, through my falseCat program (which passes its input through unchanged and always exits with an error), into Vertica. It is hard to say whether this is exactly what I am seeing. Most recently, I got this error from earlier in the pipe:
awk: fatal: print to "standard output" failed (Connection reset by peer)
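On the shell side I can at least detect that an earlier stage of the pipe died. A sketch, assuming bash and hypothetical stand-ins for the real stream and loader: with set -o pipefail, the pipeline's exit status reflects the failing stage instead of just the last command's status.

```shell
#!/bin/bash
set -o pipefail   # pipeline status = rightmost nonzero stage, not just the last one

# Hypothetical stand-ins for the real ssh/awk stream and the vsql load.
fakeStream() { printf '1|2|3\n'; return 1; }   # dies after streaming
fakeLoad()   { cat > /dev/null; }              # consumes everything, succeeds

fakeStream | fakeLoad
echo "pipeline status: $?"    # nonzero because fakeStream failed
```

Without pipefail, this pipeline would report status 0 (fakeLoad's status) and the failure would go unnoticed by the wrapper script.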
Please note, this is not a Vertica problem. It is an upstream problem that I am trying to catch in Vertica before it commits. For example, if I receive only 30 million records when I am supposed to receive 50 million, I want to roll back rather than commit the incomplete data. It would also be helpful to know that I got incomplete data, something I cannot tell today without studying the logs.
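The kind of check I have in mind looks roughly like this (a sketch only: streamCmd, loadCmd, and the expected count are hypothetical stand-ins for the real ssh stream, the vsql load, and a record count known ahead of time). An awk stage passes every record through unchanged and records the total it saw, so the count can be compared afterwards:

```shell
#!/bin/bash
set -o pipefail

expected=5            # record count promised by the remote side (assumption)
count_file=$(mktemp)

# Hypothetical stand-ins for the real remote stream and loader.
streamCmd() { printf '1|a\n2|b\n3|c\n'; }     # delivers only 3 records
loadCmd()   { cat > /dev/null; }

# awk forwards each record unchanged and writes the total at END,
# so the loader sees the same stream and the count survives the pipe.
streamCmd | awk -v f="$count_file" '{ print } END { print NR > f }' | loadCmd

got=$(cat "$count_file")
if [ "$got" -ne "$expected" ]; then
    echo "incomplete: got $got of $expected records" >&2
fi
```

Of course this only tells me after the fact that the load was short; the part I am missing is how to make Vertica hold off committing until a check like this passes.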
I just wish I had room to stream the data into a local file and load that file into Vertica, but I can't because of the size of the data.
Thank you for any input.