
I'm writing a streaming job in Spark to load data into Splice Machine. I've followed the community tutorial that uses a VTI to insert data into Splice, but all the examples perform INSERTs. I need to perform UPSERTs of the records instead. Is there any way to achieve this?

Thank you.

mgaido

1 Answer


Yes, you can do an upsert by changing the VTI statement to use the insertMode hint. Your statement would look something like the following:

INSERT INTO IOT.SENSOR_MESSAGES --splice-properties insertMode=UPSERT
select s.* from new com.splicemachine.tutorials.sparkstreaming.kafka.SensorMessageVTI(?) s (
    id varchar(20),
    location varchar(50),
    temperature decimal(12,5),
    humidity decimal(12,5),
    recordedtime timestamp
);

Note that in your Java code you need to add a newline character (\n) after the hint; otherwise everything that follows is treated as part of the hint. There was an issue with using hints with VTIs that has been fixed in the next release of Splice Machine. If you are on a 2.0.x version, we can get the fix backported to that version.
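As a minimal sketch of the newline point, the statement could be assembled in Java like this. The class and method names (UpsertStatement, buildUpsertSql) are hypothetical and only for illustration; the table, VTI class, and columns are the ones from the statement above. How you prepare and execute the string depends on your streaming job.

```java
public class UpsertStatement {

    // Builds the VTI statement as a single Java string.
    // The "\n" directly after the hint ends the "--" comment,
    // so the SELECT that follows is not read as part of the hint.
    static String buildUpsertSql() {
        return "INSERT INTO IOT.SENSOR_MESSAGES --splice-properties insertMode=UPSERT\n"
             + "select s.* from new "
             + "com.splicemachine.tutorials.sparkstreaming.kafka.SensorMessageVTI(?) s ("
             + "id varchar(20), "
             + "location varchar(50), "
             + "temperature decimal(12,5), "
             + "humidity decimal(12,5), "
             + "recordedtime timestamp)";
    }

    public static void main(String[] args) {
        // Print the statement to check where the line break falls.
        System.out.println(buildUpsertSql());
    }
}
```

The only line break in the string is the one that terminates the hint; the rest of the statement can stay on one line or be split as you prefer.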

Two other hints that you may find useful when using a VTI are:

  • statusDirectory: the directory on HDFS where the import / upsert messages for bad records are written, much like the SYSCS_UTIL.IMPORT_DATA statement does
  • badRecordsAllowed: the number of bad records that are allowed before the process fails
Erin
  • Thank you for your answer. I am missing only one point: I have version 2.0.1.28, so am I affected by the bug? If so, do you know if there is a newer release for CDH 5.8.x that is not affected by it? Moreover, is there any reference documentation for the available hints? Thank you! – mgaido Jan 03 '17 at 16:10
  • Yes, you are affected by this defect. I am about to check in the fix for your release, so it should be available with the next build for 2.0. These hints are not listed in our documentation; I'll add a request to have the VTI documentation page updated with those hints. – Erin Jan 03 '17 at 19:30