I need to perform the following upsert on a Hive table:

  1. If a patientnumber value exists and is the same as the casenumber column, update the existing record; otherwise insert a new row.
  2. If patientnumber does not exist, insert the data as it is.
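
The two rules can be read in more than one way, so here is a minimal sketch of one possible reading in plain Python. It assumes the table is keyed by patientnumber and uses a list of dicts as a hypothetical in-memory stand-in for the Hive table; the function name `upsert` and the extra `name` field in the usage example are illustrative only.

```python
def upsert(table, incoming):
    """Apply one incoming row to `table` (a list of dict rows).

    Assumed reading of the rules: update only when the incoming row has a
    patientnumber, that value equals its casenumber, and a row with the same
    patientnumber is already in the table. Insert in every other case.
    """
    patient = incoming.get("patientnumber")
    if patient is not None and patient == incoming.get("casenumber"):
        for i, row in enumerate(table):
            if row.get("patientnumber") == patient:
                table[i] = dict(incoming)  # rule 1: overwrite the matching row
                return "updated"
    table.append(dict(incoming))  # rule 1 "else" branch and rule 2: insert
    return "inserted"
```

For example, with `rows = [{"patientnumber": 1, "casenumber": 1, "name": "a"}]`, calling `upsert(rows, {"patientnumber": 1, "casenumber": 1, "name": "b"})` overwrites the existing row, while an incoming row whose casenumber differs from its patientnumber is appended.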

Jacek Laskowski

1 Answer

This looks like a perfect case for the MERGE INTO SQL statement, but that does not seem to be supported for Hive tables in PySpark SQL.

Any reason not to migrate to modern table formats (like Delta Lake or Apache Iceberg)?
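
If migrating is an option, the upsert could look roughly like the sketch below. It only builds the MERGE INTO statement as a string, so it runs without a Spark installation. The table names `patients` and `patient_updates` and the helper `build_merge_sql` are hypothetical, and the ON clause encodes one assumed reading of the question (match on patientnumber, and only when the incoming patientnumber equals its casenumber).

```python
def build_merge_sql(target: str, source: str) -> str:
    """Return a MERGE INTO statement for a Delta Lake or Iceberg table.

    Assumption: source rows whose patientnumber is NULL or differs from
    casenumber never satisfy the ON clause, so they fall through to INSERT.
    """
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        "ON t.patientnumber = s.patientnumber\n"
        "   AND s.patientnumber = s.casenumber\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Hypothetical table names; with a SparkSession configured for Delta Lake or
# Iceberg, the statement would be run as spark.sql(merge_sql).
merge_sql = build_merge_sql("patients", "patient_updates")
print(merge_sql)
```

UPDATE SET * and INSERT * copy every column by name, so this form assumes the source and target share the same schema; otherwise the columns would need to be listed explicitly.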

Jacek Laskowski