Do you think it is necessary to add SaveMode for Delete, Update, and Upsert? Such as:
- SaveMode.Delete
- SaveMode.Update
- SaveMode.Upsert
referring to the code: JdbcRelationProvider.scala
I've analyzed its code for SaveTable: JdbcUtils.scala, and think it is easy to extend current insert implementation with delete, update, and merge statements, e.g.:
def getDeleteStatement(table: String, rddSchema: StructType, dialect: JdbcDialect): String = {
val columns = rddSchema.fields.map(x => dialect.quoteIdentifier(x.name) + "=?").mkString(" AND ")
s"DELETE FROM ${table.toUpperCase} WHERE $columns"
}
def getUpdateStatement(table: String, rddSchema: StructType, priKeys: Seq[String], dialect: JdbcDialect): String = {
val fullCols = rddSchema.fields.map(x => dialect.quoteIdentifier(x.name))
val priCols = priKeys.map(dialect.quoteIdentifier(_))
val columns = (fullCols diff priCols).map(_ + "=?").mkString(",")
val cnditns = priCols.map(_ + "=?").mkString(" AND ")
s"UPDATE ${table.toUpperCase} SET $columns WHERE $cnditns"
}
def getMergeStatement(table: String, rddSchema: StructType, priKeys: Seq[String], dialect: JdbcDialect): String = {
val fullCols = rddSchema.fields.map(x => dialect.quoteIdentifier(x.name))
val priCols = priKeys.map(dialect.quoteIdentifier(_))
val nrmCols = fullCols diff priCols
val fullPart = fullCols.map(c => s"${dialect.quoteIdentifier("SRC")}.$c").mkString(",")
val priPart = priCols.map(c => s"${dialect.quoteIdentifier("TGT")}.$c=${dialect.quoteIdentifier("SRC")}.$c").mkString(" AND ")
val nrmPart = nrmCols.map(c => s"$c=${dialect.quoteIdentifier("SRC")}.$c").mkString(",")
val columns = fullCols.mkString(",")
val placeholders = fullCols.map(_ => "?").mkString(",")
s"MERGE INTO ${table.toUpperCase} AS ${dialect.quoteIdentifier("TGT")} " +
s"USING TABLE(VALUES($placeholders)) " +
s"AS ${dialect.quoteIdentifier("SRC")}($columns) " +
s"ON $priPart " +
s"WHEN NOT MATCHED THEN INSERT ($columns) VALUES ($fullPart) " +
s"WHEN MATCHED THEN UPDATE SET $nrmPart"
}
The only extra thing is that we need provide primary keys for them, and they could call the same savePartition function just replacing insertStmt with runningStmt covering all insert/delete/update/merge.
def savePartition(
getConnection: () => Connection,
table: String,
iterator: Iterator[Row],
rddSchema: StructType,
insertStmt: String,
batchSize: Int,
dialect: JdbcDialect,
isolationLevel: Int,
options: JDBCOptions): Iterator[Byte]