I am trying to capture the number of rows inserted/updated for a table in databricks
Please see the below screenshot for more details. I need to pull the number 31 from num_affected_rows
I am trying to capture the number of rows inserted/updated for a table in databricks
Please see the below screenshot for more details. I need to pull the number 31 from num_affected_rows
It depends on how exactly you do insert/update.
For Spark SQL operations done using spark.sql
, it always returns a DataFrame. For DML it is a single record with operation metrics, exactly what you have displayed on screen. You can read this result.
val df = spark.sql("INSERT INTO foo VALUES (1), (2), (3)")
println(df.select($"num_affected_rows").as[Long].head) // Outputs "3"
For operations done through DataFrame API or DeltaTable API it's a bit more difficult. For Delta tables you can retrieve history like this:
import io.delta.tables._
Seq(1, 2, 3, 4).toDF("x").write.insertInto("foo")
val table = DeltaTable.forName(spark, "foo")
val lastHistoryEntry = table.history(1)
println(lastHistoryEntry.select($"operationMetrics.numOutputRows").as[String].head) // Outputs "4"
Note that different DML operations yield different metrics (see docs). Also beware that at the moment you're checking it, your operation might no longer be the last one! (if still running in the same session you might filter operation list by clusterId
)