The closest mechanism in Apache Spark to what you're trying to do is an accumulator. You can accumulate the log lines on the executors and access the result on the driver:
import org.apache.spark.util.CollectionAccumulator

// create a collection accumulator using the spark context
// (in a notebook or spark-shell the SparkContext is available as sc):
val logLines: CollectionAccumulator[String] = sc.collectionAccumulator("log")
// log function adds a line to accumulator
def log(msg: String): Unit = logLines.add(msg)
// driver-side function can print the log using accumulator's *value*
def writeLog(): Unit = {
import scala.collection.JavaConverters._
println("start write")
logLines.value.asScala.foreach(println)
println("end write")
}
import org.apache.spark.sql.functions.udf

val CleanText = udf((s: Int) => {
log(s"I am in this UDF, got: $s")
s+2
})
// use the UDF in some transformation
// (toDF and $"a" need the session's implicits, e.g. import spark.implicits._):
Seq(1, 2).toDF("a").select(CleanText($"a")).show()
writeLog()
// prints:
// start write
// I am in this UDF, got: 1
// I am in this UDF, got: 2
// end write
BUT: this isn't really recommended, especially not for logging purposes. If you log on every record, the accumulator collects every line in driver memory, so it will eventually crash the driver with an OutOfMemoryError, or at the very least slow it down badly.
Since you're using Databricks, I would check what options they support for log aggregation, or simply use the Spark UI to view the executor logs.
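If executor-side logs are good enough, here's a rough sketch of what that could look like, using the slf4j API that ships with Spark (the names CleanTextLogged and "CleanTextUDF" are just placeholders); the messages end up in the executor logs you'd read in the Spark UI instead of being shipped back to the driver:

import org.apache.spark.sql.functions.udf
import org.slf4j.LoggerFactory

val CleanTextLogged = udf((s: Int) => {
  // look the logger up inside the UDF body so it is created on the executor
  // and never has to be serialized with the closure
  val logger = LoggerFactory.getLogger("CleanTextUDF")
  logger.info(s"I am in this UDF, got: $s")
  s + 2
})

Seq(1, 2).toDF("a").select(CleanTextLogged($"a")).show()

Nothing accumulates on the driver this way, so it scales to logging every record; the trade-off is that you have to look at each executor's log (or whatever log delivery Databricks offers) to read the output.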