I have the following code in Scala:
val checkedValues = inputDf.rdd.map(row => {
  val size = row.length
  val items = for (i <- 0 until size) yield {
    val fieldName = row.schema.fieldNames(i)
    val sourceField = sourceFields(fieldName) // sourceFields is a map; looking it up by field name returns another object
    val value = Option(row.get(i))
    sourceField.checkType(value)
  }
  items
})
Basically, the snippet above takes a Spark DataFrame, converts it into an RDD, and maps over each row to produce an RDD whose elements are collections of objects holding the datatype and other information for each value in the DataFrame.
How would I go about writing something equivalent in PySpark, given that (among other things) schema is not an attribute of Row there?
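For reference, here is a rough sketch of the direction I'm considering, where input_df, source_fields and check_type are just the Python counterparts of the Scala names above (they are my own objects, not part of PySpark), so I'm not sure it's the idiomatic way:

# Column names are a plain Python list, so they can be captured in the
# closure instead of being read from each Row.
field_names = input_df.columns

def check_row(row):
    # Row behaves like a tuple: row[i] is the value at position i, and
    # None stands in for what Option handled on the Scala side.
    return [source_fields[field_names[i]].check_type(row[i])
            for i in range(len(row))]

checked_values = input_df.rdd.map(check_row)

I also noticed that Row has an asDict() method, so iterating over row.asDict().items() might avoid the positional indexing altogether, but I don't know which approach is preferred.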