I have the following code in Scala:
val checkedValues = inputDf.rdd.map(row => {
  val size = row.length
  val items = for (i <- 0 until size) yield {
    val fieldName = row.schema.fieldNames(i)
    val sourceField = sourceFields(fieldName) // sourceFields is a map; looking it up by field name returns another object
    val value = Option(row.get(i))
    sourceField.checkType(value)
  }
  items
})
Basically, the snippet above takes a Spark DataFrame, converts it into an RDD, and maps over each row to produce an RDD whose elements are collections of objects holding the datatype and other information for each value in the DataFrame.
How would I go about writing something equivalent in PySpark, given that (among other things) schema is not an attribute of Row there?
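For reference, here is a rough sketch of the direction I'm considering, where input_df, source_fields and check_type are just the Python counterparts of the Scala names above (they are my own objects, not part of PySpark), so I'm not sure it's the idiomatic way:

# Column names are a plain Python list, so they can be captured in the
# closure instead of being read from each Row.
field_names = input_df.columns

def check_row(row):
    # Row behaves like a tuple: row[i] is the value at position i, and
    # None stands in for what Option handled on the Scala side.
    return [source_fields[field_names[i]].check_type(row[i])
            for i in range(len(row))]

checked_values = input_df.rdd.map(check_row)

I also noticed that Row has an asDict() method, so iterating over row.asDict().items() might avoid the positional indexing altogether, but I don't know which approach is preferred.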