
I am looking at the documentation example for mapValues:

from pyspark import SparkContext

sc = SparkContext.getOrCreate()
x = sc.parallelize([("a", ["apple", "banana", "lemon"]), ("b", ["grapes"])])
def f(x): return len(x)
x.mapValues(f).collect()
# [('a', 3), ('b', 1)]

My question is: where does this mapValues function actually execute? Does it run in a Python worker process, inside the off-heap memory bounded by spark.executor.memoryOverhead (or by spark.executor.pyspark.memory, if that property is set)? Or is PySpark able to translate the function into equivalent Java that would run on-heap inside the executor JVM?
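
For reference, here is a small diagnostic sketch along the lines of what I am asking about. It reuses the same sc and x as above; tagging each value with the worker's PID and interpreter path is my own illustration, not part of the documentation example. If the function were translated to JVM code, there would be no separate Python interpreter to report.

import os
import sys

def where_am_i(v):
    # Report the value's length together with the PID and Python
    # executable of the process that evaluated this function. In
    # practice this comes from a forked PySpark worker process,
    # separate from the executor JVM.
    return (len(v), os.getpid(), sys.executable)

x.mapValues(where_am_i).collect()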

