I am reading a CSV file using the code below:
rdd = (sc.textFile("./test123.txt")    # sc is already the SparkContext; .sparkContext only exists on a SparkSession
       .map(lambda x: x.split("|")))   # textFile yields one record per line, so the flatMap on "\n\r" was redundant and is dropped
When I run this code, Spark creates just one partition (on my local machine), and I want to understand why. The following displays 1:
rdd.getNumPartitions()
I want to parallelize this operation so it can run on a cluster. My understanding is that for the work to be distributed, the RDD needs more than one partition, so that tasks can be sent to other nodes. Can anyone shed some light on this? For reference, the sketch below shows the two ways I understand the partition count could be raised.
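As a rough sketch (the target of 4 partitions is just an illustrative value, not something from my actual job), I believe either of the following would produce more partitions:

# Sketch: two ways to end up with more partitions (4 is an arbitrary example value)

# 1) ask textFile for a minimum number of input splits up front
rdd = (sc.textFile("./test123.txt", minPartitions=4)
       .map(lambda x: x.split("|")))
print(rdd.getNumPartitions())   # I would expect at least 4 here

# 2) reshuffle an already-created RDD into more partitions
rdd4 = rdd.repartition(4)
print(rdd4.getNumPartitions())  # exactly 4 after the repartition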