
Given a string

val s = "My-Spark-App"

How can vertices be created in the following way with Spark?

"My-", "y-S", "-Sp", "Spa", "par", "ark", "rk-", "k-A", "-Ap", "App"

Can that problem be parallelized?

Al Jenssen


It is just a matter of sliding a window of size n over the string:

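import org.apache.spark.graphx.VertexId

val n: Int = 3

// sliding(n) yields every substring of length n; zipWithIndex numbers them 0, 1, 2, ...
val vertices: Seq[(VertexId, String)] = s.sliding(n)
  .zipWithIndex
  .map { case (window, i) => (i.toLong, window) }
  .toVector

sc.parallelize(vertices)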
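To check what the windowing step produces without starting Spark, here is a minimal plain-Scala sketch. It assumes only the standard library: `type VertexId = Long` mirrors the GraphX alias, and the helper name `ngramVertices` is hypothetical. One detail worth guarding against is that Scala's `sliding` returns a single shorter window when the string is shorter than `n`, so the sketch returns no vertices in that case.

```scala
object NGramVertices {
  // GraphX defines VertexId as an alias for Long; mirrored here (assumption)
  // so the sketch runs without a Spark dependency.
  type VertexId = Long

  /** Hypothetical helper: every length-n window of `s`, paired with its start offset. */
  def ngramVertices(s: String, n: Int): Vector[(VertexId, String)] = {
    require(n > 0, "window size must be positive")
    // sliding would emit one short window when s.length < n, so skip that case
    if (s.length < n) Vector.empty
    else
      s.sliding(n)
        .zipWithIndex
        .map { case (window, i) => (i.toLong, window) }
        .toVector
  }

  def main(args: Array[String]): Unit =
    ngramVertices("My-Spark-App", 3).foreach(println)
}
```

For "My-Spark-App" (12 characters) and n = 3 this yields 12 - 3 + 1 = 10 vertices, numbered 0 to 9.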

Can that problem be parallelized?

Yes, it can, but for a single short string it most likely doesn't pay off: the cost of distributing the data outweighs the work. Still, if you want to:

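import org.apache.spark.graphx.VertexId
import org.apache.spark.rdd.RDD
// sliding on an RDD is not part of core Spark; it comes from MLlib's implicit RDDFunctions
import org.apache.spark.mllib.rdd.RDDFunctions._

val vertices: RDD[(VertexId, String)] = sc.parallelize(s.toSeq) // RDD[Char]
  .sliding(n)   // RDD[Array[Char]]
  .zipWithIndex // indices follow partition order
  .map { case (cs, i) => (i, cs.mkString) }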
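The reason the work can be split at all is that each window depends only on its start offset. The sketch below (plain Scala; the helper names `windowAt` and `chunkedWindows` are hypothetical) splits the offsets into chunks that stand in for partitions and processes each chunk independently; the result matches the sequential version for any number of chunks. In Spark, the same idea could be written roughly as `sc.parallelize(0 to s.length - n).map(i => (i.toLong, s.substring(i, i + n)))`, which avoids the MLlib import but ships the whole string to every task, so it assumes the string is small enough to capture in a closure.

```scala
object OffsetWindows {
  // Mirrors the GraphX alias (assumption, to stay dependency-free)
  type VertexId = Long

  // A window is fully determined by its start offset i
  def windowAt(s: String, n: Int, i: Int): (VertexId, String) =
    (i.toLong, s.substring(i, i + n))

  /** Splits the valid offsets into `parts` chunks and processes each chunk on its own. */
  def chunkedWindows(s: String, n: Int, parts: Int): Vector[(VertexId, String)] = {
    require(n > 0 && parts > 0)
    val offsets = 0 to (s.length - n) // empty range when s.length < n
    val chunkSize = math.max(1, math.ceil(offsets.length.toDouble / parts).toInt)
    offsets.grouped(chunkSize) // each group stands in for one partition
      .flatMap(chunk => chunk.map(i => windowAt(s, n, i))) // no shared state between chunks
      .toVector
  }

  def main(args: Array[String]): Unit =
    chunkedWindows("My-Spark-App", 3, 3).foreach(println)
}
```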
zero323