29

I have a simple line:

line = "Hello, world"

I would like to convert it to an RDD with only one element. I have tried:

sc.parallelize(line)

But I get:

sc.parallelize(line).collect()
['H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd']

Any ideas?

gsamaras
poiuytrez

3 Answers

33

Try passing a List as the parameter (Scala):

sc.parallelize(List(line)).collect()

It returns an array with a single element:

res1: Array[String] = Array(Hello, world)
michaeltang
1

The following code works in Python:

sc.parallelize([line]).collect()
['Hello, world']

Here "line" is wrapped in a list, so parallelize receives a one-element collection instead of iterating over the string's characters.
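To see why the bare string gets split, here is a minimal sketch in plain Python with no Spark involved. It assumes that parallelize enumerates its argument by iterating over it, and uses the built-in list() as a stand-in for that step; the names chars and single are only for illustration.

```python
line = "Hello, world"

# A Python string is itself an iterable of characters, so iterating
# over the bare string produces one element per character.
chars = list(line)

# Wrapping the string in a list makes the string the only element.
single = list([line])

print(chars)
print(single)
print(len(chars), len(single))
```

The same reasoning applies to any iterable passed in directly: the elements of the iterable, not the iterable itself, become the elements of the result.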

Ruli
Dhruv
0

Use the following code (Scala):

sc.parallelize(Seq(line))
Suraj Rao
vivek