I have a variable in Scala called a, shown below:

scala> a
res17: Array[org.apache.spark.sql.Row] = Array([0_42], [big], [baller], [bitch], [shoe] ..)

It is an array of lists, each of which contains a single word.

I would like to convert it to a single array containing one sequence of strings, as shown below:

Array[Seq[String]] = Array(WrappedArray(0_42,big,baller,shoe,?,since,eluid.........

The reason I am trying to create an array containing a single wrapped array is that I want to run a word2vec model in Spark using MLlib.

The fit() function there only accepts data whose elements are an Iterable[String], as the following error shows:

scala> val model = word2vec.fit(b)
<console>:41: error: inferred type arguments [String] do not conform to method fit's type parameter bounds [S <: Iterable[String]]
vish

2 Answers


The sample data you're listing is not an array of lists, but an array of Rows. The array containing a single WrappedArray that you're trying to create also doesn't seem to serve any meaningful purpose.

If you want to create an array of all the word strings in your Array[Row] data structure, you can simply use a map like in the following:

val df = Seq(
  ("0_42"), ("big"), ("baller"), ("bitch"), ("shoe"), ("?"), ("since"), ("eliud"), ("win")
).toDF("word")

val a = df.rdd.collect
// a: Array[org.apache.spark.sql.Row] = Array(
//   [0_42], [big], [baller], [bitch], [shoe], [?], [since], [eliud], [win]
// )

import org.apache.spark.sql.Row

val b = a.map{ case Row(w: String) => w }
// b: Array[String] = Array(0_42, big, baller, bitch, shoe, ?, since, eliud, win)

[UPDATE]

If you do want to create an array of a single WrappedArray, here's one approach:

val b = Array( a.map{ case Row(w: String) => w }.toSeq )
// b: Array[Seq[String]] = Array(WrappedArray(
//   0_42, big, baller, bitch, shoe, ?, since, eliud, win
// ))
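
To see why the shape of the collection matters, here is a minimal sketch in plain Scala. `fitLike` is a hypothetical stand-in that only copies the type bound `S <: Iterable[String]` from the error message in the question; it takes a local `Seq` so that it runs without Spark, and it is not the MLlib method itself.

```scala
// Hypothetical stand-in: same type bound as in the error message,
// but over a local Seq instead of an RDD, so no Spark is needed.
def fitLike[S <: Iterable[String]](dataset: Seq[S]): Int =
  dataset.map(_.size).sum // total number of words across all "sentences"

val words = Seq("0_42", "big", "baller", "shoe")

// fitLike(words) // does not compile: String does not conform to Iterable[String]

// Two shapes that do satisfy the bound:
val onePerSentence: Seq[Seq[String]] = words.map(w => Seq(w)) // four one-word sentences
val oneSentence: Seq[Seq[String]] = Seq(words)                // one four-word sentence

val n1 = fitLike(onePerSentence)
val n2 = fitLike(oneSentence)
```

Both shapes type-check; which one is appropriate depends on whether the words should be treated as belonging to the same sentence.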
Leo C
  • Well, the reason I am trying to create an array of a single wrapped array is that I want to run a word2vec model in Spark using MLlib. The fit() function there only takes an iterable of strings. I could create val b, but I cannot run the model using it. I get the following error: scala> val model = word2vec.fit(b) <console>:41: error: inferred type arguments [String] do not conform to method fit's type parameter bounds [S <: Iterable[String]] – vish Dec 29 '17 at 19:24
  • @vish, `a` as in your original sample data is not an `RDD` but an `Array[Row]`. Using my sample data, you'll need to `collect` (i.e. df.rdd.collect) in order to get the exact `Array[Row]` type, to which `toSeq` is applicable. – Leo C Dec 29 '17 at 20:35

I finally got it working by wrapping each word in its own Seq, so that every element passed to fit() is an Iterable[String]:

val b = a.map { case Row(word: String) => word }
val model = word2vec.fit(b.map(l => Seq(l)))
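
For reference, here is a sketch of the same idea end to end, under a few assumptions. MLlib's `Word2Vec.fit` takes an `RDD` whose elements are `Iterable[String]`, so the rows are kept as an RDD here rather than collected. The local `SparkSession`, the sample words (borrowed from the other answer), and the `setMinCount(1)` and `setVectorSize(10)` settings are choices made only for this illustration.

```scala
import org.apache.spark.mllib.feature.Word2Vec
import org.apache.spark.sql.{Row, SparkSession}

// Assumed local session; in spark-shell, `spark` and its implicits already exist.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("word2vec-sketch")
  .getOrCreate()
import spark.implicits._

val df = Seq("0_42", "big", "baller", "shoe", "?", "since", "eliud", "win").toDF("word")

// Stay on the RDD (no collect) and wrap each word in a one-element Seq,
// giving an RDD[Seq[String]] that satisfies S <: Iterable[String].
val sentences = df.rdd.map { case Row(w: String) => Seq(w) }

// minCount defaults to 5; every word here occurs once, so lower it to 1
// to avoid ending up with an empty vocabulary.
val word2vec = new Word2Vec().setMinCount(1).setVectorSize(10)
val model = word2vec.fit(sentences)

// Number of words that received a vector.
val vocabSize = model.getVectors.size
```

With one-word sentences the model has no neighbouring words to learn from, so the resulting vectors are unlikely to be meaningful; real input would normally be one Seq of tokens per sentence or document.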
vish