
I am new to Scala. Can we add/append data to a List or any other collection dynamically in Scala?

I mean, can we add data to a List or any collection using foreach (or any other loop)?

I am trying to do something like the following:

import scala.collection.mutable.ListBuffer

var propertyData = sc.textFile("hdfs://ip:8050/property.conf")

var propertyList = new ListBuffer[(String,String)]()

propertyData.foreach { line =>
  var c = line.split("=")
  propertyList.append((c(0), c(1)))
}

And suppose the property.conf file contains:

"spark.shuffle.memoryFraction"="0.5"

"spark.yarn.executor.memoryOverhead"="712"

This compiles fine, but the values are not added to the ListBuffer.

Darshan
  • You could always import any Java class you wish, and use what you understand from that as well – OneCricketeer Sep 19 '16 at 13:14
  • I would avoid trying to add elements to a List within the body of a foreach. I don't know what would happen in Scala but if you tried to do this in Java, you would get an exception because you'd be trying to mutate the collection backing your iterator. I would ask this: what is it that you are trying to do? What's the big picture? Give a real-life example. – Phasmid Sep 19 '16 at 13:41
  • @Phasmid I have added an example to the question, please check – Darshan Sep 19 '16 at 13:56
  • @Darshan it works for me. – Phasmid Sep 19 '16 at 14:06
  • @Darshan see my answer below. It should work the same in Spark, although I don't have time to try it. You don't need to wrap your config strings in double quotes but that shouldn't matter. Note that you can easily get the config using the Spark config utilities (although I think you only did it that way to show an example). – Phasmid Sep 19 '16 at 14:33
  • The problem is Spark, not the collection... You loop over a distributed dataset, and the closure captures the collection, so this does not work – Raphael Roth Sep 19 '16 at 14:52
  • I don't believe this is a duplicate of that other question. If you look carefully, you will see that there is nothing wrong with the code, outside of the context of Spark. It really is a Spark question and possibly a duplicate. But it's not a duplicate of "Add element to a list in Scala" – Phasmid Sep 19 '16 at 15:54

3 Answers

Yes, that's possible using mutable collections (see this link), for example:

  import scala.collection.mutable

  val buffer = mutable.ListBuffer.empty[String]

  // add elements
  buffer += "a string"
  buffer += "another string"

or in a loop:

  val buffer = mutable.ListBuffer.empty[Int]
  for(i <- 1 to 10) {
    buffer += i
  }
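
Once populated, the buffer can be converted back to an immutable List; a minimal sketch:

  // take an immutable snapshot of the buffer's contents
  val result: List[Int] = buffer.toList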
Raphael Roth

I tried it using Darshan's code from his (updated) question:

import scala.collection.mutable.ListBuffer

val propertyData = List(""""spark.shuffle.memoryFraction"="0.5"""", """"spark.yarn.executor.memoryOverhead"="712" """)
val propertyList = new ListBuffer[(String,String)]()
propertyData.foreach { line =>
  val c = line.split("=")
  propertyList.append((c(0), c(1)))
}
println(propertyList)

It works as expected: it prints to the console:

ListBuffer(("spark.shuffle.memoryFraction","0.5"), ("spark.yarn.executor.memoryOverhead","712" ))

I didn't do it in a Spark Context, although I will try that in a few minutes. So, I provided the data in a list of Strings (shouldn't make a difference). I also changed the "var" keywords to "val" since none of them needs to be a mutable variable, but of course that makes no difference either. The code works whether they are val or var.

See my comment below. But here is idiomatic Spark/Scala code which does behave exactly as you would expect:

import org.apache.spark.{SparkConf, SparkContext}

object ListTest extends App {
  val conf = new SparkConf().setAppName("listtest")
  val sc = new SparkContext(conf)
  val propertyData = sc.textFile("listproperty.conf")
  val propertyList = propertyData map { line =>
    val xs: Array[String] = line.split("""\=""")
    (xs(0), xs(1))
  }
  propertyList foreach ( println(_) )
}
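
This works because map builds a new RDD of pairs rather than mutating a driver-side buffer from inside a closure: Spark serializes the foreach closure to the executors, so each executor appends to its own copy of the ListBuffer and the driver's buffer never changes. If you do need the pairs as a local collection on the driver, a minimal sketch using the standard RDD collect:

  // bring the RDD's elements back to the driver as a local immutable List
  val localPairs: List[(String, String)] = propertyList.collect().toList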
Phasmid
  • @Darshan I implemented your code in Spark and I agree with you that it doesn't work. I'm not sure exactly why, but you are not using idiomatic Scala and so I am adding to my answer to show how this should be done. It is possible (likely) that your problem arises from not using Arrays (the result of the split method) quite properly in a Spark context. – Phasmid Sep 19 '16 at 14:56
  • @Darshan thanks for the green check mark :) – Phasmid Sep 19 '16 at 15:55
  • Indeed, the Array thing, but you get an RDD, and actually that is fine – thebluephantom Jun 15 '18 at 11:52

You can either use a mutable collection (not functional), or return a new collection (functional and more idiomatic), as below:

scala> val a = List(1,2,3)
a: List[Int] = List(1, 2, 3)

scala> val b = a :+ 4
b: List[Int] = List(1, 2, 3, 4)
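
Applied to the original problem, the functional style means transforming the lines into a new list instead of appending inside foreach; a minimal sketch with hypothetical sample lines:

scala> val lines = List("a=1", "b=2")
lines: List[String] = List(a=1, b=2)

scala> val pairs = lines.map { line => val c = line.split("="); (c(0), c(1)) }
pairs: List[(String, String)] = List((a,1), (b,2))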
Kestemont Max