If I sort a DataFrame in descending order based on a column and then drop the duplicates using df.dropDuplicates, which element will be removed? The element that was smaller according to the sort?
Anonymous functions work fine.
The following code sets up the problem:
import org.apache.spark.sql.SparkSession
val sparkSession = SparkSession.builder.appName("demo").getOrCreate()
import sparkSession.implicits._ // must come after sparkSession is defined
val sc = sparkSession.sparkContext
case class DemoRow(keyId: Int, evenOddId:…
I have two lists:
val listA = List("Mary", "Harry", "Marry", "Harry", "Marry")
val listB = List("Mary", "Harry", "Marry", "Harry", "Marry")
Now I want to know whether the indices of all occurrences of Harry are the same in both lists. What is the…
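One possible way to check this, as a minimal sketch only: collect the positions of the target value in each list with zipWithIndex and compare the two index lists. The names HarryIndices, indicesOf and sameIndices are hypothetical helpers introduced here for illustration; they are not part of the question.

```scala
object HarryIndices {
  // Positions at which `target` occurs in `xs` (hypothetical helper).
  def indicesOf[A](xs: List[A], target: A): List[Int] =
    xs.zipWithIndex.collect { case (x, i) if x == target => i }

  // True when `target` occurs at exactly the same positions in both lists.
  def sameIndices[A](a: List[A], b: List[A], target: A): Boolean =
    indicesOf(a, target) == indicesOf(b, target)

  def main(args: Array[String]): Unit = {
    val listA = List("Mary", "Harry", "Marry", "Harry", "Marry")
    val listB = List("Mary", "Harry", "Marry", "Harry", "Marry")
    println(indicesOf(listA, "Harry"))           // List(1, 3)
    println(sameIndices(listA, listB, "Harry"))  // true
  }
}
```

Comparing the index lists rather than the lists themselves means the other elements are free to differ; only the positions of the target matter.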
I'm using an external library that has a method overloaded several times with different arguments, something like:
insertInto(index: Int, int: Int)
insertInto(index: Int, lng: Long)
insertInto(index: Int, dbl: Double)
insertInto(index:…
I've read other threads on SO about iterating through collections from config files in Scala but they always assume the type in question is either a ConfigList or an ObjectList. In my case it is a more complex structure and I could not figure out…
I am working with Highcharts in Scala.js. I want to create a JS array such as [[0, 1], [1, 2], [2, 8]] in Scala.js (basically a 2D array).
The parameters that can be passed are described in this documentation: HighChart Documentation
Need to override…
I have a constantly-updating mutable.HashMap[String, String] with a record of current user locations:
{user1 -> location1,
user2 -> location4,
user3 -> location4}
I want to keep track of the location co-occurrences between users - that is, how…
I need to get all the columns along with the count, using a Scala RDD.
Col1 col2 col3 col4
us A Q1 10
us A Q3 10
us A Q2 20
us B Q4 10
us B Q5 20
uk A Q1 10
uk A Q3 10
uk A …
I have a Spark DataFrame like the one below:
id|name|age|sub
1 |ravi|21 |[M,J,J,K]
I don't want to explode the column "sub", as that would create an extra set of rows. I want to generate the unique values from the "sub" column and assign them to a new column…
I have a Spark DataFrame with the columns below.
C1 | C2 | C3| C4
1 | 2 | 3 | S1
2 | 3 | 3 | S2
4 | 5 | 3 | S2
I want to generate another column C5 by taking distinct values from column C4
like …
I want to define a function like this:
def mixUp[A](lista: List[A], aprop: Int, listb: List[A], bprop: Int): List[A]
lista and listb are two Lists with the same element type.
aprop and bprop are the proportions of lista and listb that…
I want to compare data in two RDDs. How can I iterate over and compare field data in one RDD with field data in another RDD? Example files below:
File1
f1 f2 f3 f4 f5 f6 f7
1 Nancyxyz 23456 12:30 NEWYORK 9000 xyz
2 ranboxys 12345…
In Scala, if we have a MultiMap which maps String to Set[String], for example:
val a = Map(
"Account1" -> Set("Cars", "Trucks"),
"Account2" -> Set("Trucks", "Boats")
)
What's an elegant way of inverting / reversing it to end up with:
Map(
…
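A minimal sketch of one way to invert such a map, assuming the intended result maps each set element back to the set of keys that contained it (the expected output above is cut off, so this is an assumption). The names InvertMultiMap and invert are hypothetical and introduced here for illustration.

```scala
object InvertMultiMap {
  // Flatten to (value, key) pairs, then regroup the keys by value.
  def invert[K, V](m: Map[K, Set[V]]): Map[V, Set[K]] =
    m.toSeq
      .flatMap { case (k, vs) => vs.toSeq.map(v => (v, k)) }
      .groupBy(_._1)
      .map { case (v, pairs) => v -> pairs.map(_._2).toSet }

  def main(args: Array[String]): Unit = {
    val a = Map(
      "Account1" -> Set("Cars", "Trucks"),
      "Account2" -> Set("Trucks", "Boats")
    )
    // Each item now points at the accounts that contained it.
    invert(a).foreach(println)
  }
}
```

Going through a sequence of pairs keeps the code short and only uses standard collection methods; for very large maps a fold that builds the result directly might be preferable.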
I need to create a HashMap of directory-to-file in Scala while I list all files in the directory. How can I achieve this in Scala?
val directoryToFile = awsClient.listFiles(uploadPath).collect {
case path if !path.endsWith("/") => {
path…