I would like to calculate the difference between two values in the same column. For now I just want the difference between the last value and the first value, but using last(column) returns a null result. Is there a reason last() would not return a value? Is there a way to pass the positions of the values I want as variables, e.g. the 10th and the 1st, or the 7th and the 6th?
Current code
Using Spark 1.4.0 and Scala 2.11.6
myDF = some DataFrame with n rows and m columns
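import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{first, last}

// Difference between the last and the first value of a column
def difference(col: Column): Column = {
  last(col) - first(col)
}

// Apply difference to each of the four columns in a single aggregation
def diffCalcs(dataFrame: DataFrame): DataFrame = {
  import hiveContext.implicits._ // enables the $"..." column syntax
  dataFrame.agg(
    difference($"Column1"),
    difference($"Column2"),
    difference($"Column3"),
    difference($"Column4")
  )
}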
When I run diffCalcs(myDF), it returns a null result. If I modify difference to only have first(col), it does return the first value for the four columns. However, if I change it to last(col), it returns null. If I call myDF.show(), I can see that all of the columns have Double values on every row; there are no null values in any of the columns.
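
To make the second part of the question concrete, the positional lookup I have in mind might look roughly like the sketch below. It is only an illustration, not a confirmed fix for the last() behaviour: positionalDiff is a hypothetical helper name, the positions are 0-based, and it assumes the column holds non-null Double values and that the DataFrame's row order is meaningful (for example, it was explicitly sorted beforehand).

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper (not part of Spark): returns the value at row
// position i minus the value at row position j for one Double column.
// Positions are 0-based and follow the DataFrame's current row order.
def positionalDiff(df: DataFrame, colName: String, i: Long, j: Long): Double = {
  // Pair every value with its row index: (index, value)
  val indexed = df.select(colName).rdd
    .zipWithIndex()
    .map { case (row, idx) => (idx, row.getDouble(0)) }

  // Keep only the two requested positions and bring them to the driver
  val picked = indexed
    .filter { case (idx, _) => idx == i || idx == j }
    .collect()
    .toMap

  // Throws NoSuchElementException if either position is out of range
  picked(i) - picked(j)
}
```

With something like this, "last minus first" would be positionalDiff(myDF, "Column1", myDF.count() - 1, 0), and "the 7th and the 6th" would be positionalDiff(myDF, "Column1", 6, 5). It runs a separate job per column, so it would likely be slower than a single agg call.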