I'm trying to linearly interpolate an Array[Option[Long]]. For example, given:

val example1 = Array(Some(20L), None, Some(60L))
val example2 = Array(Some(20L), None, None, Some(80L))
val example3 = Array(Some(20L), None, None, Some(80L), Some(90L), Some(100L))
val example4 = Array(Some(20L), None, None, Some(80L), None, Some(82L))

I'm expecting:

val example1Interpolated = Array(20L, 40L, 60L)
val example2Interpolated = Array(20L, 40L, 60L, 80L)
val example3Interpolated = Array(20L, 40L, 60L, 80L, 90L, 100L)
val example4Interpolated = Array(20L, 40L, 60L, 80L, 81L, 82L)

The step between known elements is not constant across the collection (see example4, where the gaps are filled with steps of 20 and then 1). However, the values are monotonically increasing.

For those familiar with Python, I'm looking for the Scala equivalent of the following:

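import numpy as np

def interpolate(input_):
    # Mask of missing entries (input_ must be a float array to hold NaN)
    nans = np.isnan(input_)
    get_index = lambda z: z.nonzero()[0]
    # Fill the missing positions in place from the known (index, value) pairs
    input_[nans] = np.interp(get_index(nans), get_index(~nans), input_[~nans])
    return input_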

Which for:

interpolate(np.array([20, np.nan, 60]))
interpolate(np.array([20, np.nan, np.nan, 80]))
interpolate(np.array([20, np.nan, np.nan, 80, np.nan, 82]))

yields:

array([ 20.,  40.,  60.])
array([ 20.,  40.,  60.,  80.])
array([ 20.,  40.,  60.,  80.,  81.,  82.])
Amir Ziai
  • What have you tried so far? Where are you blocked? What's your concrete problem? – sjrd Apr 26 '17 at 20:23
  • @sjrd ideally looking for a functional way to do this given any collection with optional values. Wondering if there's an implementation in some package that someone knows about or something else that I'm missing. – Amir Ziai Apr 26 '17 at 20:27
  • @AmirZiai You should add the exact requirements you need in your question. – Yuval Itzchakov Apr 26 '17 at 20:36
  • Are all intermediate steps the same size? Could you have `Array(Some(2L),None,Some(4L),None,Some(8L))` resulting in `Array(2L,3L,4L,6L,8L)`? – jwvh Apr 26 '17 at 20:37
  • Apache Spark probably has various linear regression analysis features. – ashawley Apr 26 '17 at 20:41
  • @jwvh not the same size. Yes your example is a possibility. – Amir Ziai Apr 26 '17 at 20:44
  • @ashawley are you suggesting to find each sub-collection with missing values, run a linear regression, and then to populate the values with that regression model? – Amir Ziai Apr 26 '17 at 21:08
  • Yes, use some numeric library. I see you are used to NumPy, so I'm not far off. – ashawley Apr 26 '17 at 22:41

2 Answers

This function works even if there are leading or trailing None values, as long as at least one element in the list is Some(_). It's also generic across Integral types. (You could make it generic across Fractional types if you wanted.)

def interpolate[T](list: Iterable[Option[T]])(implicit num: Integral[T]) = {
  import num._
  val prevs = list.zipWithIndex.scanLeft(Option.empty[(T, Int)]) {
    case (prev, (cur, i)) => cur.map((_, i)).orElse(prev)
  }
  val nexts = list.zipWithIndex.scanRight(Option.empty[(T, Int)]) {
    case ((cur, i), next) => cur.map((_, i)).orElse(next)
  }
  prevs.tail.zip(nexts).zipWithIndex.map {
    case ((Some((prev, i)), Some((next, j))), k) =>
      if (i == j) prev else prev + (next - prev) * fromInt(k - i) / fromInt(j - i)
    case ((Some((prev, _)), _), _) => prev
    case ((_, Some((next, _))), _) => next
  }
}

This builds up prevs, which keeps track of the most recent Some(_) and its index to the left, and nexts, which is the same to the right. Then, iterating over both prevs and nexts in parallel, it produces interpolated values based on the left, right, and indices. If the left or right is missing, just fill it in from the other side.
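
To make the description above concrete, here is a minimal trace of prevs and nexts, specialised to Long for readability. The object name PrevNextTrace and the sample input are hypothetical and not part of the answer; the input has a leading and a trailing None so that both fallback cases are exercised.

```scala
object PrevNextTrace {
  // Hypothetical sample: leading None, a two-element gap, and a trailing None.
  val input: List[Option[Long]] =
    List(None, Some(20L), None, None, Some(80L), None)

  private val indexed = input.zipWithIndex

  // scanLeft carries the most recent defined (value, index) to the right.
  private val scannedLeft = indexed.scanLeft(Option.empty[(Long, Int)]) {
    case (prev, (cur, i)) => cur.map(v => (v, i)).orElse(prev)
  }
  // Drop the seed so position k holds the nearest defined value at or left of k.
  val prevs: List[Option[(Long, Int)]] = scannedLeft.tail

  // scanRight carries the nearest defined (value, index) to the left.
  private val scannedRight = indexed.scanRight(Option.empty[(Long, Int)]) {
    case ((cur, i), next) => cur.map(v => (v, i)).orElse(next)
  }
  // The seed sits at the end here, so drop the last element instead.
  val nexts: List[Option[(Long, Int)]] = scannedRight.init

  // Combine both sides: interpolate when both exist, otherwise copy the one that does.
  val result: List[Long] = prevs.zip(nexts).zipWithIndex.map {
    case ((Some((p, i)), Some((n, j))), k) =>
      if (i == j) p else p + (n - p) * (k - i) / (j - i)
    case ((Some((p, _)), None), _) => p
    case ((None, Some((n, _))), _) => n
    case ((None, None), _) => sys.error("input contains no defined value")
  }
}
```

For this input, position 2 sees (20, 1) on its left and (80, 4) on its right, so it is filled with 20 + 60 * 1 / 3 = 40. The leading and trailing None positions copy their only defined neighbour.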

ephemient

I am not familiar with NumPy, but I think this handles all of your demonstrated cases. It assumes that the first and last elements in the list are defined. If that is not the case, done.last or remaining.head.get will throw, and you would have to re-work the function.

def interpolate(list: List[Option[Long]]) = {

  // Creates a new list that will be used to replace a sequence of Nones
  def fillNones(noneCount: Int, min: Long, max: Long): List[Long] = {
    // Multiply before dividing so integer truncation does not accumulate across steps
    (1 to noneCount).toList.map(i => min + i * (max - min) / (noneCount + 1))
  }

  // We will recursively traverse the list
  def recursive(done: List[Long], todo: List[Option[Long]]): List[Long] = {
    todo match {

      // If todo is empty, we are done
      case Nil => done

      // If the head of todo is Some(Long), then add it to the list of things that are done and move on
      case Some(l) :: tail => recursive(done :+ l, tail)

      // If the head wasn't Some(Long), then we have to figure out how many Nones are in a row, and replace them
      case todo =>

        // Find out how many Nones are in a row
        val noneCount = todo.takeWhile(_.isEmpty).length

        // Split the todo so we can get what is remaining
        val remaining = todo.splitAt(noneCount)._2

        // Create a new list to replace the sequence of Nones
        val filled = fillNones(noneCount, done.last, remaining.head.get)

        // Add our new filled list to done, and continue on
        recursive(done ++ filled, remaining)
    }
  }

  recursive(List.empty, list)
}

Testing:

val example1 = List(Some(20L), None, Some(60L))
println(interpolate(example1))
// Prints: List(20, 40, 60)

val example2 = List(Some(20L), None, None, Some(80L))
println(interpolate(example2))
// Prints: List(20, 40, 60, 80)

val example3 = List(Some(20L), None, None, Some(80L), Some(90L), Some(100L))
println(interpolate(example3))
// Prints: List(20, 40, 60, 80, 90, 100)

val example4 = List(Some(20L), None, None, Some(80L), None, Some(82L))
println(interpolate(example4))
// Prints: List(20, 40, 60, 80, 81, 82)
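
One possible way to lift the assumption that the first and last elements are defined, without touching the recursion, is to pad the ends before interpolating. The helper below is only a sketch: padEnds and the enclosing object PadEnds are hypothetical names, and copying the nearest defined value into the ends is one choice among several.

```scala
object PadEnds {
  // Hypothetical helper (not part of the answer above): replaces leading Nones
  // with the first defined value and trailing Nones with the last defined value.
  // A list with no defined values at all is returned unchanged.
  def padEnds(list: List[Option[Long]]): List[Option[Long]] = {
    val defined = list.flatten
    val first = defined.headOption
    val last = defined.lastOption

    // Split off the run of Nones at the front.
    val (leading, rest) = list.span(_.isEmpty)
    // Count the run of Nones at the back of what remains.
    val trailingCount = rest.reverse.takeWhile(_.isEmpty).length
    val middle = rest.dropRight(trailingCount)

    leading.map(_ => first) ++ middle ++ List.fill(trailingCount)(last)
  }
}
```

With this in place, a call such as interpolate(PadEnds.padEnds(input)) would only ever see interior gaps, which is the case the recursion already handles.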
Tyler