0

I'm trying to create a new dataset by taking intervals from another dataset, for example, consider dataset1 as input and dataset2 as output:

dataset1 = [1, 2, 3, 4, 5, 6]
dataset2 = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]

I managed to do that using arrays, but for mlib a dataset is needed.

My code with array:

def generateSeries(values: Array[Double], n: Int): Seq[Array[Float]] = {
    var res: Array[Array[Float]] = new Array[Array[Float]](m)
    for(i <- 0 to m-n){
        res :+ values(i to i + n)
    }
    return res
}

FlatMap seems like the way to go, but how a function can search for the next value in the dataset?

Pedro Henrique
  • 601
  • 6
  • 17

1 Answers1

0

The problem here is that an array is in no way similar to a DataSet. A DataSet is unordered and has no indices, so thinking in terms of arrays won't help you. Go for a Seq and treat it without using indices and positions at all.

So, to represent an array-like behaviour on a DataSet you need to create your own indices. This is simply done by pairing the value with the position in the "abstract array" we are representing.

So the type of your DataSet will be something like [(Int,Int)], where the first is the index and the second is the value. They will arrive unordered, so you will need to rework your logic in a more functional way. It's not really clear what you're trying to achieve but I hope I gave you an hint. Otherwise explain better the expected result in the comment to my answer and I will edit.

Chobeat
  • 3,445
  • 6
  • 41
  • 59
  • thanks for answering and sorry for bad explanation. Wat I'm trying to achieve is a dataset made with every n sized interval of the original dataset. – Pedro Henrique Sep 28 '16 at 12:09