
I am trying to implement chunking logic for files ranging in size from 0MB to 15MB. I have the file as a byte array and want to split that array into chunks that are each less than 5MB.

For example, if I have a file that is 10.6MB (1.06e+7 bytes), I want to divide it into separate byte arrays that together add up to 1.06e+7 bytes. This should handle any file size under 15MB.
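For concreteness, here is the arithmetic for that 10.6MB example (treating 1.06e+7 as exactly 10,600,000 bytes, which is an assumption):

val fileSize   = 10600000            // 10.6MB example (assumed exact byte count)
val chunkSize  = 5242880             // 5MB chunk limit
val fullChunks = fileSize / chunkSize   // 2 full chunks of 5,242,880 bytes
val remainder  = fileSize % chunkSize   // 114,240 bytes left for the final chunk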

var chunkSize: Int = 5242880
for (index <- 0 to byteArraySize by chunkSize) {
  if (index == 0) {
    tempArray = byteArray.slice(index, chunkSize)
  } else {
    tempArray = byteArray.slice(index + 1, index + chunkSize)
  }
  // upload tempArray to DB
  segmentIndex = segmentIndex + 1
}

The issue I'm having with this is that the last chunk does not come out the right size: it should contain whatever is left over in the byte array after the full 5242880-byte chunks have been taken.

Farhan Islam

1 Answer


Since the `grouped` method returns a lazy iterator, and thus probably doesn't waste any memory, I don't see any reason not to use it:

for (chunk <- byteArray.grouped(chunkSize)) {
  // do sth. with `chunk`
}
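Applied to the upload loop from the question, that might look roughly like the sketch below (`uploadSegment` is a hypothetical stand-in for whatever writes a segment to the DB):

val chunkSize = 5242880
for ((chunk, segmentIndex) <- byteArray.grouped(chunkSize).zipWithIndex) {
  // `grouped` yields full 5,242,880-byte chunks and a shorter final chunk automatically
  uploadSegment(segmentIndex, chunk)
}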

Here is how you could do it without using built-in methods:

def forEachChunk[A](arr: Array[A], chunkSize: Int)(f: Array[A] => Unit): Unit = {
  // `until` (rather than `to`) avoids an empty trailing chunk when the array
  // length is an exact multiple of chunkSize
  for (i <- 0 until arr.size by chunkSize) {
    // cap the end index at the array length so the last chunk is just the leftover
    f(arr.slice(i, (i + chunkSize) min arr.size))
  }
}

example:

forEachChunk((0 to 10).toArray, 3){ chunk => 
  println(chunk.toList)
}

prints:

List(0, 1, 2)
List(3, 4, 5)
List(6, 7, 8)
List(9, 10)
Andrey Tyukin