I'm wondering whether it is possible to leverage scala-native for performing large in-memory jobs.
For instance, imagine a Spark job that needs 150GB of RAM. You'd have to run it as 5 x 30GB executors in a Spark cluster, since JVM garbage collectors can't keep up with heaps much bigger than that.
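To make the sizing concrete, it's something like this when building the session (just a sketch; the app name is hypothetical, and on YARN you'd typically pass the same settings to spark-submit instead):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: 5 executors x 30GB each ~= 150GB of aggregate heap.
// Capping each executor around 30GB is the usual rule of thumb to
// stay in GC-friendly territory.
val spark = SparkSession.builder()
  .appName("large-string-job")               // hypothetical name
  .config("spark.executor.instances", "5")
  .config("spark.executor.memory", "30g")
  .getOrCreate()
```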
Imagine that 99% of the data being processed is Strings held in collections.
Do you think that scala-native would help here? I mean, as an alternative to Spark?
How does scala-native represent String? Does it carry the same per-object overhead that the JVM has, where String is a full-blown class?
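For reference, on the JVM side you can see exactly where that overhead comes from with the JOL (Java Object Layout) tool. A minimal sketch, assuming jol-core is on the classpath (class name is mine):

```scala
// build.sbt (assumption): libraryDependencies += "org.openjdk.jol" % "jol-core" % "0.17"
import org.openjdk.jol.info.{ClassLayout, GraphLayout}

object StringFootprint {
  def main(args: Array[String]): Unit = {
    // Field/header layout of java.lang.String itself:
    // object header + hash field + reference to the backing array.
    println(ClassLayout.parseClass(classOf[String]).toPrintable)

    // Total retained footprint of one small string, including the
    // backing array object: often several times the raw character data.
    println(GraphLayout.parseInstance("hello").totalSize())
  }
}
```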
Does scala-native's GC have heap limits analogous to the classic ~30GB on the JVM (the compressed-oops threshold)? Would I also end up with a limit like 30GB?
Or is using scala-native for in-memory data processing generally a bad idea? My guess is that scala-offheap is a better way to go.
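For what it's worth, my reading of the scala-offheap README suggests usage roughly like the sketch below (hedged: the annotation-macro API may have changed since I looked, and note that @data classes hold primitives, so a 99%-String workload would still need manual encoding into off-heap bytes):

```scala
import scala.offheap._

// Off-heap records: no object headers, no GC pressure, fixed layout.
@data class Measurement(id: Long, value: Double)

object OffheapSketch {
  def main(args: Array[String]): Unit = {
    // malloc-backed allocator: memory lives outside the JVM heap and
    // is never scanned or moved by the garbage collector.
    implicit val alloc = malloc
    val m = Measurement(42L, 3.14)
    println(m.value)
  }
}
```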