
I am in the early stages of designing an application using Spark RDDs (which I don't fully understand yet). The RDD will contain a large number of objects, each of which holds references to a small number (~100) of somewhat large (~0.5 MB) objects which are immutable.

The operations to be mapped over the RDD will call member functions on the objects, which in turn call member functions on the referenced objects.

Is this possible in principle?

michael meyer

2 Answers


Spark (not Sparc) data will normally be passed around using Java serialization (unless you configure it to use Kryo). I think this will do the right thing with the large objects. If you're willing to restructure your data a bit, it might be best to use broadcast variables for the large immutable objects.
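For illustration, using broadcast variables for the ~100 large immutable objects might look roughly like this. This is a sketch only, not runnable as-is: `sc`, `rdd`, `BigRef`, `compute`, and `loadRefs` are hypothetical names standing in for your own types, not anything from the question.

```scala
// Sketch, assuming an existing SparkContext `sc` and an RDD `rdd` of your objects.
// `BigRef`, `compute`, and `loadRefs` are hypothetical placeholders.
case class BigRef(data: Array[Byte])       // one ~0.5 MB immutable object

val refs: IndexedSeq[BigRef] = loadRefs()  // the ~100 large objects
val refsBc = sc.broadcast(refs)            // shipped to each executor once, not once per task

val results = rdd.map { obj =>
  obj.compute(refsBc.value)                // member call reads the local broadcast copy
}
```

The payoff is that a broadcast variable is serialized and sent to each executor once, rather than being captured in the closure and re-serialized for every task. If you do stick with closure capture, switching to Kryo (via the `spark.serializer` setting on your `SparkConf`) can at least make that serialization cheaper.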

lmm

I think this goes against the ethos of Spark as distributed functional programming.

I think you would be better served re-tooling your domain model in terms of the primitives map, filter, and reduce. Reasoning about the effects of calling member functions buried inside those operations seems difficult.

Also, if they are immutable what is the side-effect of calling methods on them?
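To make the re-tooling suggestion concrete, here is a plain-Scala sketch (no Spark required; `Doc` and the data are invented for illustration) of expressing the work as pure map/filter/reduce transformations rather than method calls with hidden effects. The same pipeline shape carries over to an RDD.

```scala
// Illustrative domain object; immutable, so transformations are pure.
case class Doc(words: Seq[String])

val docs = Seq(Doc(Seq("a", "b")), Doc(Seq("b", "c", "d")))

val totalLongDocs = docs
  .map(_.words.size)   // transform each object into a plain value
  .filter(_ >= 3)      // keep only the large ones
  .sum                 // reduce to a single result

// totalLongDocs == 3
```

Because each step is a pure function over immutable values, the runtime is free to partition and reorder the work, which is exactly what Spark relies on.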

bearrito