
I am in the early stages of designing an application using Spark RDDs (which I don't fully understand yet). The RDD will contain a large number of objects, each of which holds references to a small number (~100) of somewhat large (~0.5 MB) objects which are immutable.

The operations to be mapped over the RDD will call member functions on the objects, which in turn call member functions on the referenced objects.

Is this possible in principle?

michael meyer

2 Answers


Spark (not Sparc) data will normally be passed around using Java serialization (unless you configure it to use Kryo). I think this will do the right thing with the large objects. If you're willing to restructure your data a bit, it might be best to use broadcast variables for the large immutable objects.
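For illustration, using broadcast variables for the ~100 large immutable objects might look roughly like this. This is a sketch only, not runnable as-is: `sc`, `rdd`, `BigRef`, `compute`, and `loadRefs` are hypothetical names standing in for your own types, not anything from the question.

```scala
// Sketch, assuming an existing SparkContext `sc` and an RDD `rdd` of your objects.
// `BigRef`, `compute`, and `loadRefs` are hypothetical placeholders.
case class BigRef(data: Array[Byte])       // one ~0.5 MB immutable object

val refs: IndexedSeq[BigRef] = loadRefs()  // the ~100 large objects
val refsBc = sc.broadcast(refs)            // shipped to each executor once, not once per task

val results = rdd.map { obj =>
  obj.compute(refsBc.value)                // member call reads the local broadcast copy
}
```

The payoff is that a broadcast variable is serialized and sent to each executor once, rather than being captured in the closure and re-serialized for every task. If you do stick with closure capture, switching to Kryo (via the `spark.serializer` setting on your `SparkConf`) can at least make that serialization cheaper.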

lmm

I think this goes against the ethos of Spark as distributed functional programming.

I think you would be better served re-tooling your domain model in terms of the primitives map, filter, and reduce. Reasoning about the effects of calling member functions buried inside those operations seems difficult.

Also, if they are immutable what is the side-effect of calling methods on them?
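To make the re-tooling suggestion concrete, here is a plain-Scala sketch (no Spark required; `Doc` and the data are invented for illustration) of expressing the work as pure map/filter/reduce transformations rather than method calls with hidden effects. The same pipeline shape carries over to an RDD.

```scala
// Illustrative domain object; immutable, so transformations are pure.
case class Doc(words: Seq[String])

val docs = Seq(Doc(Seq("a", "b")), Doc(Seq("b", "c", "d")))

val totalLongDocs = docs
  .map(_.words.size)   // transform each object into a plain value
  .filter(_ >= 3)      // keep only the large ones
  .sum                 // reduce to a single result

// totalLongDocs == 3
```

Because each step is a pure function over immutable values, the runtime is free to partition and reorder the work, which is exactly what Spark relies on.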

bearrito