The conclusions of my analysis and the optimizations I used:
Spark parameters used in spark-submit: 90% of the available YARN memory in the cluster, 3 vcores per executor, and 3 executors per physical server. I ran it with KryoSerializer to reduce the size of the stored data. An illustrative invocation follows.
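As a rough sketch only: the concrete figures below (10 servers, 30 executors, 18g per executor, the jar name) are made-up placeholders and would have to be re-derived so that about 90% of your cluster's YARN memory is used:

    # Assumed cluster: 10 physical servers, 3 executors each.
    spark-submit \
      --master yarn \
      --num-executors 30 \
      --executor-cores 3 \
      --executor-memory 18g \
      --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
      my-graphx-job.jar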
Graph - RDDs of nodes and edges: beforehand, I created the RDDs of nodes and edges and stored them in HDFS in 1000 files using coalesce, so that the data is saved uniformly, although this takes a long time.
Graph - Loading: from the existing RDD files in HDFS, as in the sketch below.
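A minimal sketch of this save-then-load pattern (the paths are placeholders, and VertexAttr / EdgeAttr are the hypothetical attribute classes sketched under the next item):

    import org.apache.spark.graphx.{Edge, Graph, VertexId}

    // Save once: coalesce into 1000 HDFS files (slow, but done only once).
    vertexRDD.coalesce(1000).saveAsObjectFile("hdfs:///graph/vertices")
    edgeRDD.coalesce(1000).saveAsObjectFile("hdfs:///graph/edges")

    // Load: rebuild the graph directly from the stored RDD files.
    val vertices = sc.objectFile[(VertexId, VertexAttr)]("hdfs:///graph/vertices")
    val edges    = sc.objectFile[Edge[EdgeAttr]]("hdfs:///graph/edges")
    val graph    = Graph(vertices, edges)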
Graph - Nodes and Edges: loaded correctly. Their Scala attributes are only the ones I actually use (5 attributes each), stored with minimal memory (Integers) and with no auxiliary attributes. For example:
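A minimal sketch (the field names are hypothetical; the point is compact Int fields and nothing auxiliary):

    import scala.collection.mutable.ListBuffer

    // Hypothetical attribute classes: only the fields that are actually used,
    // stored as Ints; the vertex also carries the "known good info" buffer
    // described in the items below.
    case class VertexAttr(a: Int, b: Int, c: Int, d: Int, known: ListBuffer[Int])
    case class EdgeAttr(p: Int, q: Int, r: Int, s: Int, t: Int)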
Graph - Messages merged in the mergeMsg method: I combine the 2 messages using my own formula (related to the goal of my project); see the combined Pregel sketch after the sendMsg item below.
Graph - vprog: the nodes gather all the information received in the messages and save it in a List of "known good info" inside each node. That info is then used to create the messages sent in sendMsg.
Graph - Messages sent in the sendMsg method: each node uses its info (the List of known good info plus its other attributes) to create the messages to be sent. Also, to reduce the number of messages, I filtered out the ones that were not useful, so they are never put in the Iterator.
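A minimal sketch of how the three Pregel methods fit together, reusing the hypothetical VertexAttr / EdgeAttr from above; the message type (List[Int]), the merge formula, and the filtering criterion are placeholders, not my real project logic:

    import org.apache.spark.graphx._

    // Assumes graph: Graph[VertexAttr, EdgeAttr], loaded as sketched earlier.
    val result = graph.pregel(List.empty[Int])(
      // vprog: gather everything received into the node's "known good info".
      vprog = (id: VertexId, attr: VertexAttr, msg: List[Int]) => {
        attr.known ++= msg               // in-place append, no new List instance
        attr
      },
      // sendMsg: each node builds its message from its own info; useless
      // messages are filtered out instead of being put in the Iterator.
      sendMsg = triplet => {
        val msg = triplet.srcAttr.known.toList
        if (msg.nonEmpty) Iterator((triplet.dstId, msg)) else Iterator.empty
      },
      // mergeMsg: combine the two incoming messages (placeholder:
      // concatenation; the project-specific formula goes here).
      mergeMsg = (m1: List[Int], m2: List[Int]) => m1 ++ m2
    )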
I discovered MY MAIN PROBLEM: the List inside each node that saves the "known good info" is immutable, so every update allocates a whole new List.
SOLUTION: I should use ListBuffer (mutable) instead. Also, I should use .append() instead of .++(), because the latter creates a new instance of the List. For example:
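A minimal illustration of the difference:

    import scala.collection.mutable.ListBuffer

    // Immutable List: ++ builds a brand-new List on every update.
    var known = List[Int]()
    known = known ++ List(1, 2)     // allocates a new instance each time

    // Mutable ListBuffer: append modifies the buffer in place.
    val buffer = ListBuffer[Int]()
    buffer.append(1)                // in place, amortized constant time
    buffer += 2                     // equivalent shorthand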
More info on the performance characteristics of Scala collections: http://docs.scala-lang.org/overviews/collections/performance-characteristics
Performance is now more than 10 times faster, and the memory errors no longer appear.