That's hard to predict without testing, because processing a 10GB dataset will likely require quite a bit more than 10GB of usable cluster memory, mainly due to overhead (shuffle buffers, serialization, the JVM itself). It also depends on how you're processing the data, but if it's just a join it's less complex to estimate.
In any case, the cluster you described doesn't have enough RAM for the dataset you mentioned, so that's already a warning sign that you'll need to allow Spark to spill over to disk to avoid OOM errors (and take the performance hit that comes with disk I/O).
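As a minimal sketch of what that looks like in PySpark - the memory values, paths and storage level here are placeholders, not recommendations for your specific cluster:

```python
# Illustrative settings/storage level that let Spark spill to disk instead of
# failing with OOM. Executor memory and the input path are assumptions.
from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = (
    SparkSession.builder
    .appName("join-sizing-test")
    # Size this to what your nodes actually have.
    .config("spark.executor.memory", "8g")
    # Fraction of the heap shared by execution + storage; the rest is left
    # for user data structures and JVM overhead.
    .config("spark.memory.fraction", "0.6")
    .getOrCreate()
)

df = spark.read.parquet("/data/big_table")  # hypothetical path
# MEMORY_AND_DISK lets cached partitions that don't fit in RAM spill to local disk.
df.persist(StorageLevel.MEMORY_AND_DISK)
```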
An incremental way to approach this problem would be to generate some sample datasets - e.g. 3 datasets containing 10%, 20% and 50% of the whole dataset - and process them individually on a large cluster to measure the resources each iteration uses (see the sketch below). By "large cluster", in this case, I mean something with usable RAM of roughly 150% of the full dataset size.
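Something along these lines (PySpark, with hypothetical paths and join key) is what I mean - run the same join against increasing samples and watch peak memory and shuffle sizes in the Spark UI each time:

```python
# Run the same join on 10%/20%/50% samples and compare resource usage per run.
# Input paths and the join key "id" are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sample-sizing").getOrCreate()

left = spark.read.parquet("/data/left")    # hypothetical inputs
right = spark.read.parquet("/data/right")

for fraction in (0.1, 0.2, 0.5):
    left_s = left.sample(fraction=fraction, seed=42)
    right_s = right.sample(fraction=fraction, seed=42)
    joined = left_s.join(right_s, on="id")
    # Force full execution so the Spark UI reflects the real cost of this sample size.
    print(fraction, joined.count())
```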
From there, it's easier to extrapolate the resources needed for 100% of the data. Still, the relationship between dataset size and cluster resources isn't linear - hence the need to estimate and test - so you should provision some extra resources to account for edge cases and for the fact that this is simply an estimate.
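As a toy example of that extrapolation (the numbers are made up; plug in whatever the Spark UI actually reports for your samples):

```python
# Naive extrapolation from measured samples, plus headroom because memory use
# rarely scales linearly with input size. All figures below are hypothetical.
measured = {0.1: 22, 0.2: 41, 0.5: 110}   # sample fraction -> peak GB observed

# Scale the largest sample up to 100% and add ~50% headroom for the estimate.
estimate_full = (measured[0.5] / 0.5) * 1.5
print(f"Provision roughly {estimate_full:.0f} GB of usable cluster RAM")
```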
If iterating like this doesn't fit your method, you could simply provision a very large cluster (e.g. RAM > 2x the dataset size) and see how that specific workload runs.
You should probably also test and measure different approaches to joining those datasets, like using RDDs, DataFrames + Spark SQL, etc.
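A rough sketch of comparing the two styles, assuming the same hypothetical inputs as above, so you can time each and inspect its plan and shuffles in the Spark UI:

```python
# Compare a DataFrame/Spark SQL join with a plain RDD join on the same data.
# Not a benchmark harness; paths and the "id" key are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-approaches").getOrCreate()

left = spark.read.parquet("/data/left")
right = spark.read.parquet("/data/right")

# DataFrame / Spark SQL join: Catalyst picks the physical strategy
# (e.g. sort-merge or broadcast) on its own.
df_joined = left.join(right, on="id")
df_joined.explain()   # inspect the physical plan it chose

# Plain RDD join: key both sides by "id" yourself; no Catalyst optimization,
# typically more shuffle-heavy.
rdd_joined = (
    left.rdd.map(lambda r: (r["id"], r))
    .join(right.rdd.map(lambda r: (r["id"], r)))
)
print(rdd_joined.take(1))
```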
Edit: as far as I know, there is no way to reduce this to a simple, repeatable and exact formula, because there are simply too many variables that depend purely on your workload and how you're coding it: what you're doing with the data after the join (write formats), repartitioning, the Spark APIs you use, shuffles, reducer choices, serialization choices, and so on. Like I wrote above, you need to run your code with increasingly larger datasets and analyze how it behaves.
OOM errors can be addressed both by adding more hardware and by optimizing code; it depends on the situation itself.
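On the "optimize the code" side, one common example is a broadcast join when one side is small enough to fit in each executor's memory (again, the paths, table roles and join key are assumptions about your data):

```python
# If the lookup table is small, broadcasting it avoids shuffling the big table at all.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join").getOrCreate()

big = spark.read.parquet("/data/big_table")      # hypothetical paths
small = spark.read.parquet("/data/small_lookup")

# Hint Spark to ship the small table to every executor instead of shuffling both sides.
joined = big.join(broadcast(small), on="id")
```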
As stated on Spark's website:
How much memory you will need will depend on your application. To determine how much your application uses for a certain dataset size, load part of your dataset in a Spark RDD and use the Storage tab of Spark’s monitoring UI (http://<driver-node>:4040) to see its size in memory. Note that memory usage is greatly affected by storage level and serialization format – see the tuning guide for tips on how to reduce it.
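In practice that measurement looks something like this (hypothetical input path): cache a slice of the data, trigger an action, then read "Size in Memory" from the Storage tab.

```python
# Cache a sample, materialize it, then check its in-memory size in the Spark UI
# (Storage tab on the driver, port 4040).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("storage-tab-check").getOrCreate()

sample = spark.read.parquet("/data/left").sample(fraction=0.1, seed=42)  # hypothetical input
sample.cache()
sample.count()   # force the cache to fill so the Storage tab shows a real size
```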