I am using the pyspark.ml.fpm (FP Growth) implementation of association rule mining on Spark v2.3.
The spark UI shows that the tasks as the end run very slowly. This seems to be a common problem and might be related to data skew.
Is this the real reason? Is there any solution for this?
I don't want to change the minSupport or minConfidence thresholds because that would effect by results. Removing the columns isn't a solution either.