I am using the GraphFrames label propagation algorithm (LPA) to find communities, but it is not giving me the result I expect.
graph_data = spark.createDataFrame([
("a", "d", "friend"),
("b", "d", "friend"),
("c", "d", "friend")
], ["src", "dst", "relationship"])
My requirement is to get a single community id for all four vertices a, b, c, and d, but I am getting two different community ids: one for a, b, and c, and another for d. Code:
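from graphframes import GraphFrame

# Build the vertex list from every distinct id that appears as src or dst
df1 = graph_data.selectExpr('src AS id')
df2 = graph_data.selectExpr('dst AS id')
vertices = df1.union(df2).distinct()
edges = graph_data
g = GraphFrame(vertices, edges)
# Run label propagation for 5 iterations; the result has a "label" column per vertex
communities = g.labelPropagation(maxIter=5)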
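
To check whether this is a property of the algorithm rather than of my data, here is a minimal pure-Python sketch of synchronous label propagation on the same star graph. It is not the GraphFrames implementation. It assumes that every vertex starts with its own id as its label, that labels travel along edges in both directions, that all vertices update at the same time, and that ties go to the smallest label. The function name `label_propagation` and the tie-breaking rule are my own choices for illustration.

```python
from collections import Counter

# Same edges as graph_data, without the relationship column
edges = [("a", "d"), ("b", "d"), ("c", "d")]

def label_propagation(edges, max_iter):
    # Treat edges as undirected: each endpoint is a neighbor of the other
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, []).append(dst)
        neighbors.setdefault(dst, []).append(src)

    # Every vertex starts in its own community
    labels = {v: v for v in neighbors}

    for _ in range(max_iter):
        new_labels = {}
        for v, nbrs in neighbors.items():
            # Count the labels currently held by the neighbors
            counts = Counter(labels[n] for n in nbrs)
            # Adopt the most frequent label; break ties by smallest label (assumption)
            new_labels[v] = min(counts, key=lambda l: (-counts[l], l))
        # Synchronous update: all vertices switch at once
        labels = new_labels
    return labels

labels = label_propagation(edges, max_iter=5)
print(labels)
```

Under these assumptions the sketch also ends with two labels: a, b, and c share one and d holds the other. The leaves and the hub appear to swap labels on every iteration instead of converging, which matches the split I see from `labelPropagation`.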