I have heard that it is possible to call methods from another Python module to perform calculations that are not implemented in Spark, although of course doing so is inefficient. I need to compute the eigenvector centrality of a graph (it is not available in the graphframes module). I am aware that there is a way to do this in Scala using sparkling-graph, but I need to keep everything in Python. I am new to Spark RDDs, and I am wondering what is wrong with the code below, or even whether this is a proper way of doing it:
import networkx as nx

def func1(dt):
    G = nx.Graph()
    src = dt.Provider
    dest = dt.AttendingPhysician
    gr = src.zip(dest)
    G = nx.from_edgelist(gr)
    deg = nx.eigenvector_centrality(G)
    return deg

rdd2 = inpatient.rdd.map(lambda x: func1(x))
rdd2.collect()
inpatient is a DataFrame read from a CSV file. From it I want to build a directed graph whose edges go from the nodes in the Provider column to the nodes in the AttendingPhysician column.
The error I encounter is:
AttributeError: 'str' object has no attribute 'zip'
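For context, here is a minimal driver-side sketch of what I think the computation should be equivalent to. The edge list below is a made-up placeholder, and collecting all the pairs to the driver is an assumption (it only works if the edge list fits in memory); in my real job the pairs would come from inpatient:

```python
import networkx as nx

# In the real job the pairs would come from the Spark DataFrame, e.g.
#   edges = (inpatient.select("Provider", "AttendingPhysician")
#            .rdd.map(lambda r: (r.Provider, r.AttendingPhysician))
#            .collect())
# Here a small hypothetical edge list stands in for the collected result.
edges = [("P1", "A1"), ("P2", "A1"), ("P3", "A2"), ("A1", "A2")]

# Build the graph once on the driver, instead of once per row as my
# rdd.map attempt effectively tries to do.
G = nx.Graph()  # nx.DiGraph() for the directed Provider -> AttendingPhysician version
G.add_edges_from(edges)

# Centrality score for every node in the graph
centrality = nx.eigenvector_centrality(G, max_iter=1000)
```

Is something along these lines the right direction, or is there a way to keep the computation distributed?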