Suppose I have created the following GraphFrame. How can I visualize it?

    # Create a Vertex DataFrame with unique ID column "id"
    v = sqlContext.createDataFrame([
      ("a", "Alice", 34),
      ("b", "Bob", 36),
      ("c", "Charlie", 30),
    ], ["id", "name", "age"])
    # Create an Edge DataFrame with "src" and "dst" columns
    e = sqlContext.createDataFrame([
      ("a", "b", "friend"),
      ("b", "c", "follow"),
      ("c", "b", "follow"),
    ], ["src", "dst", "relationship"])
    # Create a GraphFrame
    from graphframes import *
    g = GraphFrame(v, e)
Alex
  • Alex: are you asking about tools or techniques for visualizing GraphFrames? If so, take a look at notebook interfaces; see [On-Time Flight Performance with GraphFrames for Apache Spark](https://databricks.com/blog/2016/03/16/on-time-flight-performance-with-graphframes-for-apache-spark.html) – Ram Ghadiyaram Aug 16 '17 at 19:35
  • @RamGhadiyaram I want to visualize the graph, preferably using a Spark/Python library. If there is no such library, then using a tool. – Alex Aug 17 '17 at 22:12

2 Answers

Using Python/PySpark/Jupyter, I use the draw functionality from the networkx library. The trick is to create a networkx graph from the GraphFrame's edges:

import matplotlib.pyplot as plt
import networkx as nx
from graphframes import GraphFrame
from pyspark.sql import SparkSession, SQLContext

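def PlotGraph(edge_list):
    # Build an undirected networkx graph from the edge DataFrame.
    # take(1000) caps how many edges are pulled to the driver.
    Gplot = nx.Graph()
    for row in edge_list.select('src', 'dst').take(1000):
        Gplot.add_edge(row['src'], row['dst'])

    plt.subplot(121)
    nx.draw(Gplot, with_labels=True, font_weight='bold')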


spark = SparkSession \
    .builder \
    .appName("PlotApp") \
    .getOrCreate()

# SQLContext expects a SparkContext, not a SparkSession.
sqlContext = SQLContext(spark.sparkContext)

vertices = sqlContext.createDataFrame([
    ("a", "Alice", 34),
    ("b", "Bob", 36),
    ("c", "Charlie", 30),
    ("d", "David", 29),
    ("e", "Esther", 32),
    ("e1", "Esther2", 32),
    ("f", "Fanny", 36),
    ("g", "Gabby", 60),
    ("h", "Mark", 61),
    ("i", "Gunter", 62),
    ("j", "Marit", 63)], ["id", "name", "age"])

edges = sqlContext.createDataFrame([
    ("a", "b", "friend"),
    ("b", "a", "follow"),
    ("c", "a", "follow"),
    ("c", "f", "follow"),
    ("g", "h", "follow"),
    ("h", "i", "friend"),
    ("h", "j", "friend"),
    ("j", "h", "friend"),
    ("e", "e1", "friend")
], ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)
PlotGraph(g.edges)

plot of some graph
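
Note that nx.Graph is undirected, so a -> b and b -> a collapse into a single edge and the "relationship" column is not shown. If direction and labels matter, a variant along these lines might work. This is only a sketch: to_edge_tuples and plot_directed are hypothetical helper names, and plain dicts stand in for the Row objects that g.edges.take(1000) would return.

```python
def to_edge_tuples(rows):
    """Turn collected edge rows into (src, dst, relationship) tuples."""
    return [(r["src"], r["dst"], r["relationship"]) for r in rows]


def plot_directed(edge_tuples):
    # Imported lazily so to_edge_tuples works without plotting libraries.
    import matplotlib.pyplot as plt
    import networkx as nx

    graph = nx.DiGraph()
    for src, dst, rel in edge_tuples:
        graph.add_edge(src, dst, relationship=rel)

    # Compute the layout once so nodes and edge labels line up.
    pos = nx.spring_layout(graph, seed=42)
    nx.draw(graph, pos, with_labels=True, font_weight="bold", arrows=True)
    nx.draw_networkx_edge_labels(
        graph, pos, edge_labels=nx.get_edge_attributes(graph, "relationship")
    )
    plt.show()


# Assumption: dicts stand in for pyspark Row objects, which also
# support row["column"] lookups.
sample_rows = [
    {"src": "a", "dst": "b", "relationship": "friend"},
    {"src": "b", "dst": "a", "relationship": "follow"},
]
edge_tuples = to_edge_tuples(sample_rows)
```

With a real GraphFrame, the call would presumably be plot_directed(to_edge_tuples(g.edges.take(1000))). Labels of opposite edges between the same two nodes are drawn at the same midpoint, so they may overlap.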

Alex Ortner

I couldn't find any native GraphFrame library that visualizes data either.

Nevertheless, you could try to do it from Databricks with the display() function. You can see an example here.

Alternatively, you can convert the GraphFrame's vertices and edges to Python lists and draw them with matplotlib or PyGraphviz.
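
As a rough illustration of the list-based route, here is a minimal sketch that uses only matplotlib. The helper names circular_layout and draw_graph are made up for this example, the circular layout is just one simple choice, and the hard-coded lists stand in for data collected from the GraphFrame.

```python
import math


def circular_layout(vertex_ids):
    """Place vertices evenly on the unit circle; returns {id: (x, y)}."""
    n = len(vertex_ids)
    return {
        vid: (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
        for i, vid in enumerate(vertex_ids)
    }


def draw_graph(vertex_ids, edge_pairs):
    # Imported lazily so the layout helper can be used without matplotlib.
    import matplotlib.pyplot as plt

    pos = circular_layout(vertex_ids)
    fig, ax = plt.subplots()
    for src, dst in edge_pairs:
        # One arrow per directed edge, pointing from src to dst.
        ax.annotate("", xy=pos[dst], xytext=pos[src],
                    arrowprops=dict(arrowstyle="->", shrinkA=12, shrinkB=12))
    for vid, (x, y) in pos.items():
        ax.text(x, y, vid, ha="center", va="center",
                bbox=dict(boxstyle="circle", facecolor="lightblue"))
    ax.set_xlim(-1.3, 1.3)
    ax.set_ylim(-1.3, 1.3)
    ax.set_aspect("equal")
    ax.axis("off")
    plt.show()


# With a real GraphFrame g, the lists could be collected like this:
#   vertex_ids = [row["id"] for row in g.vertices.select("id").collect()]
#   edge_pairs = [(row["src"], row["dst"])
#                 for row in g.edges.select("src", "dst").collect()]
vertex_ids = ["a", "b", "c"]
edge_pairs = [("a", "b"), ("b", "c"), ("c", "b")]
layout = circular_layout(vertex_ids)
```

Calling draw_graph(vertex_ids, edge_pairs) then opens the plot. Since collect() pulls everything to the driver, this is only sensible for small graphs.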

drkostas