
I want to prettify the results of my Gremlin queries by converting them into data frames.

Gremlin returns results that (at least to my untrained eyes) look like JSON. As an example, I'll borrow the answer to one of my previous questions, which uses the air-routes graph:

g.V().
  group().
    by('code').
    by(
      outE('route').
      order().by('dist').
      inV().
      group().
        by('code').
        by(
          outE('route').
          order().by('dist').
          inV().
          values('code').fold())).
  unfold()

with the results looking something like this:

1.  {'SHH': {'WAA': ['KTS', 'SHH', 'OME'], 'OME': ['TLA', 'WMO', 'KTS', 'GLV', 'ELI', 'TNC', 'WAA', 'WBB', 'SHH', 'SKK', 'KKA', 'UNK', 'SVA', 'OTZ', 'GAM', 'ANC']}}
2.  {'KWN': {'BET': ['WNA', 'KWT', 'ATT', 'KUK', 'TLT', 'EEK', 'WTL', 'KKH', 'KWN', 'KLG', 'MLL', 'KWK', 'PQS', 'CYF', 'KPN', 'NME', 'OOK', 'GNU', 'VAK', 'SCM', 'HPB', 'EMK', 'ANC'], 'EEK': ['KWN', 'BET'], 'TOG': ['KWN']}}
...

How can I convert this into a data frame that looks like this?

Home Stop Dest
==============
SHH  WAA  KTS 
SHH  WAA  SHH 
SHH  WAA  OME
SHH  OME  TLA
SHH  OME  WMO
SHH  OME  KTS
SHH  OME  GLV
SHH  OME  ELI
SHH  OME  TNC
SHH  OME  WAA
SHH  OME  WBB
SHH  OME  SHH
SHH  OME  SKK
SHH  OME  KKA
SHH  OME  UNK
SHH  OME  SVA
SHH  OME  OTZ
SHH  OME  GAM
SHH  OME  ANC
KWN  BET  WNA
KWN  BET  KWT
KWN  BET  ATT
...

I've been able to use a combination of list operations and pandas to achieve this, but is there a more straightforward way?
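For comparison, one compact form of the list-plus-pandas approach is a nested list comprehension that walks home, then stop, then destination, and yields one tuple per destination. This is a minimal sketch that assumes every result is a one-key dict shaped like the two shown above. The `results` list below is a hand copy of those two entries and does not call Neptune.

```python
import pandas as pd

# Hand copy of the two results shown above. Each result is a one-key dict:
# {home: {stop: [dest, dest, ...]}}.
results = [
    {'SHH': {'WAA': ['KTS', 'SHH', 'OME'],
             'OME': ['TLA', 'WMO', 'KTS', 'GLV', 'ELI', 'TNC', 'WAA', 'WBB',
                     'SHH', 'SKK', 'KKA', 'UNK', 'SVA', 'OTZ', 'GAM', 'ANC']}},
    {'KWN': {'BET': ['WNA', 'KWT', 'ATT', 'KUK', 'TLT', 'EEK', 'WTL', 'KKH',
                     'KWN', 'KLG', 'MLL', 'KWK', 'PQS', 'CYF', 'KPN', 'NME',
                     'OOK', 'GNU', 'VAK', 'SCM', 'HPB', 'EMK', 'ANC'],
             'EEK': ['KWN', 'BET'],
             'TOG': ['KWN']}},
]

# One row per (home, stop, dest) triple. The destination lists keep the
# order in which Gremlin returned them.
rows = [
    (home, stop, dest)
    for result in results
    for home, stops in result.items()
    for stop, dests in stops.items()
    for dest in dests
]

df = pd.DataFrame(rows, columns=['Home', 'Stop', 'Dest'])
print(df.head())
```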

Note: It would be fine to rewrite the query if that makes things easier, as long as the output is similar.

I'm running Gremlin in an Amazon Neptune environment with Neptune Python Utils.
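Because rewriting the query is acceptable, another option is to have Gremlin emit one flat map per Home/Stop/Dest triple using `as()` and `select()`. Pandas can ingest a list of such maps directly. The sketch below is hypothetical. The query string, `COLUMNS` and `rows_to_frame` are illustrative names, and the query has not been run against Neptune here. The commented-out connection assumes a Neptune endpoint without IAM authentication, and `<your-neptune-endpoint>` is a placeholder.

```python
import pandas as pd

# Hypothetical rewrite: instead of nested group() maps, emit one flat map per
# Home -> Stop -> Dest triple. local(...) scopes order().by('dist') to each
# vertex's own outgoing routes, as the group().by(...) version did.
HOME_STOP_DEST_QUERY = (
    "g.V().as('Home')."
    "local(outE('route').order().by('dist')).inV().as('Stop')."
    "local(outE('route').order().by('dist')).inV().as('Dest')."
    "select('Home','Stop','Dest').by('code')"
)

COLUMNS = ['Home', 'Stop', 'Dest']


def rows_to_frame(rows):
    """Build the Home/Stop/Dest frame from a list of flat maps."""
    # columns= fixes the column order regardless of the key order in each map.
    return pd.DataFrame(rows, columns=COLUMNS)


# Against a live endpoint (not executed here), something like:
# from gremlin_python.driver.client import Client
# client = Client('wss://<your-neptune-endpoint>:8182/gremlin', 'g')
# df = rows_to_frame(client.submit(HOME_STOP_DEST_QUERY).all().result())
```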

gaspanic
  • The easiest way to get data from Gremlin into a Pandas Data Frame is to have the query return one or more `valueMap` maps, in a list. Pandas can ingest these directly. Can you update the question to show an example query you are using? Also have you seen the Amazon Neptune Jupyter notebook integration via the open source graph-notebook project? – Kelvin Lawrence Feb 03 '22 at 23:59
  • I've added an example, hope this helps. Wasn't aware of the open source graph-notebook project. Thanks, will look into that! – gaspanic Feb 04 '22 at 18:39
  • I will add an example below. – Kelvin Lawrence Feb 04 '22 at 22:41


You can easily import Gremlin maps, or lists of maps, into a Pandas DataFrame. For example, consider the following line of Gremlin Python:

from gremlin_python.process.graph_traversal import __
vm = g.V().has('airport', 'region', 'GB-ENG').valueMap().by(__.unfold()).toList()

Using the air-routes data set, the query finds all airports in England.

After it runs, the vm variable contains a list of maps such as:

[{'code': 'LTN', 'type': 'airport', 'desc': 'London Luton Airport', 'country': 'UK', 'longest': 7086, 'city': 'London', 'lon': -0.368333011865616, 'elev': 526, 'icao': 'EGGW', 'region': 'GB-ENG', 'runways': 1, 'lat': 51.874698638916}, {'code': 'SOU', 'type': 'airport', 'desc': 'Southampton Airport', 'country': 'UK', 'longest': 5653, 'city': 'Southampton', 'lon': -1.35679996013641, 'elev': 44, 'icao': 'EGHI', 'region': 'GB-ENG', 'runways': 1, 'lat': 50.9502983093262},

You can then create a DataFrame using code like this:

import pandas as pd

# Each map in vm becomes one row; the map keys become the column names.
df = pd.DataFrame(vm)
df.sort_values('code', ascending=False).head(10)

and the result will be of this form:

[Screenshot of the resulting data frame, not reproduced here]
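To try the conversion without a live Neptune connection, you can pass the two maps shown above to pandas directly. In this sketch `vm` is a hand-copied stand-in for the real query result, not output produced by the code itself.

```python
import pandas as pd

# The two maps shown above, standing in for the live `vm` result.
vm = [
    {'code': 'LTN', 'type': 'airport', 'desc': 'London Luton Airport',
     'country': 'UK', 'longest': 7086, 'city': 'London',
     'lon': -0.368333011865616, 'elev': 526, 'icao': 'EGGW',
     'region': 'GB-ENG', 'runways': 1, 'lat': 51.874698638916},
    {'code': 'SOU', 'type': 'airport', 'desc': 'Southampton Airport',
     'country': 'UK', 'longest': 5653, 'city': 'Southampton',
     'lon': -1.35679996013641, 'elev': 44, 'icao': 'EGHI',
     'region': 'GB-ENG', 'runways': 1, 'lat': 50.9502983093262},
]

# Each map becomes a row; the property keys become the column names.
df = pd.DataFrame(vm)
top = df.sort_values('code', ascending=False).head(10)
print(top[['code', 'city', 'runways']])
```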

Kelvin Lawrence
  • Thanks. I'll give you an upvote, but this doesn't exactly answer my question. I understand that valueMap() outputs can be readily read using pandas, but my question refers to situations where the valueMap doesn't include all my data. In my example, even if I could output the data within the square brackets as a valueMap, the keys listed in curly brackets (the dictionary) would not be included. I'm currently solving this by looping over all dictionary keys, but would like to know if there is a simpler way. – gaspanic Feb 06 '22 at 19:27
  • So are you looking to produce the results you care about in a different form from the Gremlin query or alternative ways to do this in Python? – Kelvin Lawrence Feb 07 '22 at 14:22
  • The Gremlin output is fine. What I want is to be able to further visualize and analyze the data retrieved from the graph. I'm most used to dataframes, hence my focus on Pandas here. But the question might also be taken more generally: what is the recommended way of extracting data from Gremlin query outputs? I haven't found any good documentation that shows how to work with Gremlin query results beyond the immediate output. Sorry if my explanations are vague; I hope this clarifies things. Thanks! – gaspanic Feb 07 '22 at 19:09
  • As I showed in my example, I think the best way is to generate maps of K/V pairs that are easy to consume/insert into other technologies, like Pandas. But again, depending on what you want to do next, the format you use may vary. If you are looking to draw visuals, a `path` result might be better, and easier to import into something like NetworkX. – Kelvin Lawrence Mar 03 '22 at 17:55
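Following up on the `path` suggestion, here is a minimal sketch. It assumes a query along the lines of `g.V().out('route').out('route').path().by('code')`, where every path holds exactly three airport codes. Note that this drops the `order().by('dist')` of the original query. `paths_to_frame` is a hypothetical helper. It reads gremlin_python `Path` objects through their `objects` attribute and also accepts plain lists, so you can exercise it offline.

```python
import pandas as pd


def paths_to_frame(paths, columns=('Home', 'Stop', 'Dest')):
    """Turn path results into one DataFrame row per path.

    Accepts gremlin_python Path objects (their values live in .objects) or
    plain lists/tuples. Each path must have exactly len(columns) entries,
    otherwise pandas raises a ValueError.
    """
    rows = [list(getattr(p, 'objects', p)) for p in paths]
    return pd.DataFrame(rows, columns=list(columns))


# Offline usage with two rows taken from the table in the question.
df = paths_to_frame([['SHH', 'WAA', 'KTS'], ['SHH', 'OME', 'TLA']])
print(df)
```

If a graph visualisation is the goal, you could then pass column pairs such as `Home`/`Stop` and `Stop`/`Dest` to NetworkX as edge lists, for example with `networkx.from_pandas_edgelist`.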