
I've started playing around with the gremlin-python wrapper to interact with my Gremlin Server.

I did the following steps:

./bin/gremlin.sh

Once the Gremlin console opens up, I loaded configurations using:

graph = JanusGraphFactory.open('conf/gremlin-server/janusgraph-cassandra-es.properties')
g = graph.traversal()
saturn = g.V().has('name', 'saturn')

The code above works fine in the Gremlin shell, and I can see the vertices listed, but when I try to do the same in Python I get an empty graph. The following is my Python code:

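from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

graph = Graph()
g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))
print(g)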

It returns: graphtraversalsource[graph[empty]]

Why am I getting an empty graph? My guess is that it is unable to connect to the same graph source. Is there something I'm missing?

Note that in:

JanusGraphFactory.open('conf/gremlin-server/janusgraph-cassandra-es.properties')

the config file provided is the one used to start Gremlin Server.

Any help is really appreciated.

Thanks

Debasish Kanhar

2 Answers


You are seeing graph[empty] because that is the actual string representation of the Python graph object -- see the code here. The remote graph may still contain data, so it would be better if the label were something like graph[remote] or graph[] instead. I've opened up an issue to address this.

Out of the box, JanusGraph isn't configured for Python. You can find docs on how to do this in the Apache TinkerPop docs. First install gremlin-python. Here's the command, assuming you're using JanusGraph 0.1.1, which uses TinkerPop 3.2.3:

bin/gremlin-server.sh -i org.apache.tinkerpop gremlin-python 3.2.3

Next modify the conf/gremlin-server/gremlin-server.yaml to add the gremlin-python script engine:

scriptEngines: {
  gremlin-groovy: {
    imports: [java.lang.Math],
    staticImports: [java.lang.Math.PI],
    scripts: [scripts/empty-sample.groovy]},
  gremlin-jython: {},
  gremlin-python: {}
}

To use Gremlin Python, you need to go through a Gremlin Server, so start the JanusGraph pre-packaged distribution:

bin/janusgraph.sh start

From the Gremlin Console:

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-cassandra-es.properties')
==>standardjanusgraph[cassandrathrift:[127.0.0.1]]
gremlin> GraphOfTheGodsFactory.load(graph)
==>null
gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[cassandrathrift:[127.0.0.1]], standard]
gremlin> g.V().count()
14:51:58 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>12

Install the Gremlin-Python driver, again matching on the TinkerPop version:

pip install gremlinpython==3.2.3
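The comments on this thread report that gremlinpython 3.3.0 against a TinkerPop 3.2.3 server failed with `KeyError: None`, and that pinning 3.2.3 fixed it. Below is a minimal sketch of a pre-flight check for that situation. The helper name `same_tinkerpop_version` is hypothetical and not part of gremlin-python, and requiring an exact major.minor.patch match is a deliberately strict assumption, not a documented compatibility rule.

```python
def same_tinkerpop_version(client_version, server_version):
    """Return True when both strings share the same major.minor.patch.

    Hypothetical helper for illustration; not part of gremlin-python.
    """
    def parse(version):
        # Keep only the first three numeric components, e.g. "3.2.3" -> (3, 2, 3)
        return tuple(int(part) for part in version.strip().split(".")[:3])

    return parse(client_version) == parse(server_version)


# Versions mentioned in this thread: the server side is TinkerPop 3.2.3.
print(same_tinkerpop_version("3.2.3", "3.2.3"))
print(same_tinkerpop_version("3.3.0", "3.2.3"))
```

In practice the client version could probably be read with `importlib.metadata.version("gremlinpython")` on Python 3.8+, while the server version comes from the JanusGraph release notes.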

From the Python 3 shell:

>>> from gremlin_python import statics
>>> from gremlin_python.structure.graph import Graph
>>> from gremlin_python.process.graph_traversal import __
>>> from gremlin_python.process.strategies import *
>>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
>>> graph = Graph()
>>> g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))
>>> print(graph)
graph[empty]
>>> print(g)
graphtraversalsource[graph[empty]]
>>> g.V().count().next()
12
>>> g.addV('god').property('name', 'mars').property('age', 3500).next()
v[4280]
>>> g.V().count().next()
13

Keep in mind when you are working in the Python shell, the graph traversals are not automatically iterated, so you need to make sure to iterate the traversal with iterate() or next() or toList().
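To illustrate why an un-iterated traversal prints as a list of steps rather than a result, here is a toy model in plain Python. `ToyTraversal` is a hypothetical stand-in written for this explanation; it is not the gremlin-python class and talks to no server. It only mimics the idea that steps are recorded first and evaluated when `next()` is called.

```python
class ToyTraversal:
    """Hypothetical stand-in for a lazy traversal; not the real gremlin-python class."""

    def __init__(self, vertices, steps=None):
        self._vertices = vertices   # pretend this data lives on the remote server
        self.steps = steps or []    # recorded steps, loosely like bytecode

    def V(self):
        # Record the step and return a new traversal; nothing is evaluated yet.
        return ToyTraversal(self._vertices, self.steps + [["V"]])

    def count(self):
        return ToyTraversal(self._vertices, self.steps + [["count"]])

    def __repr__(self):
        # Printing shows the recorded steps only, never a result.
        return str(self.steps)

    def next(self):
        # Only here are the recorded steps actually evaluated.
        result = None
        for step in self.steps:
            if step == ["V"]:
                result = list(self._vertices)
            elif step == ["count"]:
                result = len(result)
        return result


g = ToyTraversal(vertices=["saturn", "jupiter", "hercules"])
pending = g.V().count()
print(pending)         # the recorded steps, not a count
print(pending.next())  # the evaluated count
```

The same distinction applies in the real shell session above: `g.V().count()` alone only builds the traversal, while `g.V().count().next()` sends it to the server and returns the value.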

Jason Plurad
  • Thanks for the detailed steps, but I'm still facing an issue. When I run `bin/janusgraph.sh start`, it is able to connect to Cassandra and ES, but it times out on gremlin-server. I went through the logs, but there was no stack trace pointing to the exact error, just the timeout. I increased the wait time from the default 60 to 120, but the issue remains. Is that expected? Thanks – Debasish Kanhar Sep 07 '17 at 10:14
  • Following up on my comment about the connection timeout, I just made a discovery. If I add `gremlin-jython: {}, gremlin-python: {}` to `scriptEngines` in `conf/gremlin-server/gremlin-server.yaml`, I get the timeout error; without it, I don't. But without it, I'm still unable to fetch any results. Even `g.V().count().next()` throws the error `KeyError: None` – Debasish Kanhar Sep 07 '17 at 11:09
  • I've edited my post above to add a couple more steps to install the `gremlin-python` plugin. If you are getting a timeout on the Gremlin Server, try killing its process, then start it again with `bin/gremlin-server.sh` and share the output in your original question. – Jason Plurad Sep 07 '17 at 13:00
  • I was unable to get it working with `bin/janusgraph.sh start`, but it works with `bin/gremlin-server.sh`. The problem of nothing being fetched is also solved now that I'm using Gremlin-Python version 3.2.3. I was using 3.3.0 before, so it was probably a version mismatch. But now I have another question: how do you commit changes? I was able to add a vertex with `g.addV('god').property('name', 'mars').property('age', 3500)`, and my result shows the vertex. But how do I commit? I tried `g.addV(label, 'god', 'name', 'mars', 'age', 3000).tx().commit()` and that failed. Do I need to create my own `Traversal()`? – Debasish Kanhar Sep 07 '17 at 15:35
  • Adding to my previous point, how do we load GraphSON into gremlin-python? I went to http://tinkerpop.apache.org/docs/current/reference/#gremlin-python -> Custom Serialization but couldn't understand it. Sorry for bugging you so much; I can add this to the existing question if required, but any help is greatly appreciated. – Debasish Kanhar Sep 07 '17 at 15:58
  • You're asking so many questions here in the Stack Overflow comments section that are no longer related to your original question. You should start a top-level post on the gremlin-users mailing list. – Jason Plurad Sep 07 '17 at 18:00
  • Quick answers, 1. A traversal like `g.addV('god').property('name', 'mars').property('age', 3500).next()` would get auto-committed. You need to make sure to iterate. 2. There is no provided API that can load GraphSON with gremlin-python at the moment. You'd have to parse the document yourself and construct the graph using traversals. – Jason Plurad Sep 07 '17 at 18:03
  • Hi, thanks for the response. I wasn't calling `next()`, and that is why my records weren't being committed. As for loading GraphSON, if no API is provided, is there any way to load data containing 1 million rows efficiently? Manually iterating over each node and adding it to the graph doesn't seem like an optimal way forward, if I'm right. – Debasish Kanhar Sep 08 '17 at 04:59
  • As for the gremlin mailing list, for some reason my questions aren't getting approved by the moderators! Thanks for your patience, hopefully this will be the last question! :-) – Debasish Kanhar Sep 08 '17 at 05:06
  • can you help me with another question posted at https://stackoverflow.com/questions/46139453/count-mismatch-using-graph-v-count-and-graph-v-has-count thanks – Debasish Kanhar Sep 10 '17 at 10:03
  • I have published a complete walk through on how to connect to JanusGraph from Python https://medium.com/@BGuigal/janusgraph-python-9e8d6988c36c – Benoit Guigal Jul 20 '18 at 10:09

Your local "g" in the Gremlin Console is an embedded instance of a graph. It therefore "contains" something and is not empty. Your "g" in Python is "empty" in the sense that on its own there are no vertices/edges within it - the vertices/edges are in the remote graph on Gremlin Server that it refers to. I assume that if you were to do a g.V().count() in Python you would get the same vertex count back as you would if you did the same in Java. If not, then there is some other problem, but do not expect a "remote" graph instance to show vertices/edges of any sort (unless a day comes when gremlin-python is written as a Gremlin virtual machine that has its own Python native graph databases attached to it - in such a case, "g" would be embedded and thus own vertices/edges and would likely no longer print as "empty").

stephen mallette
  • So do you mean that Python's gremlin wrapper is unable to fetch the data/graph stored on the remote server? If so, getting an empty graph doesn't seem to be a bug. But in that case, how do we fetch the graph stored in the DB, query it, and get results using Python? – Debasish Kanhar Sep 06 '17 at 10:49
  • no. it is perfectly capable of getting data from the remote graph. all i'm saying is that it says "empty" because the data is not local. it is analogous to `EmptyGraph.instance()` in Java. you only use it as a reference to a remote graph that actually holds the data. basically, don't be confused by the label "empty" - it bears no significance to the data that is actually available remotely. – stephen mallette Sep 06 '17 at 13:10
  • Correct me if I'm wrong: you mean it shows empty because it doesn't actually store any data locally, but rather references my remote dataset? If so, then as you suggested, g.V().count() should give me some results, namely the count from the remote graph, right? But even that comes back empty, as [['V'], ['count']] – Debasish Kanhar Sep 06 '17 at 14:01
  • what you have said is correct and `g.V().count()` should return something. The output that you mention as "empty as [['V'], ['count']]" is gremlin bytecode representation of that traversal - doing `x = g.V().count()` the "x" is just the traversal instance and not the result. you need to iterate that traversal in some way. in this case you would want to do, "x = g.V().count().next()" – stephen mallette Sep 06 '17 at 14:15
  • So, I did `g.V().count().next()`, and now it's throwing an exception: `KeyError: None`. A possible reason might be that my graph instance is actually empty. Any ideas regarding this? – Debasish Kanhar Sep 06 '17 at 14:33
  • I'm not sure what that error means. I would expect you to get a zero if there were no vertices - not an error. If you do `g.V().count()` in the gremlin console via a `JanusGraph` instance does it show you have vertices? perhaps do an `addV()` from python and then a count to see what happens? – stephen mallette Sep 06 '17 at 14:38
  • So, I ran `g.V().count()` from the Gremlin console, and that works like a charm. I also did `addV()` and then tried printing the count again, though I didn't commit, and the result stayed the same! – Debasish Kanhar Sep 06 '17 at 15:01
  • I'm not sure what else to try. If I were you, I would probably simplify. Setup just gremlin server without janusgraph (use TinkerGraph) and get gremlin-python connected to that. Make sure you can add vertices and get counts. If it works with tinkergraph and doesn't in janusgraph then that helps isolate the problem. – stephen mallette Sep 06 '17 at 15:39
  • The `KeyError: None` error is most probably caused by a TinkerPop version mismatch between JanusGraph and gremlin-python – Benoit Guigal Jul 20 '18 at 08:23