
I currently have a VM running Titan over a local Cassandra backend and would like the ability to use ElasticSearch to index strings using CONTAINS matches and regular expressions. Here's what I have so far:

  • After titan.sh is run, a Groovy script is used to load in the data from separate vertex and edge files. The first stage of this script loads the graph from Titan and sets up the ES properties:

    config.setProperty("storage.backend","cassandra")
    config.setProperty("storage.hostname","127.0.0.1")

    config.setProperty("storage.index.elastic.backend","elasticsearch")
    config.setProperty("storage.index.elastic.directory","db/es")
    config.setProperty("storage.index.elastic.client-only","false")
    config.setProperty("storage.index.elastic.local-mode","true")

  • The second part of the script sets up the indexed types:

    g.makeKey("property").dataType(String.class).indexed("elastic",Edge.class).make();

  • The third part loads in the data from the CSV files; this has been tested and works fine.

My problem is, I don't seem to be able to use the ElasticSearch functions when I do a Gremlin query. For example:

g.E.has("property",CONTAINS,"test")

returns 0 results, even though I know this field contains the string "test" for that property at least once. Weirder still, when I change CONTAINS to something that isn't recognised by ElasticSearch, I get a "no such property" error. I can also perform exact string matches and any numerical comparisons, including greater than or less than; however, I expect the default indexing method is being used rather than ElasticSearch in these instances.

Due to the lack of errors when I try to run a more advanced ES query, I am at a loss on what is causing the problem here. Is there anything I may have missed?

Thanks, Adam

adaml288

1 Answer


I'm not quite sure what's going wrong in your code. From your description everything looks fine. Can you try the following script (just paste it into your Gremlin REPL):

config = new BaseConfiguration()
config.setProperty("storage.backend","inmemory")
config.setProperty("storage.index.elastic.backend","elasticsearch")
config.setProperty("storage.index.elastic.directory","/tmp/es-so")
config.setProperty("storage.index.elastic.client-only","false")
config.setProperty("storage.index.elastic.local-mode","true")

g = TitanFactory.open(config)
g.makeKey("name").dataType(String.class).make()
g.makeKey("property").dataType(String.class).indexed("elastic",Edge.class).make()
g.makeLabel("knows").make()
g.commit()

alice = g.addVertex(["name":"alice"])
bob = g.addVertex(["name":"bob"])
alice.addEdge("knows", bob, ["property":"foo test bar"])

g.commit()

// test queries
g.E.has("property",CONTAINS,"test")
g.query().has("property",CONTAINS,"test").edges()

The last two lines should each return something like e[1t-4-1w][4-knows-8]. If that works and you still can't figure out what's wrong with your code, it would help if you could share your full code (e.g. on GitHub or in a Gist).

Cheers, Daniel

Daniel Kuppitz
  • Hi Daniel, thanks for the reply! I tried out this code and it works fine; the only difference from mine is the extra commit stages, which I must have forgotten to add. However, if I run the same query with "tes" instead of "test", I get 0 results. – adaml288 Mar 04 '14 at 10:11
  • Technically the property contains both strings, and the same behaviour is apparent in my code. If I add a test connection with "foo test bar" to my graph, I can only use the contains filter on whole words rather than parts of the string. Is this expected? It's possible that my code worked all along but because I was trying to search for strings within words, it was returning nothing. – adaml288 Mar 04 '14 at 10:12
  • Yes, it is expected that it only works for whole words/tokens. You can use indexQuery for partial words, e.g. g.indexQuery("elastic", "e.property:tes*").edges().collect({ it.getElement() }). – Daniel Kuppitz Mar 04 '14 at 15:29
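
The whole-token behaviour discussed in these comments can be illustrated with a small plain-Groovy sketch. It does not call Titan or Elasticsearch. The closures tokenize, containsToken and containsPrefix are hypothetical names invented for this illustration, and the lower-case-then-split-on-whitespace analyzer is an assumption that only roughly approximates how a full-text index breaks a string into tokens.

```groovy
// Simplified stand-in for a full-text index; NOT Titan or Elasticsearch code.
// Assumption: the analyzer lower-cases the text and splits it on whitespace.
def tokenize = { String text ->
    text.toLowerCase().split(/\s+/).findAll { it } as List
}

// Whole-token match, analogous to has("property", CONTAINS, term)
def containsToken = { String text, String term ->
    tokenize(text).contains(term.toLowerCase())
}

// Prefix match against any token, analogous to the "tes*" wildcard in indexQuery
def containsPrefix = { String text, String prefix ->
    tokenize(text).any { it.startsWith(prefix.toLowerCase()) }
}

def value = "foo test bar"
println containsToken(value, "test")   // true: "test" is a whole token
println containsToken(value, "tes")    // false: "tes" is only part of a token
println containsPrefix(value, "tes")   // true: the token "test" starts with "tes"
```

Under these assumptions, "tes" fails the whole-token check even though it is a substring of the stored value, which mirrors the 0 results reported above; the prefix check corresponds to the wildcard form of the indexQuery call.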