
Even on an empty database, creating an index in Titan 1.0 takes several minutes. The duration is the same every time, which suggests a fixed, unnecessary delay rather than actual work.

My question is this: How do I shorten or eliminate the time Titan takes to reindex? Conceptually, since no work is being done, the time should be minimal, certainly not four minutes.

(N.B. I have previously been pointed to a solution that simply makes Titan wait the full delay without timing out. This is the wrong solution - I want to eliminate the delay entirely.)

The code I'm using to set up the database from scratch is:

graph = ... a local cassandra instance ...
graph.tx().rollback()

// 1. Check if the index already exists
mgmt = graph.openManagement()
i = mgmt.getGraphIndex('byIdent')
if(! i) {
  // 1a. If the index does not exist, add it
  idKey = mgmt.getPropertyKey('ident')
  idKey = idKey ? idKey : mgmt.makePropertyKey('ident').dataType(String.class).make()
  mgmt.buildIndex('byIdent', Vertex.class).addKey(idKey).buildCompositeIndex()
  mgmt.commit()
  graph.tx().commit()

  mgmt  = graph.openManagement()
  idKey = mgmt.getPropertyKey('ident')
  idx   = mgmt.getGraphIndex('byIdent')
  // 1b. Wait for index availability
  if ( idx.getIndexStatus(idKey).equals(SchemaStatus.INSTALLED) ) {
    mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.REGISTERED).call()
  }
  // 1c. Now reindex, even though the DB is usually empty.
  mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.REINDEX).get()
  mgmt.commit()
  mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.ENABLED).call()
} else { mgmt.commit() }

It appears to be the updateIndex...REINDEX call that blocks till timeout. Is this a known problem or worksformewon'tfix? Am I doing something wrong?

EDIT: Disabling the REINDEX, as discussed in the comments, is actually not a fix, because the index does not seem to become active. I now see:

WARN  com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx  - Query requires iterating over all vertices [(myindexedkey = somevalue)]. For better performance, use indexes
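
One way to check whether the index is actually active is to read its per-key status in a short-lived management transaction. This is only a sketch: it assumes `graph` is the already-open Titan graph from the setup code above and reuses the `byIdent` and `ident` names from that code.

```groovy
// Sketch: inspect the index status without leaving a management transaction open.
// Assumes `graph` is the open TitanGraph from the setup code above.
mgmt = graph.openManagement()
idx  = mgmt.getGraphIndex('byIdent')
// Prints the status of the index for the 'ident' key (e.g. INSTALLED, REGISTERED, ENABLED)
println idx.getIndexStatus(mgmt.getPropertyKey('ident'))
// Nothing was changed, so discard the management transaction
mgmt.rollback()
```

Queries only use the index once this status is ENABLED, which is consistent with the warning above appearing while the index is in an earlier state.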
Thomas M. DuBuisson
  • Possible duplicate of [Index state never change to ENABLED on Titan with Amazon DynamoDB backend](http://stackoverflow.com/questions/35088574/index-state-never-change-to-enabled-on-titan-with-amazon-dynamodb-backend) – Mohamed Taher Alrefaie Jun 03 '16 at 14:43
  • Eliminate the call to `REINDEX` if there is no existing data, like when you create the property key and index for the first time. – Jason Plurad Jun 03 '16 at 17:24
  • @JasonPlurad That is a good strategy for most of my uses. What if the database is tiny at the time of index creation? Say, what if I have very few but non-zero vertices? Must I reindex and incur this seemingly meaningless delay (till I submit a pull request, at least)? – Thomas M. DuBuisson Jun 03 '16 at 17:33
  • Yeah, if you have data in there, you'll need to `REINDEX` in that case. Best practice is to define your schema and indexes up front and keep it locked down. – Jason Plurad Jun 03 '16 at 17:37

1 Answer


The time delay is/was entirely unnecessary and due to my misuse of Titan (though the pattern does appear in Titan 1.0.0 documentation chapter 28).

Do not block in a transaction!

Instead of:

  mgmt  = graph.openManagement()
  idKey = mgmt.getPropertyKey('ident')
  idx   = mgmt.getGraphIndex('byIdent')
  // 1b. Wait for index availability
  if ( idx.getIndexStatus(idKey).equals(SchemaStatus.INSTALLED) ) {
    mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.REGISTERED).call()
  }

Consider:

  mgmt  = graph.openManagement()
  idKey = mgmt.getPropertyKey('ident')
  idx   = mgmt.getGraphIndex('byIdent')
  // Wait for index availability
  if ( idx.getIndexStatus(idKey).equals(SchemaStatus.INSTALLED) ) {
    mgmt.commit()
    mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.REGISTERED).call()
  } else { mgmt.commit() }

Use ENABLE_INDEX

Not: mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.REINDEX).get()

Rather: mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.ENABLE_INDEX).get()
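
Putting both changes together, the whole setup might look roughly like the sketch below. It is untested against a live cluster and makes a few assumptions: `graph` is an already-open Titan 1.0 graph, the `awaitGraphIndexStatus` call is made through the `ManagementSystem` class rather than through the committed `mgmt` object, and the explicit `status != ENABLED` guard is my own addition so that an index that is already enabled is left alone.

```groovy
import com.thinkaurelius.titan.core.schema.SchemaAction
import com.thinkaurelius.titan.core.schema.SchemaStatus
import com.thinkaurelius.titan.graphdb.database.management.ManagementSystem
import org.apache.tinkerpop.gremlin.structure.Vertex

// Assumption: `graph` is an already-open TitanGraph.
graph.tx().rollback()

mgmt = graph.openManagement()
if (mgmt.getGraphIndex('byIdent')) {
    // Index already exists: nothing to do, just close the management transaction
    mgmt.commit()
} else {
    // Create the property key (if needed) and the composite index, then commit
    idKey = mgmt.getPropertyKey('ident') ?: mgmt.makePropertyKey('ident').dataType(String.class).make()
    mgmt.buildIndex('byIdent', Vertex.class).addKey(idKey).buildCompositeIndex()
    mgmt.commit()

    // Read the current status, then commit BEFORE any blocking call
    mgmt = graph.openManagement()
    status = mgmt.getGraphIndex('byIdent').getIndexStatus(mgmt.getPropertyKey('ident'))
    mgmt.commit()

    // Wait with no management transaction open
    if (status == SchemaStatus.INSTALLED) {
        ManagementSystem.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.REGISTERED).call()
    }

    // Enable the index instead of reindexing (guard is an assumption of this sketch)
    if (status != SchemaStatus.ENABLED) {
        mgmt = graph.openManagement()
        mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.ENABLE_INDEX).get()
        mgmt.commit()
        ManagementSystem.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.ENABLED).call()
    }
}
```

Note that ENABLE_INDEX skips reindexing, so this only covers the case where there is no existing data to index. As pointed out in the comments on the question, a database that already holds data for the key still needs REINDEX.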

Thomas M. DuBuisson