Say that, instead of documents, I have small trees that I need to store in a Lucene index. How do I go about doing that?
An example node in the tree:
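import java.util.List;

class Node
{
    String data;          // space-separated words, full-text searchable
    String type;          // a single word
    List<Node> children;  // child nodes of this node
}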
In the above node, the "data" member variable is a space-separated string of words, so it needs to be full-text searchable. The "type" member variable is just a single word.
The search query will be a tree itself and will search both the data and type in each node and also the structure of the tree for a match. Before matching against a child node, the query must first match the parent node data and type. Approximate matching on the data value is acceptable.
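To make the intended matching semantics concrete, here is a rough in-memory sketch of what I mean, not an indexing solution. Several things in it are my own assumptions for illustration: the class name TreeMatch, the helper dataMatches (which treats "approximate" as "at least one query word appears in the node's data"), matching the query root only against the stored tree's root, and requiring every query child to match some child of the candidate node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TreeMatch {
    static class Node {
        String data;
        String type;
        List<Node> children = new ArrayList<>();

        Node(String type, String data, Node... children) {
            this.type = type;
            this.data = data;
            this.children.addAll(Arrays.asList(children));
        }
    }

    // Assumed approximate match: at least one query word occurs in the node's data.
    static boolean dataMatches(String queryData, String nodeData) {
        Set<String> words = new HashSet<>(Arrays.asList(nodeData.toLowerCase().split("\\s+")));
        for (String w : queryData.toLowerCase().split("\\s+")) {
            if (words.contains(w)) {
                return true;
            }
        }
        return false;
    }

    // The parent must match on type and data before any child is examined.
    static boolean matches(Node query, Node candidate) {
        if (!query.type.equals(candidate.type)) {
            return false;
        }
        if (!dataMatches(query.data, candidate.data)) {
            return false;
        }
        // Assumption: each query child must match at least one candidate child.
        for (Node queryChild : query.children) {
            boolean found = false;
            for (Node candidateChild : candidate.children) {
                if (matches(queryChild, candidateChild)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }
}
```

What I am looking for is a way to get this behaviour out of an index, rather than by walking every stored tree like this.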
What's the best way to index this kind of data? If Lucene does not directly support indexing this kind of data, can it be done with Solr or Elasticsearch?
I took a quick look at Neo4j, but it seems designed to store one entire graph in the database, not a large collection (say billions or trillions) of small tree structures. Or is my understanding wrong?
Also, would a non-Lucene-based NoSQL solution be better suited for this?