I want to assign a unique ID to every document in a Vespa cluster.
However, I don't fully understand how visitors work in Vespa.
Can I have a shared field (i.e., one shared by all instances of my visitor) that I can atomically increment, using a lock, every time I visit a document?
What I tried obviously doesn't work, but it shows the general idea:
import com.yahoo.docproc.DocumentProcessor;
import com.yahoo.docproc.Processing;
import com.yahoo.document.Document;
import com.yahoo.document.DocumentOperation;
import com.yahoo.document.DocumentPut;

import java.util.Iterator;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class MyVisitor extends DocumentProcessor {

    // Where should I put this?
    private int document_id;
    private final Lock lock = new ReentrantLock();

    @Override
    public Progress process(Processing processing) {
        Iterator<DocumentOperation> it = processing.getDocumentOperations().iterator();
        while (it.hasNext()) {
            DocumentOperation op = it.next();
            if (op instanceof DocumentPut) {
                Document doc = ((DocumentPut) op).getDocument();
                /*
                 * Remove the PUT operation from the iterator so that it is not
                 * indexed back into the document cluster.
                 */
                it.remove();
                // Acquire the lock before the try block, so unlock() only runs
                // if the lock was actually taken.
                lock.lock();
                try {
                    document_id += 1;
                } finally {
                    lock.unlock();
                }
            }
        }
        return Progress.DONE;
    }
}
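A minimal sketch of the shared-counter idea, assuming all processor instances run in the same JVM: a `static` `AtomicLong` is shared by every instance of the class, and `getAndIncrement()` is atomic, so no explicit lock is needed. (In the attempt above, both the counter and the `Lock` are instance fields, so each instance only guards its own copy.) `SharedDocumentCounter` and `nextId` are hypothetical names. The counter is only unique within one JVM: a cluster with several container nodes would have one independent counter per node, and the value resets when the process restarts.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: one counter per class loader, i.e. shared by every
// instance created in this JVM (NOT shared across container nodes, and
// reset whenever the JVM restarts).
final class SharedDocumentCounter {
    private static final AtomicLong NEXT_ID = new AtomicLong(0);

    private SharedDocumentCounter() {}

    // Atomically returns the current value and then increments it.
    static long nextId() {
        return NEXT_ID.getAndIncrement();
    }
}
```

Inside `process(...)`, the lock block would then shrink to something like `long id = SharedDocumentCounter.nextId();`.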
Another idea is to get the number of buckets and the ID of the bucket I'm currently processing, and then assign IDs using this pattern:
document_id = bucket_id
document_id += bucket_count
This would work (provided I can ensure my visitor processes a single bucket at a time), but I don't know how to get this information from my visitor.
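The striding pattern above can be sketched independently of Vespa. This assumes `bucketIndex` is a dense index in `[0, bucketCount)`; a raw bucket identifier would first have to be mapped to such an index, and how to obtain either value from a visitor is exactly the open question here. `StridedIdGenerator` is a hypothetical name. Each bucket gets its own generator, used by one thread at a time, matching the "single bucket at a time" assumption.

```java
// Hypothetical sketch of "start at the bucket index, step by the bucket count".
// Not thread-safe: one instance per bucket, used by a single thread.
final class StridedIdGenerator {
    private final long bucketCount;
    private long next;

    StridedIdGenerator(long bucketIndex, long bucketCount) {
        if (bucketCount <= 0 || bucketIndex < 0 || bucketIndex >= bucketCount) {
            throw new IllegalArgumentException("bucketIndex must be in [0, bucketCount)");
        }
        this.bucketCount = bucketCount;
        this.next = bucketIndex;      // document_id = bucket_id
    }

    // Returns bucketIndex, bucketIndex + bucketCount, bucketIndex + 2 * bucketCount, ...
    long nextId() {
        long id = next;
        next += bucketCount;          // document_id += bucket_count
        return id;
    }
}
```

With `bucketCount = 3`, bucket 0 yields 0, 3, 6, ... and bucket 2 yields 2, 5, 8, .... The sequences cannot overlap because `id % bucketCount` always equals the bucket index, so no coordination between buckets is needed, but the scheme breaks if the bucket count changes while IDs are being assigned.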