In a Spark NLP PipelineModel, all the stages have to be of type AnnotatorModel. But what if one of those AnnotatorModels requires a certain column in the dataset as input, and this input column is the output of an AnnotatorApproach?
For instance, I have a trained NER model (as the last stage of the pipeline) which requires tokens and POS tags as two of its inputs. The tokens are also required by the POS tagger. But the Tokenizer is an AnnotatorApproach, and I am not able to add it to the PipelineModel.
This is how the Tokenizer is instantiated (in Java):
AnnotatorApproach<TokenizerModel> tokenizer = new Tokenizer();
This works:
Pipeline pipeline = new Pipeline().setStages( new PipelineStage[]{tokenizer} );
But this doesn't work, because Tokenizer is not a Transformer:
List<Transformer> list = new ArrayList<>();
list.add(tokenizer);
PipelineModel pipelineModel = new PipelineModel("ID42", list);