Currently I am working on productionizing an NER model on Spark. My current implementation uses Hugging Face DistilBERT with a TokenClassification head, but since inference is somewhat slow and costly, I am looking for ways to optimize it.
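For context, this is roughly what the current setup looks like (a hedged sketch only; the checkpoint name, column names, and output formatting are placeholders, not my actual job):

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType

_ner = None  # lazily initialised once per executor process

def _get_ner():
    global _ner
    if _ner is None:
        from transformers import pipeline
        # placeholder checkpoint: any DistilBERT model with a TokenClassification head
        _ner = pipeline(
            "token-classification",
            model="my-org/distilbert-finetuned-ner",
            aggregation_strategy="simple",
        )
    return _ner

@pandas_udf(StringType())
def extract_entities(texts: pd.Series) -> pd.Series:
    ner = _get_ner()
    # full DistilBERT forward pass per row -> the slow, costly part
    results = ner(texts.tolist())
    return pd.Series(
        [str([(e["word"], e["entity_group"]) for e in r]) for r in results]
    )

# usage: df.withColumn("entities", extract_entities("text"))
```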
I have looked at the Spark NLP implementation, which lacks a pretrained DistilBERT and, as far as I can tell, takes a different approach, so a few questions arose:
- Hugging Face uses the entire BERT model and adds a head for token classification. Is this the same as obtaining the BERT embeddings and feeding them to another NN?
- I ask because this is the Spark NLP approach: a class that helps obtain those embeddings and use them as features for a separate, more complex NN (see the sketch after this list). Doesn't this lose some of the knowledge inside BERT?
- Does Spark NLP have any Spark-specific optimizations that help with inference time, or is it just another BERT implementation?
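This is the Spark NLP pipeline I have in mind for comparison (a hedged sketch; I haven't benchmarked it, and the pretrained names `bert_base_cased` / `ner_dl_bert` are the ones I believe exist). Here BERT only produces an embeddings column, which is consumed by a separate pretrained NER network, as opposed to fine-tuning a classification head on top of the transformer itself:

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertEmbeddings, NerDLModel
from pyspark.ml import Pipeline

spark = sparknlp.start()

document = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")

# BERT is used only to generate an embeddings feature column
embeddings = BertEmbeddings.pretrained("bert_base_cased", "en") \
    .setInputCols(["document", "token"]).setOutputCol("embeddings")

# a separate NER network (NerDL) that takes those embeddings as input features
ner = NerDLModel.pretrained("ner_dl_bert", "en") \
    .setInputCols(["document", "token", "embeddings"]).setOutputCol("ner")

pipeline = Pipeline(stages=[document, tokenizer, embeddings, ner])
# model = pipeline.fit(df)
# result = model.transform(df)
```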