I am new to NER and NLP in general, and I want to check that I have understood the material correctly. For example, I have the pre-trained model "ner-english-large". When I use it, the model does not recognize the right entities.
(In this example, 'Dsl' wasn't tagged as ORG.)
from flair.data import Corpus, Sentence
from flair.datasets import ColumnCorpus  # needed for the ColumnCorpus call further down
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer
tagger = SequenceTagger.load("flair/ner-english-large")
sentence = Sentence("Dsl hit 100% success in sales across all the world")
tagger.predict(sentence)
print(sentence.get_spans('ner'))
# Output:
# []
So I want to improve the model. I add this sentence to a corpus in the appropriate column format.
columns = {0: 'text', 1: 'ner'}
data_folder = "train"
corpus: Corpus = ColumnCorpus(data_folder, columns)
label_type = 'ner'
label_dict = corpus.make_label_dictionary(label_type=label_type)
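For reference, a minimal sketch of what the column-format data for this sentence could look like. It assumes BIO-style tags (B-ORG, O), simple whitespace tokenization, and one "token tag" pair per line with a blank line between sentences; none of these details are given in the post, so treat them as assumptions. The file name train.txt mentioned in the comment is likewise hypothetical.

```python
# Sketch only: builds CoNLL-style column text for one annotated sentence.
# The BIO tags and whitespace tokenization are assumptions for illustration.
tokens = "Dsl hit 100% success in sales across all the world".split()
tags = ["B-ORG"] + ["O"] * (len(tokens) - 1)  # only 'Dsl' is labeled as ORG


def to_column_format(tokens, tags):
    """Return one 'token tag' pair per line, ending with a blank line."""
    lines = [f"{token} {tag}" for token, tag in zip(tokens, tags)]
    return "\n".join(lines) + "\n\n"


column_text = to_column_format(tokens, tags)
print(column_text)
# Saving column_text as e.g. train/train.txt (hypothetical file name) would put it
# in the folder that ColumnCorpus(data_folder, columns) reads from.
```

The column order here (token first, tag second) matches the columns mapping {0: 'text', 1: 'ner'} used above.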
Then I initialize the trainer:
tagger = SequenceTagger.load("flair/ner-english-large")
trainer = ModelTrainer(tagger, corpus)
But after this I am a little bit confused about what I should do next.
Before that, I tried the train method and it worked. Data that was previously recognized incorrectly was recognized as needed.
trainer.train('fine/taggers/continued_model',
              learning_rate=0.01,
              mini_batch_size=32,
              max_epochs=6)
But I don't know whether this is the correct method or whether I need to use fine-tune instead. Could you explain which approach is more correct?
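For comparison, here is a hedged sketch of what the fine-tune variant might look like, using ModelTrainer.fine_tune from Flair. The hyperparameter values, the output path, and the helper name fine_tune_tagger are illustrative assumptions, not recommendations from the Flair documentation. As far as I understand, fine_tune uses AdamW with a linear learning-rate schedule and is normally run with a much smaller learning rate than train.

```python
# Illustrative values only (assumptions): a small learning rate and batch size,
# as is common when fine-tuning transformer-based taggers.
FINE_TUNE_ARGS = {
    "learning_rate": 5e-6,
    "mini_batch_size": 4,
    "max_epochs": 3,
}


def fine_tune_tagger(data_folder="train", out_dir="fine/taggers/fine_tuned_model"):
    """Hypothetical helper: continue training the pre-trained tagger with fine_tune()."""
    # Imported inside the function so the sketch can be loaded without Flair installed.
    from flair.datasets import ColumnCorpus
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    corpus = ColumnCorpus(data_folder, {0: "text", 1: "ner"})
    tagger = SequenceTagger.load("flair/ner-english-large")
    trainer = ModelTrainer(tagger, corpus)
    # Same trainer as before; only the training method and hyperparameters differ.
    trainer.fine_tune(out_dir, **FINE_TUNE_ARGS)
    return tagger
```

Calling fine_tune_tagger() would download the pre-trained model and write the result to out_dir, so it needs Flair installed and a corpus in the data folder.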