I'm learning how to use DeepPavlov and can't figure how to use it for NER using the Camembert (french) pre-trained model. My goal is to tag a short paragraph in french.
The docs from deeppavlov explicitly list the Camembert model as a viable transformer architecture. I tried to follow as best as I could but I keep getting this error when I try to build the model.
>>> ner_model = build_model('ner_ontonotes_bert_mult')
/home/philippe/.local/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
Some weights of the model checkpoint at camembert-base were not used when initializing CamembertForTokenClassification: ['lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.dense.bias']
- This IS expected if you are initializing CamembertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CamembertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of CamembertForTokenClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2023-01-29 09:58:13.308 ERROR in 'deeppavlov.core.common.params'['params'] at line 108: Exception in <class 'deeppavlov.models.torch_bert.torch_transformers_sequence_tagger.TorchTransformersSequenceTagger'>
Traceback (most recent call last):
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/core/common/params.py", line 102, in from_params
component = obj(**dict(config_params, **kwargs))
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py", line 182, in __init__
super().__init__(optimizer=optimizer,
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/core/models/torch_model.py", line 98, in __init__
self.load()
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py", line 295, in load
self.crf = CRF(self.n_classes).to(self.device)
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/models/torch_bert/crf.py", line 13, in __init__
super().__init__(num_tags=num_tags, batch_first=batch_first)
File "/home/philippe/.local/lib/python3.10/site-packages/torchcrf/__init__.py", line 40, in __init__
raise ValueError(f'invalid number of tags: {num_tags}')
ValueError: invalid number of tags: 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/core/commands/infer.py", line 55, in build_model
component = from_params(component_config, mode=mode)
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/core/common/params.py", line 102, in from_params
component = obj(**dict(config_params, **kwargs))
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py", line 182, in __init__
super().__init__(optimizer=optimizer,
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/core/models/torch_model.py", line 98, in __init__
self.load()
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py", line 295, in load
self.crf = CRF(self.n_classes).to(self.device)
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/models/torch_bert/crf.py", line 13, in __init__
super().__init__(num_tags=num_tags, batch_first=batch_first)
File "/home/philippe/.local/lib/python3.10/site-packages/torchcrf/__init__.py", line 40, in __init__
raise ValueError(f'invalid number of tags: {num_tags}')
ValueError: invalid number of tags: 0
I downloaded the camembert-base model from https://huggingface.co/camembert-base and copied the files in .deeppavlov/models/camembert-base directory.
Then I figured the deeppavlov's model 'ner_ontonotes_bert_mult' was the best for my use so I edited the config file and changed thoses lines in the metadata section at the end. The docs from DeepPavlov ask to change the TRANSFORMER value, witch I did, and I changed the MODEL_PATH so it point to the files I downloaded previously.
"variables": {
"ROOT_PATH": "~/.deeppavlov",
"DOWNLOADS_PATH": "{ROOT_PATH}/downloads",
"MODELS_PATH": "{ROOT_PATH}/models",
"TRANSFORMER": "camembert-base",
"MODEL_PATH": "{MODELS_PATH}/camembert-base"
},
I am aware that I should have copied the config file to a new one with a different name but this should not be a problem.
Then in python I did the following :
from deeppavlov import configs, build_model
build_model('ner_ontonotes_bert_mult')`
And then I get the error mentioned before. I am lost and don't know where to look from now.