SparkNLP's NerCrfApproach with custom labels

Question

I am trying to train a SparkNLP NerCrfApproach model with a dataset in CoNLL format that has custom labels for product entities (like I-Prod, B-Prod etc.). However, when I use the trained model to make predictions, every token is assigned the label "O". When I train the same model on the CoNLL data from the SparkNLP workshop example, the classification works fine (cf. https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/jupyter/training/english/crf-ner).

So, the question is: Does NerCrfApproach rely on the standard tag set for NER labels used by the CoNLL data? Or can I use it for any custom labels and, if yes, do I need to specify these somehow? My assumption was that the labels are inferred from the training data.

Cheers, Martin

Update: The issue might not be related to the labels after all. I tried to replace my custom labels with CoNLL standard labels and I am still not getting the expected classification results.

score 0 · Accepted Answer · answered Oct 14 '21 at 06:26

As it turns out, this issue was not caused by the labels, but rather by the size of the dataset. I was using a rather small dataset for development purposes. Not only was this dataset quite small, it was also heavily imbalanced, with a lot more "O" labels than other labels. After switching to a dataset 10x the original size (in terms of sentences), I get meaningful results, even for my custom labels.
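A quick way to spot this kind of imbalance before training is to count the tags in the CoNLL file. The following is only a minimal sketch in plain Python (no Spark needed); `conll_label_counts` is a hypothetical helper, the sample sentence is invented for illustration, and it assumes the NER tag is the last whitespace-separated column of each token line, as in the CoNLL-2003 layout.

```python
from collections import Counter


def conll_label_counts(lines):
    """Count NER tags, assumed to be the last column of each token line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank sentence separators and document markers.
        if not line or line.startswith("-DOCSTART-"):
            continue
        counts[line.split()[-1]] += 1
    return counts


# Invented sample data, for illustration only.
sample = """-DOCSTART- -X- -X- O

The DT B-NP O
new JJ I-NP O
AcmePhone NNP I-NP B-Prod
Pro NNP I-NP I-Prod
ships VBZ B-VP O
today NN B-NP O
. . O O
""".splitlines()

counts = conll_label_counts(sample)
total = sum(counts.values())
# Share of tokens that carry an entity tag (anything other than "O").
entity_share = (total - counts["O"]) / total

for label, n in counts.most_common():
    print(f"{label}\t{n}")
print(f"entity share: {entity_share:.2%}")
```

To check a real training file, pass an open file handle instead of `sample`. If the entity share is tiny and the number of sentences is small, a model that predicts only "O" is a plausible outcome.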

martin_wun

Hi @martin_wun, I also want to train a custom token sentence classifier. Could you please point me to a sample notebook or video tutorial? – Sajjad Manal Aug 09 '22 at 16:50

score -1 · Answer 2 · answered Nov 18 '22 at 05:02

I want to create custom labels with CoNLL standard labels for my project. I need help with how to proceed; any materials would be appreciated.

Abhilash Na

I would suggest you create a separate question for your task and provide some more background. – martin_wun Nov 18 '22 at 13:39