I need to identify all countries mentioned in a text file using the nametagger model. However, I found mistakes in the output. For example, it tags Cuba as 'O' instead of 'B-LOC'. It also fails to tag words that are part of a multi-word country name: 'Kingdom' is not tagged 'B-LOC', and I cannot find a way to use the model with bigram tokens. In short, how can I correctly identify country names that span multiple words, such as United Kingdom? Methods other than the nametagger model are also fine!

Thanks!

Here is the code I tried:

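    library(nametagger)
    library(dplyr)

    model <- nametagger_download_model("english-conll-140408", model_dir = tempdir())

    # Tag the lemmas, drop non-entities and persons, keep one row per term
    predict(model, rename(refugee_df_udp, text = lemma)) %>%
      filter(!entity %in% c("O", "B-PER")) %>%
      distinct(term, .keep_all = TRUE)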

  • Images of data are not that helpful. Use `dput(data)`, or `dput(head(data))` if the output is really big, to share data. There is no need to share images of your code either. Put the code in the question and surround it with three backticks (the key to the left of the 1 key) so it is formatted as code. – John Polo Nov 06 '22 at 22:06

0 Answers