Flair uses the BILUO scheme, with an empty line between sentences, so you would need to use biluo_tags_from_offsets:
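import spacy
# spaCy v2 import path; in spaCy v3 the function is spacy.training.offsets_to_biluo_tags
from spacy.gold import biluo_tags_from_offsets

nlp = spacy.load("en_core_web_md")

# each item: (text, {'entities': [(start_char, end_char, label), ...]})
ents = [
    ("George Washington went to Washington", {'entities': [(0, 6, 'PER'), (7, 17, 'PER'), (26, 36, 'LOC')]}),
    ("Uber blew through $1 million a week", {'entities': [(0, 4, 'ORG')]}),
]

with open("flair_ner.txt", "w") as f:
    for sent, tags in ents:
        doc = nlp(sent)
        biluo = biluo_tags_from_offsets(doc, tags['entities'])
        # one "token tag" pair per line
        for word, tag in zip(doc, biluo):
            f.write(f"{word} {tag}\n")
        # an empty line marks the end of the sentence
        f.write("\n")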
Output:
George U-PER
Washington U-PER
went O
to O
Washington U-LOC

Uber U-ORG
blew O
through O
$ O
1 O
million O
a O
week O
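Both "George" and "Washington" come out as U-PER because the annotations list them as two separate single-token spans. The sketch below is a simplified, spaCy-free illustration of how character offsets map to BILUO tags; `whitespace_tokens` and `simple_biluo` are hypothetical helper names, not spaCy's implementation, and the code assumes spans align with whitespace token boundaries and do not overlap. It shows that a single (0, 17) span would produce B-PER / L-PER instead.

```python
import re


def whitespace_tokens(text):
    """Return (token, start, end) triples for whitespace-separated tokens."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def simple_biluo(text, entities):
    """Tag whitespace tokens with BILUO labels from (start, end, label) spans.

    Illustration only: assumes spans align with token boundaries and do not overlap.
    """
    tokens = whitespace_tokens(text)
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        # indices of the tokens that lie fully inside this span
        inside = [i for i, (_, s, e) in enumerate(tokens) if s >= start and e <= end]
        if not inside:
            continue
        if len(inside) == 1:
            tags[inside[0]] = f"U-{label}"      # Unit: single-token entity
        else:
            tags[inside[0]] = f"B-{label}"      # Begin
            for i in inside[1:-1]:
                tags[i] = f"I-{label}"          # Inside
            tags[inside[-1]] = f"L-{label}"     # Last
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


sentence = "George Washington went to Washington"
# two single-token PER spans, as in the annotations above
separate = simple_biluo(sentence, [(0, 6, "PER"), (7, 17, "PER"), (26, 36, "LOC")])
# one two-token PER span covering "George Washington"
merged = simple_biluo(sentence, [(0, 17, "PER"), (26, 36, "LOC")])
print(separate)
print(merged)
```

If "George Washington" is meant to be one person entity, annotating it as a single span is probably what you want.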
Note: to train just NER, this seems to be enough. If you wish to add POS tagging, you would need to create a mapping from Universal POS tags to Flair's simplified scheme. For example:
tag_mapping = {'PROPN': 'N', 'VERB': 'V', 'ADP': 'P', 'NOUN': 'N'}  # create your own

with open("flair_ner.txt", "w") as f:
    for sent, tags in ents:
        doc = nlp(sent)
        biluo = biluo_tags_from_offsets(doc, tags['entities'])
        try:
            for word, tag in zip(doc, biluo):
                # columns: token, mapped POS tag, BILUO tag
                f.write(f"{word} {tag_mapping[word.pos_]} {tag}\n")
                # alternative: write a placeholder instead of failing on unmapped tags
                # f.write(f"{word} {tag_mapping.get(word.pos_, 'None')} {tag}\n")
        except KeyError:
            # the rest of the current sentence is skipped at the first unmapped POS tag
            print(f"'{word.pos_}' tag is not defined in tag_mapping")
        f.write("\n")
Output:
'SYM' tag is not defined in tag_mapping
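Because an unmapped POS tag cuts a sentence short, it can help to check the column data before training. The following is a minimal sketch, not part of Flair or spaCy: `read_column_file` and `check_columns` are hypothetical helpers, and the sample lines are hand-written to mimic the three-column format above.

```python
def read_column_file(lines):
    """Group whitespace-separated column lines into sentences split on blank lines."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(tuple(line.split()))
    if current:
        sentences.append(current)
    return sentences


def check_columns(sentences, expected):
    """Return (sentence_index, row) pairs whose column count differs from expected."""
    return [
        (i, row)
        for i, sent in enumerate(sentences)
        for row in sent
        if len(row) != expected
    ]


# hand-written sample in "token POS tag" format (an assumption for illustration)
sample = [
    "George N U-PER",
    "Washington N U-PER",
    "went V O",
    "to P O",
    "Washington N U-LOC",
    "",
    "Uber N U-ORG",
    "blew V O",
    "through P O",
    "",
]
sentences = read_column_file(sample)
problems = check_columns(sentences, expected=3)
print(len(sentences), problems)
```

To check the real file, pass the open file object instead of `sample`, e.g. `read_column_file(open("flair_ner.txt"))`.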