I have labeled data like this:
import pandas as pd

Data = {'text': ['when can I decrease the contribution to my health savings?', 'I love my guinea pig', 'I love my dog'],
        'start': [43, 10, 10],
        'end': [57, 19, 12],
        'entity': ['hsa', 'pet', 'pet'],
        'value': ['health savings', 'guinea pig', 'dog']}
df = pd.DataFrame(Data)
                   text  start  end  entity           value
0     .. health savings     43   57     hsa  health savings
1  I love my guinea pig     10   19     pet      guinea pig
2         I love my dog     10   12     pet             dog
I want to split each sentence into words and tag each word. If a word is part of an entity, it should be tagged with that entity.
I have tried the approach from this question: "Split sentences in pandas into sentence number and words".
That method works when the value is a single word like 'dog', but not when the value is a phrase like 'guinea pig'.
I want to perform BIO tagging: B marks the beginning of a phrase, I marks a word inside a phrase, and O marks a word outside any entity.
The desired output is:
     Sentence #          Word  Entity
0   Sentence: 0          when       O
1   Sentence: 0           can       O
2   Sentence: 0             I       O
3   Sentence: 0      decrease       O
4   Sentence: 0           the       O
5   Sentence: 0  contribution       O
6   Sentence: 0            to       O
7   Sentence: 0            my       O
8   Sentence: 0        health   B-hsa
9   Sentence: 0      savings?   I-hsa
10  Sentence: 1             I       O
11  Sentence: 1          love       O
12  Sentence: 1            my       O
13  Sentence: 1        guinea   B-pet
14  Sentence: 1           pig   I-pet
15  Sentence: 2             I       O
16  Sentence: 2          love       O
17  Sentence: 2            my       O
18  Sentence: 2           dog   B-pet
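
For reference, here is a rough sketch of one way this output might be produced from the character offsets, not a definitive solution. It rests on several assumptions that are not in the data above: `bio_tag` is a hypothetical helper name, words are split on whitespace only (so 'savings?' keeps its punctuation), and each row has exactly one entity. The sample `end` values mix conventions (57 is exclusive, while 19 and 12 are inclusive), so the sketch ignores `end` and takes the span end as `start + len(value)`.

```python
import re
import pandas as pd

Data = {'text': ['when can I decrease the contribution to my health savings?',
                 'I love my guinea pig', 'I love my dog'],
        'start': [43, 10, 10],
        'end': [57, 19, 12],
        'entity': ['hsa', 'pet', 'pet'],
        'value': ['health savings', 'guinea pig', 'dog']}
df = pd.DataFrame(Data)


def bio_tag(frame):
    """Hypothetical helper: one output row per word, tagged B-/I-/O."""
    rows = []
    for sent_id, rec in enumerate(frame.itertuples(index=False)):
        # Assumption: the span end is derived from the value length because
        # the sample `end` column is not consistently inclusive or exclusive.
        span_start = rec.start
        span_end = rec.start + len(rec.value)
        inside = False  # becomes True after the entity's first word
        # Assumption: tokens are maximal runs of non-whitespace characters.
        for match in re.finditer(r"\S+", rec.text):
            # A word belongs to the entity if its character range overlaps the span.
            if match.start() < span_end and match.end() > span_start:
                tag = ("I-" if inside else "B-") + rec.entity
                inside = True
            else:
                tag = "O"
            rows.append((f"Sentence: {sent_id}", match.group(), tag))
    return pd.DataFrame(rows, columns=["Sentence #", "Word", "Entity"])


result = bio_tag(df)
print(result)
```

Matching on character overlap rather than on the word itself is what lets a multi-word value like 'guinea pig' receive a B- tag followed by an I- tag. Sentences with several entities would need a list of spans per row, which this sketch does not handle.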