I am using allennlp's OIE predictor, which extracts subject, predicate and object information (in the form of ARG0, V, ARG1, etc.) embedded in nested strings. However, I need to make sure that each output stays linked to the ID of the original sentence.
I produced the following pandas dataframe, where the 'OIE output' column contains the raw output of the allennlp predictor.
Current output:
sentence | ID | OIE output |
---|---|---|
'The girl went to the cinema' | 'abcd' | {'verbs':[{'verb': 'went', 'description':'[ARG0: The girl] [V: went] [ARG1:to the cinema]'}]} |
'He is right and he is an engineer' | 'efgh' | {'verbs':[{'verb': 'is', 'description':'[ARG0: He] [V: is] [ARG1:right]'}, {'verb': 'is', 'description':'[ARG0: He] [V: is] [ARG1:an engineer]'}]} |
My code to get the above table:

    oie_l = []
    for sent in sentences:
        oie_pred = predictor_oie.predict(sentence=sent)  # allennlp OIE predictor
        for d in oie_pred['verbs']:  # walk the nested per-verb dicts
            d.pop('tags')  # remove unnecessary info
        oie_l.append(oie_pred)
    df['OIE output'] = oie_l  # add new column to df
Desired output:
sentence | ID | OIE Triples |
---|---|---|
'The girl went to the cinema' | 'abcd' | '[ARG0: The girl] [V: went] [ARG1:to the cinema]' |
'He is right and he is an engineer' | 'efgh' | '[ARG0: He] [V: is] [ARG1:right]' |
'He is right and he is an engineer' | 'efgh' | '[ARG0: He] [V: is] [ARG1:an engineer]' |
Approach idea:
To get to the desired 'OIE Triples' output, I was considering converting the initial 'OIE output' to a string and then using a regular expression to extract the ARGs. However, I am not sure this is the best solution, as the ARG labels can vary. Another approach would be to iterate over the nested 'description' values, replace what is currently in 'OIE output' with a list of those strings, and then use the df.explode() method to expand it, so that the right sentence and ID stay linked to each triple after exploding.
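For reference, here is a minimal sketch of the second idea, run on the two example rows from the tables above rather than on live predictor output. It assumes every 'OIE output' value is a dict with a 'verbs' list whose items carry a 'description' string, as in the sample. The `parse_description` helper and its regular expression are hypothetical additions, included only to show that the labels can be read without hard-coding ARG0/ARG1.

```python
import re

import pandas as pd

# Sample rows copied from the tables above; in practice the dicts would come
# from predictor_oie.predict(sentence=...).
df = pd.DataFrame({
    "sentence": [
        "The girl went to the cinema",
        "He is right and he is an engineer",
    ],
    "ID": ["abcd", "efgh"],
    "OIE output": [
        {"verbs": [{"verb": "went",
                    "description": "[ARG0: The girl] [V: went] [ARG1:to the cinema]"}]},
        {"verbs": [{"verb": "is",
                    "description": "[ARG0: He] [V: is] [ARG1:right]"},
                   {"verb": "is",
                    "description": "[ARG0: He] [V: is] [ARG1:an engineer]"}]},
    ],
})

# Replace each nested dict with a flat list of its description strings.
df["OIE Triples"] = df["OIE output"].apply(
    lambda out: [verb["description"] for verb in out["verbs"]]
)

# explode() emits one row per list element and repeats sentence and ID.
triples = (
    df.drop(columns="OIE output")
      .explode("OIE Triples")
      .reset_index(drop=True)
)

# Hypothetical helper: capture any "[LABEL: text]" span, whatever the label is.
SPAN = re.compile(r"\[([^:\]]+):\s*([^\]]*)\]")


def parse_description(description):
    """Return a {label: text} dict for one description string."""
    return {label: text for label, text in SPAN.findall(description)}
```

With the sample data, `triples` has three rows and both 'efgh' rows keep their sentence and ID. One caveat with this sketch: a sentence for which the predictor returns no verbs yields an empty list, which explode() turns into a single row with NaN in 'OIE Triples'.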
Any advice is appreciated.