Comparing sub items of lists and making changes in Python

Question

I have two lists, originating from a part-of-speech tagger, which look as follows:

pos_tags = [('This', u'DT'), ('is', u'VBZ'), ('a', u'DT'), ('test', u'NN'), ('sentence', u'NN'), ('.', u'.'), ('My', u"''"), ('name', u'NN'), ('is', u'VBZ'), ('John', u'NNP'), ('Murphy', u'NNP'), ('and', u'CC'), ('I', u'PRP'), ('live', u'VBP'), ('happily', u'RB'), ('on', u'IN'), ('Planet', u'JJ'), ('Earth', u'JJ'), ('!', u'.')]


pos_names = [('John', 'NNP'), ('Murphy', 'NNP')]

I want to create a final list that updates pos_tags with the items in pos_names. Basically, I need to find John and Murphy in pos_tags and replace their POS tags with NNP.

To what does `[('Planet', u'JJ'), ('Earth', u'JJ')]` belong? — Joschua, Dec 17 '14 at 14:31
That was a copy and paste error which has now been rectified in the original post. — Markus, Dec 17 '14 at 14:33
John and Murphy are already associated with NNP in your `pos_tags` list. Can you provide another example? Do you want to change the pos tag if a new one is seen? — xnx, Dec 17 '14 at 14:34
I have tried some nested loops which didn't work. I am more a linguist than a programmer so this is all a bit overwhelming. — Markus, Dec 17 '14 at 14:35
This is just a coincidence. To provide more background, the first list originates from a classifier-based POS tagger which often fails to identify names. The second list is generated by a tagger that aims at tagging names as NNP. So if I replace John with Markus, the list will show ('Markus',u'RB'), which I would like to replace with ('Markus',u'NNP') if it is present in the pos_names list. — Markus, Dec 17 '14 at 14:39

score 0 · Answer 1 · answered Dec 17 '14 at 14:35

You could create a dictionary from pos_names that acts as a lookup table. Then use its get method to look up a replacement for each word, leaving the tag as-is when no replacement is found.

d = dict(pos_names)  # lookup table: word -> replacement tag
pos_tags = [(word, d.get(word, tag)) for word, tag in pos_tags]  # fall back to the original tag
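To illustrate with the situation described in the comments (a name that the first tagger mis-tagged as RB), here is a small run of the same two lines; the sample sentence is made up for the example. The list comprehension rebuilds the list in its original order, and only words that appear in pos_names receive a new tag.

```python
# Hypothetical tagger output in which the name "Markus" was mis-tagged as RB
pos_tags = [('My', u"''"), ('name', u'NN'), ('is', u'VBZ'), ('Markus', u'RB'), ('.', u'.')]
# Hypothetical output of the name tagger
pos_names = [('Markus', 'NNP')]

# Lookup table: word -> corrected tag
d = dict(pos_names)

# Rebuild the list in its original order; words not in d keep their old tag
pos_tags = [(word, d.get(word, tag)) for word, tag in pos_tags]
print(pos_tags)
```

Here ('Markus', u'RB') becomes ('Markus', 'NNP') and every other pair is unchanged.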

score 0 · Answer 2 · answered Dec 17 '14 at 14:40

Given

pos_tags = [('This', u'DT'), ('is', u'VBZ'), ('a', u'DT'), ('test', u'NN'), ('sentence', u'NN'), ('.', u'.'), ('My', u"''"), ('name', u'NN'), ('is', u'VBZ'), ('John', u'NNP'), ('Murphy', u'NNP'), ('and', u'CC'), ('I', u'PRP'), ('live', u'VBP'), ('happily', u'RB'), ('on', u'IN'), ('Planet', u'JJ'), ('Earth', u'JJ'), ('!', u'.')]

and

names = ['John', 'Murphy']

you can do:

[next(subl for subl in pos_tags if name in subl) for name in names]

which will give you:

[('John', u'NNP'), ('Murphy', u'NNP')]
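Since the question's pos_names holds (word, tag) pairs rather than bare strings, one possible adaptation (a sketch, not part of the original answer) strips the tags first. It also compares only the first element of each pair, whereas `name in subl` would also match a tag that happens to equal the name, and it passes None as a default to next() so that a missing name does not raise StopIteration. Note that this only retrieves the matching pairs; it does not update pos_tags.

```python
# Sample data from the question, shortened
pos_tags = [('My', u"''"), ('name', u'NN'), ('is', u'VBZ'),
            ('John', u'NNP'), ('Murphy', u'NNP'), ('and', u'CC')]
pos_names = [('John', 'NNP'), ('Murphy', 'NNP')]

# Keep only the words from the (word, tag) pairs
names = [word for word, _ in pos_names]

# For each name, take the first pair in pos_tags whose word matches;
# the None default avoids StopIteration when a name is absent
found = [next((pair for pair in pos_tags if pair[0] == name), None)
         for name in names]
print(found)
```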

Cheers. But my list looks like this: names = [('John', 'NNP'), ('Murphy', 'NNP')] — Markus, Dec 17 '14 at 14:54

score 0 · Accepted Answer · answered Dec 17 '14 at 14:50

I agree a dictionary would be a more natural solution to this problem, but if you need to keep pos_tags in order, a more explicit solution would be:

# For each corrected (word, tag) pair...
for word, pos in pos_names:
    # ...scan the tagged sentence and overwrite the tag of every matching word
    for i, (tagged_word, tagged_pos) in enumerate(pos_tags):
        if word == tagged_word:
            pos_tags[i] = (word, pos)

(A dictionary would probably be faster for a large number of words, so you might want to consider storing the word order in a list and doing your POS allocation using a dictionary.)
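A minimal sketch of that dictionary variant, keeping the in-place, order-preserving update of the loop above: a single pass over pos_tags with a dictionary lookup replaces the nested loops. The name lookup is illustrative, and the sample sentence with mis-tagged names is made up.

```python
# Hypothetical sentence in which the names were mis-tagged
pos_tags = [('My', u"''"), ('name', u'NN'), ('is', u'VBZ'),
            ('John', u'RB'), ('Murphy', u'NN'), ('.', u'.')]
pos_names = [('John', 'NNP'), ('Murphy', 'NNP')]

# Dictionary of corrections: word -> tag from the name tagger
lookup = dict(pos_names)

# One pass over the sentence; list positions (the word order) are unchanged
for i, (word, _) in enumerate(pos_tags):
    if word in lookup:
        pos_tags[i] = (word, lookup[word])
```

This visits each tagged word once instead of once per name, so the work grows with len(pos_tags) + len(pos_names) rather than with their product.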

Perfect. I am not too worried about performance, so this works well for me. — Markus, Dec 17 '14 at 15:07
