As the title says, I'm trying to count the occurrences of a name in a list of namedtuples, where the name I'm looking for sits in a nested tuple. This is a school assignment, and a big part of the code is given. The structure of the list is as follows:

paper = namedtuple( 'paper', ['title', 'authors', 'year', 'doi'] )

for (id, paper_info) in Summaries.iteritems():
    Summaries[id] = paper( *paper_info )

It was easy to get the number of unique titles for each year, since both 'title' and 'year' contain a single value, but I can't figure out how to count the number of unique authors per year.

I don't expect anyone to write the entire code for me, but a link to a good tutorial on this subject would help a lot. I googled around quite a bit, but I can't find any helpful information!

I hope I'm not asking too much; this is the first time I've asked a question here.

EDIT: Thanks for the responses so far. This is the code I have now:

authors = [
    auth
    for paper in Summaries.itervalues()
    for auth in paper.authors
    ]

authors

The problem is that this code only gives me a list of all the authors. I want them linked to the year, though, so I can count the unique authors for each year.

Rob

2 Answers

For keeping track of unique objects, I like using a set. A set behaves like a mathematical set: it can hold at most one copy of any given element.

from collections import namedtuple

# by convention, classes created with `namedtuple` are named in UpperCamelCase
Paper = namedtuple('Paper', ['title', 'authors', 'year', 'doi'])

papers = [
    Paper('On Unicorns', ['J. Atwood', 'J. Spolsky'], 2008, 'foo'),
    Paper('Discourse', ['J. Atwood', 'R. Ward', 'S. Saffron'], 2012, 'bar'),
    Paper('Joel On Software', ['J. Spolsky'], 2000, 'baz')
    ]

authors = set()
for paper in papers:
    authors.update(paper.authors) # "authors = union(authors, paper.authors)"

print(authors)
print(len(authors))

Output (sets are unordered, so the order of the names may differ):

{'J. Spolsky', 'R. Ward', 'J. Atwood', 'S. Saffron'}
4

More compactly (but also perhaps less readably), you could construct the authors set by doing:

authors = {author for paper in papers for author in paper.authors}

This may be faster if you have a large volume of data (I haven't checked), since it requires fewer update operations on the set.
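
To link the authors to a year, as the edit to the question asks, the same idea can be applied with one set per year. The following is only a sketch: the fourth paper is hypothetical sample data added so that one year has two papers, and the names `authors_by_year` and `unique_counts` are made up for illustration. With the question's `Summaries` dict, the loop would presumably run over its values instead of the `papers` list.

```python
from collections import defaultdict, namedtuple

Paper = namedtuple('Paper', ['title', 'authors', 'year', 'doi'])

# Sample data; the last entry is hypothetical and only there so that
# 2008 has two papers with an overlapping author.
papers = [
    Paper('On Unicorns', ['J. Atwood', 'J. Spolsky'], 2008, 'foo'),
    Paper('Discourse', ['J. Atwood', 'R. Ward', 'S. Saffron'], 2012, 'bar'),
    Paper('Joel On Software', ['J. Spolsky'], 2000, 'baz'),
    Paper('Another 2008 Paper', ['J. Atwood', 'R. Ward'], 2008, 'qux'),
]

# Map each year to the set of authors who published in that year.
# defaultdict(set) creates an empty set the first time a year is seen.
authors_by_year = defaultdict(set)
for paper in papers:
    authors_by_year[paper.year].update(paper.authors)

# Number of unique authors per year.
unique_counts = {year: len(names) for year, names in authors_by_year.items()}

for year in sorted(unique_counts):
    print(year, unique_counts[year])
```

'J. Atwood' appears in both 2008 papers but is counted once for that year, because each year's authors are stored in a set.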

senshin

If you don't want to use the built-in set type and would rather understand the logic behind it, use a list and an if check.

If we don't use set() in senshin's code:

# authors = set()
# for paper in papers:
#     authors.update(paper.authors) # "authors = union(authors, paper.authors)"

authors = []
for paper in papers:
    for author in paper.authors:
        if author not in authors:
            authors.append(author)

This gives the same unique authors as senshin's code, only as a list instead of a set. I hope it helps.
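
As a quick check, here is a self-contained sketch that runs the list version on the sample papers from senshin's answer and compares it with the set version. Unlike a set, the list keeps the authors in the order they were first seen. Note that `in` on a list scans the whole list, so this may get slow with a lot of data.

```python
from collections import namedtuple

Paper = namedtuple('Paper', ['title', 'authors', 'year', 'doi'])

# Same sample data as in senshin's answer.
papers = [
    Paper('On Unicorns', ['J. Atwood', 'J. Spolsky'], 2008, 'foo'),
    Paper('Discourse', ['J. Atwood', 'R. Ward', 'S. Saffron'], 2012, 'bar'),
    Paper('Joel On Software', ['J. Spolsky'], 2000, 'baz'),
]

# List-based version: append an author only if not already present.
authors = []
for paper in papers:
    for author in paper.authors:
        if author not in authors:
            authors.append(author)

# Set-based version, for comparison.
authors_set = set()
for paper in papers:
    authors_set.update(paper.authors)

print(authors)                      # authors in first-seen order
print(set(authors) == authors_set)  # True: both hold the same names
```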

gh640