I have a list of lists of unicode strings.
Each inner list represents one document, and its strings are the names of that document's authors. Some documents have a single author, while others have multiple co-authors.
For example, a sample of authorship of three documents looks like this:
authors = [[u'Smith, J.', u'Williams, K.', u'Daniels, W.'], [u'Smith, J.'], [u'Williams, K.', u'Daniels, W.']]
I want to convert this list into two structures: a dictionary and a list.
First, a dictionary that provides an integer key for each name:
author_name = {0: u'Smith, J.', 1: u'Williams, K.', 2: u'Daniels, W.'}
Second, a list that identifies the authors for each document by the integer key:
doc_author = [[0, 1, 2], [0], [1, 2]]
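For reference, a straightforward single-pass version of the conversion might look like the sketch below. It assumes integer keys are assigned in order of first appearance, which matches the example output above. The names `index_authors` and `name_to_id` are made up for illustration and are not part of the LDA package.

```python
def index_authors(authors):
    """Map each distinct author name to an integer key (first-appearance order)."""
    name_to_id = {}   # helper lookup: name -> integer key
    doc_author = []   # per-document lists of integer keys
    for doc in authors:
        ids = []
        for name in doc:
            # setdefault only stores a new key the first time a name is seen;
            # len(name_to_id) is the next unused integer at that point
            ids.append(name_to_id.setdefault(name, len(name_to_id)))
        doc_author.append(ids)
    # invert the helper lookup to get the key -> name dictionary
    author_name = {i: name for name, i in name_to_id.items()}
    return author_name, doc_author


authors = [[u'Smith, J.', u'Williams, K.', u'Daniels, W.'],
           [u'Smith, J.'],
           [u'Williams, K.', u'Daniels, W.']]
author_name, doc_author = index_authors(authors)
```

This visits every name once and does a single dictionary lookup per name, so the work grows linearly with the total number of author entries.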
What is the most efficient way to create these?
FYI: I need my author data in this format to run a pre-built author-topic LDA algorithm written in Python.