I have an XML file from which I need to retrieve some data. Below is the XML document I have.

<orcid-message>
   <orcid-profile type="user">
      <orcid-activities>
         <orcid-works>
            <orcid-work put-code="23938140" visibility="public">
               <work-contributors>
                  <contributor>
                     <credit-name visibility="public">Tania Maes</credit-name>
                  <contributor>
                     <credit-name visibility="public">Francisco Avila Cobos</credit-name>
                  <contributor>
                     <credit-name visibility="public">Franco Liala Manus</credit-name>

I want to retrieve the contributor names. This is what I have tried so far:

contributors_name = (doc['orcid-message']['orcid-profile']
                        ['orcid-activities']['orcid-works']
                        ['orcid-work']['work-contributors']
                        ['contributor']['credit-name']  )

print(contributors_name)

Please tell me where I am going wrong. Thank you.

har07
user3419487
  • What is `doc` variable? How did you populate it? – har07 Jul 17 '16 at 11:37
  • doc = xmltodict.parse(fd.read()) and fd is the xml document – user3419487 Jul 17 '16 at 11:52
  • And what's the problem with your current code? Does nothing get printed, or is an exception thrown? – har07 Jul 17 '16 at 12:17
  • I get this error: `TypeError: list indices must be integers, not str`. It is raised by contributors_name = (doc['orcid-message']['orcid-profile']['orcid-activities']['orcid-works']['orcid-work']['work-contributors']['contributor']['credit-name'] ) – user3419487 Jul 17 '16 at 12:26

1 Answer

"TypeError: list indices must be integers, not str : I get this error"

The error message suggests that the XML contains multiple contributor elements, so the lookup up to the ['contributor'] part returns a list, and a list can't be indexed by key (i.e. ['credit-name']) the way a dictionary can. You need to pick the item in the list whose credit-name you want, for example the first item:

contributors = doc['orcid-message']['orcid-profile'] \
    ['orcid-activities']['orcid-works'] \
    ['orcid-work']['work-contributors'] \
    ['contributor']
contributor_name = contributors[0]['credit-name']

Or you can use a list comprehension to get the credit-name text from all contributors:

contributors_name = [contrib['credit-name']['#text'] for contrib in contributors]
print(contributors_name)

Output:

[u'Tania Maes', u'Francisco Avila Cobos', u'Franco Liala Manus']
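
If the same TypeError shows up at an earlier key, another element on the path (orcid-work, for instance) probably repeats as well: xmltodict returns a list for repeated sibling elements and a single dict otherwise. Below is a minimal sketch of one way to cope with that. Both the `as_list` helper and the hand-written `doc` are assumptions made for illustration; `doc` only mimics the shape xmltodict would produce for the snippet in the question and is not parsed from a real file.

```python
def as_list(value):
    """Hypothetical helper: return value as a list, treating None as empty."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# Hand-written stand-in for the xmltodict result (an assumption for this sketch).
doc = {
    'orcid-message': {
        'orcid-profile': {
            '@type': 'user',
            'orcid-activities': {
                'orcid-works': {
                    # A single work here; with several works this would be a list.
                    'orcid-work': {
                        '@put-code': '23938140',
                        '@visibility': 'public',
                        'work-contributors': {
                            'contributor': [
                                {'credit-name': {'@visibility': 'public',
                                                 '#text': 'Tania Maes'}},
                                {'credit-name': {'@visibility': 'public',
                                                 '#text': 'Francisco Avila Cobos'}},
                                {'credit-name': {'@visibility': 'public',
                                                 '#text': 'Franco Liala Manus'}},
                            ]
                        },
                    }
                }
            },
        }
    }
}

works = doc['orcid-message']['orcid-profile']['orcid-activities']['orcid-works']

# Normalise every level that may repeat before iterating over it.
contributors_name = [
    contrib['credit-name']['#text']
    for work in as_list(works.get('orcid-work'))
    for contrib in as_list((work.get('work-contributors') or {}).get('contributor'))
]
print(contributors_name)
```

xmltodict's `parse` also accepts a `force_list` argument, e.g. `force_list=('orcid-work', 'contributor')`, which should make those keys always come back as lists so that no helper is needed.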
har07
  • I still get the same error. When I assign contributors = doc['orcid-message']['orcid-profile'] ['orcid-activities']['orcid-works'] ['orcid-work']['work-contributors'] ['contributor'], it gives the same error. – user3419487 Jul 18 '16 at 08:20
  • Does the actual XML contain multiple elements of the same name, other than `contributor`, among those mentioned in the code in your comment? – har07 Jul 18 '16 at 08:49
  • Yes, it does. Maybe I should use the ElementTree library. Which library did you use to get the output? – user3419487 Jul 18 '16 at 09:02
  • `xmltodict`, since the question uses this library. But in my test only `contributor` and `credit-name` occur multiple times (based on the XML snippet posted in the question), so it worked fine. – har07 Jul 18 '16 at 09:05
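
For the ElementTree route mentioned in the comments, a rough sketch using the standard library's `xml.etree.ElementTree` might look like the following. The XML string is the snippet from the question with closing tags added, which is an assumption about how the real document is structured. `iter()` walks the whole tree, so it does not matter how many works or contributors there are.

```python
import xml.etree.ElementTree as ET

# Snippet from the question, closed up by hand (the closing tags are an assumption).
xml_text = """
<orcid-message>
  <orcid-profile type="user">
    <orcid-activities>
      <orcid-works>
        <orcid-work put-code="23938140" visibility="public">
          <work-contributors>
            <contributor>
              <credit-name visibility="public">Tania Maes</credit-name>
            </contributor>
            <contributor>
              <credit-name visibility="public">Francisco Avila Cobos</credit-name>
            </contributor>
            <contributor>
              <credit-name visibility="public">Franco Liala Manus</credit-name>
            </contributor>
          </work-contributors>
        </orcid-work>
      </orcid-works>
    </orcid-activities>
  </orcid-profile>
</orcid-message>
""".strip()

root = ET.fromstring(xml_text)

# Collect the text of every credit-name element, at any depth.
names = [el.text for el in root.iter('credit-name')]
print(names)
```

If the real file declares a default XML namespace, the tag passed to `iter()` has to be namespace-qualified, i.e. written as `'{namespace-uri}credit-name'`, otherwise nothing is matched.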