
I am writing a Python script to extract data from the XML returned by an API. The request is sent via POST using the requests library.

Currently I send the request and print the response like so:

req = requests.post(url + '/endpoint', headers = headers, params = {'search': searchQuery}, verify = False)
print(req.text)

req.text then contains my XML, which is structured like so:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xml" href="/static/atom.xsl"?>
<feed>
    <!-- Feed elements -->
    <entry>
        <!-- Other Elements -->
        <content type="text/xml">
            <s:dict>
                <!-- Other keys. -->
                <s:key name="sid">DATA I WANT HERE</s:key>
                <!-- Other keys. -->
            </s:dict>
            <!-- Lots of other dicts here. -->
        </content>
    </entry>
    <!-- Other entries -->
</feed>

My goal is to extract the text of every s:key element whose name attribute is sid and print it. There are hundreds of entries per feed, and each entry contains exactly one s:key with a sid (it's a service identifier I need to obtain).

My issue is that I'm not sure how to extract it. Right now I'm trying to use ElementTree like so, but it does not return the results I want:

print(req.text)
results = ET.fromstring(req)
for job in results.findall('s:key'):
    print(job.get('name'))

I also tried:

for node in results.findall('s:key'):
    if node.attrib['name'] == "sid":
        print(node)

which also does not give me the data I want.

What am I doing wrong and how do I fix it? I'm somewhat unfamiliar with Python and very new to XML parsing so I would appreciate some insights into this problem.

Addendum:

To add: currently it seems to just print every XML line that contains an s:key with a name attribute, which is not what I want.

For example, sample output at the moment is:

<s:key name="a">74993868</s:key>
<s:key name="b">0</s:key>
<s:key name="c">date</s:key>
<s:key name="d">6000</s:key>
<s:key name="e">600</s:key>
<s:key name="f">text</s:key>
<s:key name="sid">data I actually want</s:key>
<!-- Etc -->
Rietty
  • Did you check this question to see if it helps you? https://stackoverflow.com/questions/18308529/python-requests-package-handling-xml-response – daniboy000 Feb 21 '19 at 14:46
  • @daniboy000 I double checked and also updated my question to better reflect the issue. – Rietty Feb 21 '19 at 15:01

1 Answer

One possible way is to use a regular expression.

The value you want is captured as a group. Here, string is assumed to hold the XML text of the response (req.text in the question):

>>> import re
>>> m = re.search(r'\<s\:\S+\sname=\"sid\"\>(.+)\<.+', string, re.MULTILINE)
>>> print(m.groups())
('DATA I WANT HERE',)
  • While this works, I only get the very first data set. i.e. It doesn't return any of the other matching items, even though when I dump to file and `CTRL+F` I can see at least 500 matching results. – Rietty Feb 21 '19 at 16:05
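
The behavior described in the comment above is expected: re.search stops at the first match. Below is a minimal sketch of two ways to collect every sid instead, one with re.findall and one that stays with ElementTree as in the question. It rests on some assumptions: SAMPLE is a made-up stand-in for the response body, the function names are hypothetical, and the xmlns:s URI is a placeholder, since the real namespace declaration is not shown in the question. With ElementTree, the s: prefix has to be mapped to that URI, and the path needs a leading .// to search below the root element.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for req.content. The xmlns:s URI below is a made-up placeholder;
# copy the real one from the <feed> element of the actual response.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:s="http://example.com/ns/s">
    <entry>
        <content type="text/xml">
            <s:dict>
                <s:key name="a">74993868</s:key>
                <s:key name="sid">first-sid</s:key>
            </s:dict>
        </content>
    </entry>
    <entry>
        <content type="text/xml">
            <s:dict>
                <s:key name="sid">second-sid</s:key>
            </s:dict>
        </content>
    </entry>
</feed>"""


def sids_with_regex(xml_text):
    # findall returns the captured group of every match, not just the first.
    # The non-greedy (.*?) stops at the nearest closing tag.
    return re.findall(r'<s:key name="sid">(.*?)</s:key>', xml_text)


def sids_with_elementtree(xml_bytes, ns_uri):
    root = ET.fromstring(xml_bytes)
    # Map the "s" prefix to its URI; ".//" searches all descendants,
    # and the predicate keeps only keys whose name attribute is "sid".
    namespaces = {"s": ns_uri}
    return [key.text for key in root.findall(".//s:key[@name='sid']", namespaces)]


regex_sids = sids_with_regex(SAMPLE.decode("utf-8"))
et_sids = sids_with_elementtree(SAMPLE, "http://example.com/ns/s")
print(regex_sids)
print(et_sids)
```

With a real response, req.text would be passed to the regex version and req.content to the ElementTree version. The regex variant is fragile if the attribute order or whitespace in the tag ever changes, so the ElementTree variant is probably the safer choice once the actual namespace URI is known.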