Parse Stackoverflow RSS job feed for same name elements, with Feedparser in Python

Question

Every job item in the Stackoverflow RSS feed carries several tags, each in its own element named "category".

They look like this:

<category>scala</category>
<category>hadoop</category>
<category>apache-spark</category>
<category>hive</category>
<category>json</category>

I would like to use Feedparser to collect all of an item's tags into a list, but I always get only the first one. The Feedparser documentation mentions entries[i].content, but I am unsure whether that is the right approach or how to use it in this case.

Here is my code:

import feedparser

rss_url = "https://stackoverflow.com/jobs/feed"
feed = feedparser.parse(rss_url)
items = feed["items"]

for item in items:
    title = item["title"]
    try:
        tags = []
        tags.append(item["category"])
        print(title + " " + str(tags))
    except:
        print("Failed")

Martijn Pieters · Accepted Answer · 2017-10-28T14:32:17.973

category on a feedparser item is an alias for the term of the first element in the item's tags list. That list holds one dictionary-like feedparser object per <category> element, each with a term attribute that contains the tag name.

You can just access the terms directly:

categories = [t.term for t in item.get('tags', [])]

For your code that is:

for item in items:
    title = item["title"]
    categories = [t.term for t in item.get('tags', [])]
    print(title, ', '.join(categories))

See the entries[i].tags documentation.
