Every job item in the Stack Overflow jobs RSS feed carries several tags, each in its own "category" element.

They look like this:

<category>scala</category>
<category>hadoop</category>
<category>apache-spark</category>
<category>hive</category>
<category>json</category>

I would like to use feedparser to collect all of an item's tags into a list, but I always get only the first one. The feedparser documentation mentions entries[i].content, but I am unsure whether that is the right approach or how to use it in this case.

Here is my code:

import feedparser

rss_url = "https://stackoverflow.com/jobs/feed"
feed = feedparser.parse(rss_url)
items = feed["items"]

for item in items:
    title = item["title"]
    try:
        tags = []
        tags.append(item["category"])
        print(title + " " + str(tags))
    except:
        print("Failed")
Felix

1 Answer

On a feedparser entry, category is an alias for the term of the first element in the tags list. tags itself is a list of dictionary-like objects, one per <category> element, each with a term attribute that holds the tag name.

You can just access the terms directly:

categories = [t.term for t in item.get('tags', [])]

For your code that is:

for item in items:
    title = item["title"]
    categories = [t.term for t in item.get('tags', [])]
    print(title, ', '.join(categories))

See the entries[i].tags documentation.
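
To try the idea without hitting the network, here is a minimal sketch that applies the same list comprehension to a hand-built stand-in for one parsed entry. The helper name tag_terms and the sample data are hypothetical, not part of feedparser. The sketch assumes only that each tag behaves like a dictionary with a "term" key, which is how feedparser entries behave, and it skips tags whose term is empty.

```python
def tag_terms(entry):
    """Return every tag term on a feedparser-style entry, skipping empty terms."""
    # "tags" is missing when an item has no <category> elements, hence the default.
    return [tag["term"] for tag in entry.get("tags", []) if tag.get("term")]


# Hypothetical stand-in for one parsed entry; real feedparser entries
# expose the same "title" and "tags" keys.
sample_entry = {
    "title": "Data Engineer",
    "tags": [
        {"term": "scala", "scheme": None, "label": None},
        {"term": "hadoop", "scheme": None, "label": None},
        {"term": "apache-spark", "scheme": None, "label": None},
    ],
}

print(sample_entry["title"], tag_terms(sample_entry))
```

With a real feed, the same helper could be called on each item from feed["items"] in place of sample_entry.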

Martijn Pieters