I'm working on a basic Python script that parses RSS feed data from the SEC.gov website, but it fails when I run it. Where am I going wrong?

I'm using Python 3.6.5 and have tried both the Atoma and feedparser libraries, but neither pulls any SEC RSS data successfully. It may be that the feed itself is not valid (https://validator.w3.org/feed/ reports the data as invalid). However, the same URL works in a Google Chrome RSS feed extension, so I must be doing something wrong. Does anyone know how to fix the format issue, or am I going about this the wrong way in Python?

import atoma, requests

feed_name = "SEC FEED"
url ='https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001616707&type=&dateb=&owner=exclude&start=0&count=100&output=atom'
response = requests.get(url)
feed = atoma.parse_rss_bytes(response.content)

for post in feed.items:
  date = post.pub_date.strftime('(%Y/%m/%d)')
  print("post date: " + date)
  print("post title: " + post.title)
  print("post link: " + post.link)
RonRon
The error I get when I run the code above:

Traceback (most recent call last):
  File "/Users/Giggs/PycharmProjects/untitled/rss.py", line 7, in <module>
    feed = atoma.parse_rss_bytes(response.content)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/atoma/rss.py", line 221, in parse_rss_bytes
    return _parse_rss(root)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/atoma/rss.py", line 169, in _parse_rss
    .format(rss_version))
atoma.exceptions.FeedParseError: Cannot process RSS feed version "None"

– RonRon Apr 19 '19 at 00:36

1 Answer

Here is another way to solve the problem in Python:

import datetime

import feedparser
import requests

feed_name = "SEC FEED"
url = 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001616707&type=&dateb=&owner=exclude&start=0&count=100&output=atom'

# Download the feed and let feedparser work out the format (the URL requests output=atom)
response = requests.get(url)
feed = feedparser.parse(response.content)

for entry in feed['entries']:
    # The entries have no pub_date, so parse the filing-date field instead
    dt = datetime.datetime.strptime(entry['filing-date'], '%Y-%m-%d')
    print('Date: ', dt.strftime('(%Y/%m/%d)'))
    print('Title: ', entry['title'])
    print(entry['link'])
    print('\n')

The feed at that URL has no pub_date field, but you can use filing-date instead, or pick a different date field. The output should look like this:

Date: (2021/03/11)
Title: 8-K - Current report
https://www.sec.gov/Archives/edgar/data/1616707/000161670721000075/0001616707-21-000075-index.htm

Date: (2021/02/25)
Title: S-8 - Securities to be offered to employees in employee benefit plans
https://www.sec.gov/Archives/edgar/data/1616707/000161670721000066/0001616707-21-000066-index.htm

Date: (2021/02/25)
Title: 10-K - Annual report [Section 13 and 15(d), not S-K Item 405]
https://www.sec.gov/Archives/edgar/data/1616707/000161670721000064/0001616707-21-000064-index.htm
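
As for why the original attempt failed: the URL ends in output=atom, so the response is most likely an Atom document rather than RSS, which would explain why an RSS-only parser reports version "None" (atoma has a separate parse_atom_bytes function for Atom feeds). Below is a minimal sketch, using only the standard library instead of atoma or feedparser, that checks which kind of feed a payload is before parsing it. The SAMPLE_ATOM document and the helper names detect_feed_kind and atom_entries are made up for illustration; they are not real SEC data or part of any library.

```python
import xml.etree.ElementTree as ET

# Namespace that every Atom element lives in
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample shaped like an Atom document; not real SEC data
SAMPLE_ATOM = b"""<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example filings</title>
  <entry>
    <title>8-K - Current report</title>
    <link href="https://example.com/filing-1" rel="alternate"/>
    <updated>2021-03-11T16:05:00-05:00</updated>
  </entry>
</feed>
"""


def detect_feed_kind(data):
    """Return 'atom', 'rss' or 'unknown' based on the XML root element."""
    root = ET.fromstring(data)
    if root.tag == ATOM_NS + "feed":
        return "atom"
    if root.tag == "rss":
        return "rss"
    return "unknown"


def atom_entries(data):
    """Return (date, title, link) tuples for each Atom entry."""
    root = ET.fromstring(data)
    results = []
    for entry in root.iter(ATOM_NS + "entry"):
        title = entry.findtext(ATOM_NS + "title", default="")
        link = entry.find(ATOM_NS + "link")
        href = link.get("href", "") if link is not None else ""
        updated = entry.findtext(ATOM_NS + "updated", default="")
        # Keep only the YYYY-MM-DD part of the timestamp
        results.append((updated[:10], title, href))
    return results


if __name__ == "__main__":
    print("feed kind:", detect_feed_kind(SAMPLE_ATOM))
    for date, title, href in atom_entries(SAMPLE_ATOM):
        print(date, title, href)
```

With a real response you would pass response.content in place of SAMPLE_ATOM. If detect_feed_kind returns "atom", an Atom-aware parser (or feedparser, which handles both formats) is the one to reach for.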