
I'm brand new to programming (though I'm willing to learn), so apologies in advance for my very basic question.

The [SEC makes all of its filings available via FTP][1], and eventually I would like to download a subset of these files in bulk. Before writing that script, though, I need to generate a list of the locations of these files, which follow this format:

/edgar/data/51143/000005114313000007/0000051143-13-000007-index.htm

  • 51143 = the company ID, and I already accessed the list of company IDs I need via FTP
  • 000005114313000007/0000051143-13-000007 = the report ID, aka "accession number"

I'm struggling to figure this out, as the documentation is fairly light. If I already have the 000005114313000007/0000051143-13-000007 part (what the SEC calls the "accession number"), then building the path is pretty straightforward, as the sketch below shows. But I'm looking at ~45k entries, so I obviously need to generate these accession numbers automatically for a given CIK ID (which I already have).
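To show what I mean by "straightforward": once I have the accession number, building the path looks like plain string formatting, e.g. something like this Python sketch (the variable names are just placeholders):

    # Rough sketch: build one index path from a CIK and an accession number.
    cik = "51143"
    accession = "0000051143-13-000007"    # accession number, with dashes
    folder = accession.replace("-", "")   # 000005114313000007
    path = "/edgar/data/{}/{}/{}-index.htm".format(cik, folder, accession)
    print(path)  # /edgar/data/51143/000005114313000007/0000051143-13-000007-index.htm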

Is there an automated way to achieve this?


1 Answer


Welcome to SO.

I'm currently scraping the same site, so I'll explain what I've done so far. I'm assuming you already have the CIK numbers of the companies you're looking to scrape. If you search for a company's CIK, you'll get a list of all of the filings available for that company. Let's use Apple as an example (since they have a TON of files):

Link to Apple's Filings
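(In case that link doesn't come through: as far as I can tell, the company browse page lives at a URL along these lines, with Apple's CIK 0000320193 filled in purely as an illustration.)

    https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&count=40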

From here you can set a search filter. The document you linked was a 10-Q, so let's use that. If you filter on 10-Q, you'll get a list of all of the 10-Q documents. You'll notice that the URL changes slightly to accommodate the filter.
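For example, filtering on 10-Q should just add a type parameter to that query string, giving something along these lines (the exact parameters may differ):

    https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&type=10-Q&count=40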

You can use Python and its web-scraping libraries to take that URL and scrape all of the document URLs out of the table on that page, and then, for each of those links, scrape whatever links or information you want off the page. I personally use BeautifulSoup4, but lxml is another choice for web scraping, should you choose Python as your programming language. I would recommend Python, as it's fairly easy to learn the basics and some intermediate programming constructs.
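Here's a minimal sketch of that idea, assuming requests and beautifulsoup4 are installed (pip install requests beautifulsoup4). The URL parameters and the "Archives/edgar/data" href filter are my assumptions about how the EDGAR pages are laid out, so adjust them to whatever you see in the actual page source:

    # Minimal sketch: collect the *-index.htm links for one company's 10-Q filings.
    import requests
    from bs4 import BeautifulSoup

    cik = "0000320193"  # Apple, as an example
    base = "https://www.sec.gov/cgi-bin/browse-edgar"
    url = base + "?action=getcompany&CIK={}&type=10-Q&count=40".format(cik)

    # SEC.gov may block requests that don't identify themselves, so set a User-Agent.
    resp = requests.get(url, headers={"User-Agent": "your-name your@email.com"})
    soup = BeautifulSoup(resp.text, "html.parser")

    # The "Documents" buttons in the filing table all point into /Archives/edgar/data/,
    # and those are the *-index.htm pages you ultimately want.
    index_links = [a["href"] for a in soup.find_all("a", href=True)
                   if "Archives/edgar/data" in a["href"]]

    for link in index_links:
        print("https://www.sec.gov" + link)

Each of those links is an index page like the one in your question; you can fetch and parse each one the same way to get at the individual documents.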

Past that, the project is yours. Good luck! I've posted some links below to get you started. Since I'm new to the site I'm only allowed to post two links, so I'll give you the Beautiful Soup one:

Beautiful Soup Home Page

If you choose to use Python and are new to the language, check out the Codecademy Python course. Also take a look at lxml; some people prefer it over BeautifulSoup, and some use the two together, so it's largely a matter of personal preference.
