Questions tagged [edgar]

EDGAR is the U.S. Securities and Exchange Commission's information system for company data. Questions about parsing and querying EDGAR data, or about its public APIs, should use this tag.

EDGAR stands for Electronic Data Gathering, Analysis, and Retrieval. The system uses several data formats, including the classic SGML-based format, the XML-based XBRL format for business reporting, and many more.
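The classic SGML-based format is easiest to see in a full-submission .txt file, where each document sits between <DOCUMENT> and </DOCUMENT> tags, with a <TYPE> line naming the form and a <TEXT> body. A minimal Python sketch of splitting such a file into (type, text) pairs, assuming that simplified layout (real files carry more header fields); the sample string is made up:

```python
import re

# Wrapper tags of a full-submission .txt file (simplified layout assumed).
DOC_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)
TYPE_RE = re.compile(r"^<TYPE>([^\r\n]+)", re.MULTILINE)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def split_documents(submission: str) -> list[tuple[str, str]]:
    """Return (form type, body text) for every <DOCUMENT> block."""
    docs = []
    for block in DOC_RE.findall(submission):
        type_match = TYPE_RE.search(block)
        text_match = TEXT_RE.search(block)
        doc_type = type_match.group(1).strip() if type_match else ""
        body = text_match.group(1).strip() if text_match else ""
        docs.append((doc_type, body))
    return docs


sample = """<SEC-DOCUMENT>
<DOCUMENT>
<TYPE>10-K
<SEQUENCE>1
<TEXT>
Annual report body
</TEXT>
</DOCUMENT>
<DOCUMENT>
<TYPE>EX-21
<SEQUENCE>2
<TEXT>
Subsidiaries list
</TEXT>
</DOCUMENT>
</SEC-DOCUMENT>"""

print(split_documents(sample))
```

Regular expressions are adequate here only because these wrapper tags are not nested; the HTML or XBRL inside each <TEXT> body still needs a real parser.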

120 questions
1 vote, 1 answer

Word count from web text document results in 0

I tried the Python code from Rasha Ashraf's article "Scraping EDGAR with Python". He used urllib2, which no longer exists in Python 3, so I changed it to urllib. I was able to fetch the following EDGAR web page. However, the number of…
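In Python 3, urllib2's functionality lives in urllib.request (with exceptions in urllib.error). A word count of 0 usually means the counted text is not the filing at all, for example an error page. One likely cause (an assumption, not confirmed by the question) is that SEC rejects automated requests that do not declare a User-Agent. A minimal sketch; the User-Agent value is a placeholder you would replace:

```python
import re
import urllib.request

# Placeholder identity (hypothetical): SEC asks automated clients to declare who they are.
USER_AGENT = "Example Research admin@example.com"


def build_request(url: str) -> urllib.request.Request:
    """Python 3 counterpart of urllib2.Request, with a declared User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_text(url: str) -> str:
    """Download a page and decode it to str (needs network access)."""
    with urllib.request.urlopen(build_request(url)) as response:
        # urlopen returns bytes; decode before any text processing.
        return response.read().decode("utf-8", errors="replace")


def word_count(text: str) -> int:
    """Count runs of word characters in already-decoded text."""
    return len(re.findall(r"\w+", text))


print(word_count("Item 1. Business overview"))
```

Usage would be word_count(fetch_text(url)) for the filing URL; if the count is still 0, printing the first few hundred characters of the fetched text quickly shows whether an error page came back.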
1 vote, 1 answer

How to parse 10-Q reports from EDGAR API in python?

I'm trying to use the EDGAR API to retrieve the 10-Q for any given company (corresponding to the CIK value provided). This code retrieves the most recent 10-Q for Tesla. There are about 30 methods attached to this object, such as keys, values, items, and…
jbuddy_13
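If the object is the parsed JSON from SEC's submissions endpoint, it is a plain Python dict, which would explain methods such as keys, values and items. A minimal sketch, assuming the endpoint https://data.sec.gov/submissions/CIK##########.json (10-digit, zero-padded CIK) and a filings["recent"] block of parallel lists (form, accessionNumber, primaryDocument, ...) ordered newest first; the sample dict is hypothetical:

```python
from typing import Optional

ARCHIVES = "https://www.sec.gov/Archives/edgar/data"


def submissions_url(cik: int) -> str:
    """Submissions JSON URL; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{cik:010d}.json"


def latest_filing_url(submissions: dict, cik: int, form: str = "10-Q") -> Optional[str]:
    """Return the document URL of the newest `form` filing, or None.

    Assumes filings["recent"] holds parallel lists ordered newest first;
    older filings are split into extra files that this sketch ignores.
    """
    recent = submissions["filings"]["recent"]
    for i, form_type in enumerate(recent["form"]):
        if form_type == form:
            # Archive folders use the accession number without dashes.
            accession = recent["accessionNumber"][i].replace("-", "")
            return f"{ARCHIVES}/{cik}/{accession}/{recent['primaryDocument'][i]}"
    return None


sample = {  # hypothetical, trimmed-down response
    "filings": {
        "recent": {
            "form": ["8-K", "10-Q", "10-Q"],
            "accessionNumber": ["0000000000-24-000003", "0000000000-24-000002", "0000000000-24-000001"],
            "primaryDocument": ["event.htm", "q2.htm", "q1.htm"],
        }
    }
}
print(latest_filing_url(sample, 1234))
```

Downloading the real JSON also needs a declared User-Agent header, and the primary document is usually HTML or inline XBRL, so parsing the 10-Q's contents is a separate step.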
1 vote, 0 answers

Parsing unstructured txt files and extracting tables

I would like to parse old-style EDGAR .txt files from the SEC containing various filings with free financial data, but it's far from trivial to parse a .txt file with a semblance of a table and extract this data. Here is the link to the example file I…
kuatroka
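A common heuristic for text tables in old .txt filings (an assumption about the layout, not a general parser) is that columns are separated by two or more spaces while labels contain only single spaces. A minimal sketch on a made-up table:

```python
import re


def split_text_table(lines: list[str]) -> list[list[str]]:
    """Split rows of a fixed-width text table on runs of 2+ spaces.

    Blank lines and ruler lines made of dashes or equals signs are skipped.
    Cells separated by a single space are merged, so check the output.
    """
    rows = []
    for line in lines:
        stripped = line.strip()
        if not stripped or set(stripped) <= set("-= "):
            continue
        rows.append(re.split(r"\s{2,}", stripped))
    return rows


table = """
                     2019        2018
                   --------    --------
Net sales            1,234       1,100
Cost of sales          800         750
""".splitlines()

for row in split_text_table(table):
    print(row)
```

The header row comes out one cell shorter than the data rows because it has no label, so it needs left-padding before the rows are loaded into a table structure.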
1 vote, 0 answers

Cleaning SEC filings

I am currently trying to clean 10-K filings (2,690 to be exact) in order to get the pure text (without HTML tags, etc.). Among other things, I would like to calculate readability scores as a next step. However, cleaning the text is becoming a larger…
Sebastian
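For stripping markup before computing readability scores, Python's standard-library html.parser is one option. A minimal sketch that keeps only text nodes, skips <script> and <style>, decodes entities such as &nbsp; and &amp;, and collapses whitespace; it does not treat tables or the SGML wrapper specially:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turns &amp;, &nbsp; etc. into characters
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return visible text with all whitespace runs collapsed to one space."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # str.split() also splits on non-breaking spaces (\xa0).
    return " ".join(" ".join(parser.parts).split())
```

Joining text nodes with a space keeps adjacent paragraphs apart, but it can split a word that a filer broke across inline tags; whether that matters for readability scores is a judgment call.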
1 vote, 0 answers

SEC EDGAR Form 20-F: How to process text that contains HTML tags

I have downloaded the following Form 20-F from SEC EDGAR: https://www.sec.gov/Archives/edgar/data/1729089/000121390019021541/0001213900-19-021541.txt As you can see, the .txt file contains multiple HTML tags such as:
adrCoder
1 vote, 0 answers

Google Sheets: querying sec.gov for the latest filings for a given company

I've had a ton of help recently from the SO community and I'd first just like to say thank you to everyone! My latest Google Sheet pursuit is querying sec.gov for the latest filing for a given ticker. I'm not trying to scrape the site, I just want…
1 vote, 0 answers

Count keywords in SEC Edgar 10-K filings text-body with Python

I am trying to parse the text section of the SEC Edgar texts in Python 3, e.g.: https://www.sec.gov/Archives/edgar/data/796343/0000796343-14-000004.txt My goal is to collect the number of occurrences in the visible text body of the 10-K statements…
dernuco
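Once the markup is stripped, counting keywords is mostly a matter of matching whole words case-insensitively. A minimal sketch; the keyword list and sample text are made up, and multi-word phrases are matched across line breaks because filings often wrap mid-phrase:

```python
import re
from collections import Counter


def count_keywords(text: str, keywords: list[str]) -> Counter:
    """Count whole-word, case-insensitive occurrences of each keyword.

    Words inside a multi-word keyword may be separated by any whitespace,
    including line breaks.
    """
    counts: Counter = Counter()
    for keyword in keywords:
        words = [re.escape(w) for w in keyword.split()]
        pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
        counts[keyword] = len(pattern.findall(text))
    return counts


text = "Risk factors include climate risk.\nClimate\nchange and RISK management."
print(count_keywords(text, ["risk", "climate change"]))
```

The \b boundaries keep "risk" from matching inside "riskier"; dropping them would count substrings instead.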
1 vote, 2 answers

REGEX extract information from EDGAR SC-13 form

I am trying to extract information from the latest SEC EDGAR Schedule 13 filings. Example filing: 1) Saba Capital_27-Dec-2019_SC13 The information I am trying to extract (and the parts of the filing with the information)…
Lko
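The exact fields in the question are cut off, so as a hypothetical illustration: Schedule 13D/13G cover pages typically carry a CUSIP number and a percent-of-class figure next to fixed labels, and anchoring one pattern on each label tends to be more robust than one large regex. A minimal sketch on a made-up snippet; real filings vary in spacing and wording, so the patterns are assumptions to adjust:

```python
import re
from typing import Optional

PATTERNS = {
    # 9-character CUSIP following the label, e.g. "CUSIP No. 12345X109".
    "cusip": re.compile(r"CUSIP\s+(?:No\.?|Number)?\s*:?\s*([0-9A-Z]{9})", re.IGNORECASE),
    # First percentage after the "PERCENT OF CLASS" label, possibly on a later line.
    "percent_of_class": re.compile(
        r"PERCENT\s+OF\s+CLASS.*?(\d+(?:\.\d+)?)\s*%", re.IGNORECASE | re.DOTALL
    ),
}


def extract_cover_fields(text: str) -> dict[str, Optional[str]]:
    """Return the first match for each field, or None when a label is missing."""
    result = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(text)
        result[name] = m.group(1) if m else None
    return result


snippet = """
CUSIP No. 12345X109
13  PERCENT OF CLASS REPRESENTED BY AMOUNT IN ROW (11)
    7.5%
"""
print(extract_cover_fields(snippet))
```

The lazy .*? in the percent pattern skips the "(11)" row reference because those digits are not followed by a % sign.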
1 vote, 1 answer

Repairing broken html table extracted with BS4 in python

I am parsing HTML tables from administrative filings. This is tricky because the HTML is often broken, which results in poorly constructed tables. Here is an example of a table that I load into a pandas DataFrame: 0 1 2 3 4 …
user1029296
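One heuristic for such tables (an assumption about the typical breakage, not a general fix) is that spacer cells produce columns that are empty in every row, and that financial tables put "$" and ")" or "%" into cells of their own. A minimal pure-Python sketch that works on a list of rows; a DataFrame can be converted with df.fillna("").astype(str).values.tolist() and rebuilt afterwards with pandas.DataFrame(rows):

```python
def clean_rows(rows: list[list[str]]) -> list[list[str]]:
    """Heuristic cleanup for tables parsed from broken filing HTML.

    1. Pad ragged rows and drop columns that are blank in every row.
    2. Merge a column containing only "$" into the column to its right.
    3. Merge a column containing only ")" or "%" into the column to its left.
    """
    width = max(len(r) for r in rows)
    grid = [[c.strip() for c in r] + [""] * (width - len(r)) for r in rows]
    columns = [[r[j] for r in grid] for j in range(width)]
    columns = [col for col in columns if any(col)]  # drop spacer columns

    merged: list[list[str]] = []
    for i, col in enumerate(columns):
        values = {v for v in col if v}
        if values == {"$"} and i + 1 < len(columns):
            # Prefix "$" onto the next column, row by row.
            columns[i + 1] = [a + b for a, b in zip(col, columns[i + 1])]
        elif values <= {")", "%"} and merged:
            # Append ")" or "%" onto the previous kept column.
            merged[-1] = [a + b for a, b in zip(merged[-1], col)]
        else:
            merged.append(col)
    return [list(row) for row in zip(*merged)]


rows = [
    ["", "", "", "2019", "", "", "2018"],
    ["Revenue", "", "$", "1,234", "", "$", "1,100", ""],
    ["Net loss", "", "$", "(56", ")", "$", "(12", ")"],
]
for row in clean_rows(rows):
    print(row)
```

Merging whole columns rather than individual cells keeps every row the same width, so the header stays aligned with the data.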
1 vote, 1 answer

Cycle names through a list

I have three simple lines of code that pull S-1 filings from the SEC's EDGAR database and put them into a folder I specify. This uses the "sec-edgar-downloader" package. It works great, but I have to do this for about 1,400 companies. I have the list of…
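Looping over the list does not depend on the download library: wrap the call that already works for one company in a function and call it once per ticker, collecting failures instead of stopping. A minimal sketch; fetch is a placeholder for your existing call (with sec-edgar-downloader this would typically be the get method of a Downloader instance, called with "S-1" and the ticker, but check the signature for your installed version), and the optional delay pauses between companies to stay within SEC's fair-access rate limits:

```python
import time
from typing import Callable, Iterable


def download_all(
    tickers: Iterable[str], fetch: Callable[[str], object], delay: float = 0.0
) -> list[str]:
    """Call fetch(ticker) for each ticker and return the ones that failed."""
    failed = []
    for raw in tickers:
        ticker = raw.strip().upper()
        if not ticker:
            continue  # skip blank lines, e.g. a trailing newline in the file
        try:
            fetch(ticker)
        except Exception as exc:  # log and keep going; one bad ticker shouldn't stop the batch
            print(f"{ticker}: {exc}")
            failed.append(ticker)
        time.sleep(delay)  # optional pause between companies
    return failed
```

Because iterating an open text file yields its lines, a (hypothetical) tickers.txt with one ticker per line can be passed in directly as the tickers argument; the returned list is what needs a retry.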
1 vote, 2 answers

Regex capture lines A, B, or C in any order only when not preceded by D

I have a file with content something like this: SUBJECT COMPANY: COMPANY DATA: COMPANY CONFORMED NAME: MISCELLANEOUS SUBJECT CORP CENTRAL INDEX KEY: 0000000000 STANDARD INDUSTRIAL CLASSIFICATION: …
Matthew
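The excerpt does not say what line D is, so assume (hypothetically) that D is a section header such as FILED BY: and that the wanted lines A, B and C are the indented fields under other sections. Instead of one regex with lookbehinds, a line-by-line pass that remembers the current section handles any field order. A minimal sketch; the header layout (unindented section names ending in a colon, tab-indented fields) is an assumption based on the sample, and "MISCELLANEOUS FILER LLC" is made up:

```python
import re

# The fields to capture (A, B, C); their order in the header does not matter.
FIELDS = ("COMPANY CONFORMED NAME", "CENTRAL INDEX KEY", "STANDARD INDUSTRIAL CLASSIFICATION")
# Section headers are assumed to start at column 0 and end with a colon.
SECTION_RE = re.compile(r"^([A-Z][A-Z ]*):\s*$")
# Field lines are assumed to be indented: "<whitespace>NAME:<whitespace>value".
FIELD_RE = re.compile(r"^\s+(" + "|".join(map(re.escape, FIELDS)) + r"):\s*(.*?)\s*$")


def fields_outside(header: str, excluded: str = "FILED BY") -> dict[str, str]:
    """Collect FIELDS from every section except `excluded` (the hypothetical D)."""
    found: dict[str, str] = {}
    section = None
    for line in header.splitlines():
        m = SECTION_RE.match(line)
        if m:
            section = m.group(1)
            continue
        m = FIELD_RE.match(line)
        if m and section != excluded:
            found.setdefault(m.group(1), m.group(2))  # first occurrence wins
    return found


header = (
    "SUBJECT COMPANY:\n"
    "\tCOMPANY DATA:\n"
    "\t\tCENTRAL INDEX KEY:\t\t\t0000000000\n"
    "\t\tCOMPANY CONFORMED NAME:\t\t\tMISCELLANEOUS SUBJECT CORP\n"
    "FILED BY:\n"
    "\tCOMPANY DATA:\n"
    "\t\tCOMPANY CONFORMED NAME:\t\t\tMISCELLANEOUS FILER LLC\n"
    "\t\tSTANDARD INDUSTRIAL CLASSIFICATION:\t[0000]\n"
)
print(fields_outside(header))
```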
1 vote, 1 answer

Retrieve EBIT from XBRL documents

It appears that EBIT information is not very uniform across different XBRL documents. Cross-comparing the data with other sources, such as Yahoo, I have seen some XBRL documents use the fact us-gaap:OperatingIncomeLoss to store it if using US-GAAP, or…
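One way to cope with the variation is a priority list: use us-gaap:OperatingIncomeLoss when it is reported, otherwise fall back to an approximation. The fallback here, pre-tax income plus interest expense, is a swapped-in convention rather than anything from the question, and the concept names are assumptions to check against the US-GAAP taxonomy and each filer's extensions; operating income and this approximation can differ by non-operating items. A minimal sketch over a {concept: value} mapping for one period:

```python
from typing import Mapping, Optional

# Concept names (without the "us-gaap:" prefix) are assumptions to verify.
OPERATING_INCOME = "OperatingIncomeLoss"
PRETAX_INCOME = "IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest"
INTEREST_EXPENSE = "InterestExpense"


def ebit(facts: Mapping[str, float]) -> Optional[float]:
    """Pick an EBIT figure for one period from a {concept: value} mapping.

    1. Prefer OperatingIncomeLoss when the filer reports it.
    2. Otherwise approximate EBIT as pre-tax income plus interest expense.
    3. Return None when neither route is available.
    """
    if OPERATING_INCOME in facts:
        return facts[OPERATING_INCOME]
    if PRETAX_INCOME in facts and INTEREST_EXPENSE in facts:
        return facts[PRETAX_INCOME] + facts[INTEREST_EXPENSE]
    return None
```

Returning None instead of a guess makes it easy to count how many filings need manual review before comparing against sources such as Yahoo.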
1 vote, 1 answer

Use Arelle to export an XLSX file

I'm trying to use Arelle to export an XLSX file from a zip of XBRL files. It works just fine when I use the EdgarRenderer plugin. ./arelleCmdLine -f data/goog-20151231.xml.zip --plugins EdgarRenderer --disclosureSystem efm-pragmatic --validate -r…
AppTest
1 vote, 1 answer

How to scrape individual paragraphs from SEC 10-Ks

I am working on a project where I need to break up 10-Ks into their constituent paragraphs. For some 10-Ks I am able to do something simple like soup.find_all('p'), but I am also seeing other 10-Ks that use a different tag for everything instead of <p> tags.…

Leo
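Since filers use different tags for paragraphs, one option is to treat any block-level tag as a boundary instead of searching for <p> alone. A minimal sketch with the standard library's html.parser; the set of boundary tags is an assumption to tune, and text inside inline tags such as <font> or <b> stays in the current paragraph:

```python
from html.parser import HTMLParser

# Tags treated as paragraph boundaries (an assumption; adjust per filer).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "table", "h1", "h2", "h3", "h4", "h5", "h6"}


class ParagraphSplitter(HTMLParser):
    """Start a new paragraph at every block-level start or end tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.paragraphs: list[str] = []
        self._buffer: list[str] = []

    def _flush(self) -> None:
        # Inline text chunks are joined directly, then whitespace is collapsed.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self) -> None:
        super().close()
        self._flush()  # emit any trailing text


def split_paragraphs(html: str) -> list[str]:
    parser = ParagraphSplitter()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


html = '<div><font>Item 1.</font> Business</div><div>We make <b>widgets</b>.</div><p>Second&nbsp;para</p>'
print(split_paragraphs(html))
```

Nested <div> elements simply produce extra boundaries rather than duplicated text, which is the main reason to prefer boundary events over collecting each element's full text.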
1 vote, 1 answer

Querying the Securities and Exchange Commission (SEC) using EDGAR

I'm working on a project that lets the user pull information from the SEC and about the company's traded stock using the company's stock ticker. Now, in order for me to be able to retrieve information from the SEC using the stock ticker ONLY,…
Crashtor
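For ticker-only lookups, SEC publishes a ticker-to-CIK mapping file at https://www.sec.gov/files/company_tickers.json. A minimal sketch that turns its parsed contents into a dict keyed by ticker, assuming a layout of an object whose values are records with cik_str, ticker and title fields; the sample record is made up:

```python
import json


def ticker_to_cik(company_tickers: dict) -> dict[str, str]:
    """Map upper-case ticker to a 10-digit, zero-padded CIK string.

    Assumes the company_tickers.json layout: an object whose values are
    records with "cik_str", "ticker" and "title" keys.
    """
    return {
        record["ticker"].upper(): f"{int(record['cik_str']):010d}"
        for record in company_tickers.values()
    }


# Made-up record in the assumed layout.
sample = json.loads('{"0": {"cik_str": 1234, "ticker": "exmp", "title": "Example Corp"}}')
print(ticker_to_cik(sample))
```

The zero-padded form is what data.sec.gov expects in URLs such as the submissions endpoint; downloading the mapping file itself also needs a declared User-Agent header.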