Parsing unstructured txt files and extracting tables

Question

I would like to parse old-style EDGAR .txt files from the SEC. They contain various filings with free financial data, but it is far from trivial to parse a text file that only has a semblance of a table and extract the data from it.

Here is the link to the example file

I have made a start on a program, but it is very flaky and needs a lot of tuning for different situations. A similar file from 2000 rather than 1999 would fail: if the length of the data in a column changes, the hard-coded offsets no longer line up and the program breaks. I'm not a programmer, and I wonder whether there is a more robust and scalable way to parse this type of text file. Thanks.

from bs4 import BeautifulSoup
import requests

URL = "https://www.sec.gov/Archives/edgar/data/1067983/000095015099001240/0000950150-99-001240.txt"
# Character offsets of the column boundaries, hand-tuned for this 1999 filing
COLUMN_EDGES = [0, 25, 30, 43, 55, 66, 72, 76, 87, 109, 121, 128, None]

fo_99 = requests.get(URL)
soup_99 = BeautifulSoup(fo_99.text, "lxml")

# Find the <caption> tags (the lxml parser lowercases tag names)
tables_99 = soup_99.find_all("caption")
print(len(tables_99))
# Within the second caption, find the <s> tags
table = tables_99[1].find_all("s")
print(len(table))

for line in str(table[0]).split("\n"):
    if len(line) > 11:
        # Skip the "<s>" marker line itself
        if not line.startswith("<s>"):
            # Slice the line at the fixed offsets and separate the cells with "|"
            cells = [line[a:b] for a, b in zip(COLUMN_EDGES, COLUMN_EDGES[1:])]
            print(" | ".join(cells), "|")
    else:
        # Short lines are printed unchanged
        print(line)
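
One possible direction for avoiding the hard-coded offsets, given only as a rough sketch: infer the column boundaries from the data lines themselves by looking for character positions that are blank in every line. The function names `find_column_spans` and `split_row`, the `min_gap` parameter, and the sample rows are all made up for illustration and are not taken from the filing. The sketch assumes the columns are separated by at least two spaces on every line, which may not hold for every table.

```python
def find_column_spans(lines, min_gap=2):
    """Return (start, end) character spans of the columns shared by all lines.

    An offset counts as occupied if any line has a non-space character there.
    A run of at least `min_gap` unoccupied offsets separates two columns.
    `lines` must be a non-empty list of strings.
    """
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    occupied = [any(row[i] != " " for row in padded) for i in range(width)]

    spans = []
    start = None  # start offset of the column currently being scanned
    gap = 0       # number of consecutive blank offsets seen so far
    for i, used in enumerate(occupied):
        if used:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The column ended just before this run of blanks began
                spans.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        spans.append((start, width - gap))
    return spans


def split_row(line, spans):
    """Cut one line at the given spans and strip the padding from each cell."""
    return [line[a:b].strip() for a, b in spans]


# Made-up rows laid out like a fixed-width table (not real filing data)
rows = [("Cash and equivalents", "1,234", "987"),
        ("Total assets", "45,678", "40,123"),
        ("Net income", "(12)", "345")]
sample = [f"{label:<20}{a:>13}{b:>9}" for label, a, b in rows]

spans = find_column_spans(sample)
for line in sample:
    print(split_row(line, spans))
```

Because the spans are computed per table, a filing with wider or narrower columns should not need new offsets, although single-space gaps between columns, or header lines that span several columns, would still need separate handling.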

To this project, apply the proverb "Necessity is the Mother of Invention", you must. SO isn't a coding service. If you have a specific question about a specific problem or difficulty, that's what SO is for. This is where you put in the time and research to solve your problem. See [ask]. — Trenton McKinney, Jun 24 '20 at 23:18
Perhaps this could help [Edgar-COMPANY-FILINGS-Web-Scrapping-Data-Analysis](https://github.com/ragraw26/Edgar-COMPANY-FILINGS-Web-Scrapping-Data-Analysis) — DarrylG, Jun 25 '20 at 08:28
Not helpful Trenton. I have a specific question - how this code can be improved? I’m not a coder, but I tried my best to come up with something that works. I showed it precisely so not to be seen as a lazy “solve this problem for me while I’m sipping a cuppa”. Thanks Darryl, I’ll check. — kuatroka, Jun 25 '20 at 23:51

