I have the code below to count a specific list of words in a financial statement (a US SEC EDGAR 10-K) text file. I would highly appreciate it if anyone could help me with this. I have manually cross-checked and found the words in the document, but my code does not find any of them. I am using Python 3.5.3. Thanks in advance.

Given a URL path for an EDGAR 10-K file in .txt format for a company (CIK) in a given year, this code performs a word count:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib.request as urllib2
import time
import csv
import sys

CIK = '0001018724'
Year = '2013'
string_match1 = 'edgar/data/1018724/0001193125-13-028520.txt'
url3 = 'https://www.sec.gov/Archives/' + string_match1
response3 = urllib2.urlopen(url3)
words = [
    'anticipate',
    'believe',
    'depend',
    'fluctuate',
    'indefinite',
    'likelihood',
    'possible',
    'predict',
    'risk',
    'uncertain',
    ]
count = {}  # is a dictionary data structure in Python
for elem in words:
    count[elem] = 0
for line in response3:
    elements = line.split()
    for word in words:
        count[word] = count[word] + elements.count(word)
print(CIK)
print(Year)
print(url3)
print(count)

Here is the script output:

0001018724

2013

https://www.sec.gov/Archives/edgar/data/1018724/0001193125-13-028520.txt

{
    'believe': 0,
    'likelihood': 0,
    'anticipate': 0,
    'fluctuate': 0,
    'predict': 0,
    'risk': 0,
    'possible': 0,
    'indefinite': 0,
    'depend': 0,
    'uncertain': 0,
}
  • It would be helpful if you posted the current output of your code, so we can see what the problem is. Also, I don't see where in your code you are actually looking for the words in your word list. Try going through each line and then checking each word from the split line inside that loop (use a nested loop for starters). Also, to help you figure out what is going on, add lots of prints or use a debugger to see what is happening at each part of your code. – Michael Robellard Jul 18 '19 at 23:20
  • Updated with the results. Thanks again for helping! – bd_math_genius Jul 18 '19 at 23:26
  • Your code still doesn't show any actual counting of anything. – Jack Fleeting Jul 19 '19 at 01:03
  • Try taking your third loop and moving it inside your second loop. The way you have it written, when the third loop runs it only has the contents of the last line of the file. What I think you are trying to accomplish is to count on each line, which requires the third loop to be inside the second loop. – Michael Robellard Jul 19 '19 at 02:03
  • Thanks, I have moved the third loop into the second loop but I am still getting 0 for every search item. I have updated the code in the question with the new block. – bd_math_genius Jul 19 '19 at 17:21
  • I just ran the code with Python 2 (changing "urllib.request as urllib2" to "urllib2") and it gives me the search results. I am really confused why this version of the code does not work in 3.7. I appreciate your help, thanks. – bd_math_genius Jul 19 '19 at 18:11
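
One likely explanation for the Python 2 versus Python 3 difference, offered here as an assumption rather than something confirmed in this thread: in Python 3, iterating over the object returned by `urllib.request.urlopen` yields `bytes` lines, so `line.split()` produces `bytes` tokens, and a `bytes` token never compares equal to a `str` such as `'risk'`. In Python 2 both sides are plain `str`, which would explain why the same loop finds matches there. The sketch below reproduces the effect without network access, using `io.BytesIO` as a stand-in for the response; the sample text and the `count_words` helper are made up for illustration.

```python
import io

# Stand-in for the HTTP response: like urlopen() in Python 3,
# iterating over it yields bytes lines (sample text is invented).
fake_response = io.BytesIO(
    b"We believe the risk is possible.\nResults may fluctuate with risk\n"
)

words = ['believe', 'risk', 'fluctuate']


def count_words(lines, words, encoding=None):
    """Count exact token matches per word; decode each line first if an encoding is given."""
    count = {w: 0 for w in words}
    for line in lines:
        if encoding is not None:
            # bytes -> str, so tokens can compare equal to the str search words
            line = line.decode(encoding, errors='ignore')
        elements = line.split()
        for w in words:
            count[w] += elements.count(w)
    return count


# Without decoding: bytes tokens such as b'risk' never equal the str 'risk'
raw_counts = count_words(fake_response, words)

# With decoding: tokens are str and the matches are found
fake_response.seek(0)
decoded_counts = count_words(fake_response, words, encoding='utf-8')

print(raw_counts)
print(decoded_counts)
```

If this is indeed the cause, decoding each line before splitting it (for example `line.decode('utf-8', errors='ignore')`) should be enough to make the original loop count words under Python 3.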

1 Answer

A simplified version of your code seems to work in Python 3.7 with the requests library:

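import requests

url = 'https://www.sec.gov/Archives/edgar/data/1018724/0001193125-13-028520.txt'
response = requests.get(url)

# Same word list as in the question
words = ['anticipate', 'believe', 'depend', 'fluctuate', 'indefinite',
         'likelihood', 'possible', 'predict', 'risk', 'uncertain']

# Convert the downloaded bytes to a str once, outside the loop
info = str(response.content)

count = {}  # maps each word to its number of occurrences
for elem in words:
    # str.count counts substring matches, so 'risk' also matches 'risks'
    count[elem] = info.count(elem)

print(count)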

Output:

    {'anticipate': 9, 'believe': 32, 'depend': 39, 'fluctuate': 4, 'indefinite': 15, 'likelihood': 15, 'possible': 25,
 'predict': 6, 'risk': 55, 'uncertain': 38}
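
Note that `str.count`, as used above, counts substring occurrences, so `'depend'` is also counted inside `'independent'` and `'risk'` inside `'risks'`. Whether that is wanted is not stated in the question. If whole-word matches are the goal, a minimal sketch could tokenize the text first; the `count_whole_words` helper and the sample sentence below are hypothetical and only illustrate the difference.

```python
import re
from collections import Counter


def count_whole_words(text, words):
    """Count case-insensitive whole-word occurrences of each word in text."""
    # Split into runs of letters so punctuation does not stick to tokens
    tokens = re.findall(r"[a-z]+", text.lower())
    freq = Counter(tokens)
    return {w: freq[w] for w in words}


# Invented sample text, not taken from the filing
sample = "The risk factors: we depend on independent suppliers, and risks remain uncertain."
words = ['depend', 'risk', 'uncertain']

# Substring counting, as str.count does
substring_counts = {w: sample.count(w) for w in words}

# Whole-word counting
whole_word_counts = count_whole_words(sample, words)

print(substring_counts)
print(whole_word_counts)
```

On a real filing the same function could be applied to the decoded response text; the totals would then generally be lower than the substring counts.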
Jack Fleeting
  • Thanks a lot, Jack, I appreciate your help. Any idea how I can import the requests library in 3.5? – bd_math_genius Jul 22 '19 at 22:17
  • @bd_math_genius - requests is supported in 3.5 so `import requests` (after `pip install requests`) should work. And don't forget to accept the answer. – Jack Fleeting Jul 22 '19 at 23:21
  • Thanks. pip install request is showing "File "", line 1 pip install requests". – bd_math_genius Jul 23 '19 at 22:20
  • @bd_math_genius Are you using Jupyter Notebooks? If so, type in a cell `!pip install requests` (with the exclamation mark and in plural). If not, I have no idea why it shows the error. – Jack Fleeting Jul 23 '19 at 23:42