I am trying to extract business descriptions of multiple firms from their 10-K reports using the R package edgar. I am using the getBusinDescr function to do so.
As I want business descriptions for many firms (1000+), I created a vector of the firms' CIK identifiers and passed it to the function so that R processes all of them in one call. The problem is that R downloads the filings I want (10-K reports) without issue, but it fails while extracting the section I am interested in. The extraction stopped at 61% for year 2007 and at 31% for year 2011. However, for year 2010, the extraction completed 100%.
To sum up, the extraction works for some years but not for others. I would like to know where this error comes from. Could it be a data availability issue (i.e., certain firms do not have a business description for some years), or random errors from repeated scraping attempts? Please help me interpret and, hopefully, deal with the error.
Just fyi, I am using the latest R on my Mac.
The code I use is:
# using edgar package on R
library(edgar)
# cikvector is a vector of multiple firms' identifier codes
# for year 2007
filings.BusinDes.2007 <- getBusinDescr(cik.no = cikvector, filing.year = 2007)
# for year 2008
filings.BusinDes.2008 <- getBusinDescr(cik.no = cikvector, filing.year = 2008)
The ideal results are as follows:
Downloading fillings. Please wait...
100%
Extracting 'Item 1' section...
100%
Business descriptions are stored in 'Business descriptions text' directory.
The error I encounter is as follows (Downloading the whole reports is done without any problem, though):
Downloading fillings. Please wait...
100%
Extracting 'Item 1' section...
**| 31%Error in (grep("<DOCUMENT>", filing.text, ignore.case = TRUE)[1]): (grep("</DOCUMENT>", :
NA/NaN argument**
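To narrow down which filings trigger this, one option I am considering is to call the function one CIK at a time and record failures instead of letting a single error abort the whole batch. The sketch below is only an assumption about how that could look: extract_per_cik is a hypothetical helper of my own (not part of edgar), and it only relies on the cik.no and filing.year arguments used above. The first lines also show why I suspect a missing <DOCUMENT> tag: when grep() finds no match, indexing with [1] gives NA, and R's `:` operator reports "NA/NaN argument" for an NA endpoint.

```r
# When grep() finds nothing it returns integer(0); taking [1] yields NA.
# Using that NA as an endpoint of `:` is what raises "NA/NaN argument".
no_match <- grep("<DOCUMENT>", "text without the tag", ignore.case = TRUE)[1]
is.na(no_match)

# Hypothetical helper (not part of edgar): run an extraction function one CIK
# at a time and collect the outcome of each call in a data frame.
extract_per_cik <- function(ciks, year, extract_fun) {
  results <- lapply(ciks, function(cik) {
    tryCatch({
      extract_fun(cik.no = cik, filing.year = year)
      data.frame(cik = cik, year = year, ok = TRUE,
                 message = NA_character_, stringsAsFactors = FALSE)
    }, error = function(e) {
      # Keep the error text so failing filings can be inspected later.
      data.frame(cik = cik, year = year, ok = FALSE,
                 message = conditionMessage(e), stringsAsFactors = FALSE)
    })
  })
  do.call(rbind, results)
}

# Intended usage (assumes edgar is installed and cikvector is defined):
# library(edgar)
# status.2011 <- extract_per_cik(cikvector, 2011, getBusinDescr)
# status.2011[!status.2011$ok, ]   # CIKs whose extraction failed
```

If the failing rows all point to a handful of CIKs, that would suggest a problem with those particular filings rather than with repeated scraping.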