I want to use YQL to retrieve all 10-Q and 10-K filings from the SEC EDGAR database. After consulting discussions [1] and [2], I ran into a problem.

It seems that YQL cannot get search results from the search engine.

However, I can directly access the filing detail page.

Here is a jsfiddle that shows the problem. Although both queries report success, the query to the search engine returns an empty results array.

Is there any other way to get the HTML addresses of all the filing detail pages without querying the EDGAR search engine? Thanks.

Example code using YQL is shown below:

    // Results page from the EDGAR search engine:
    // YQL fails to return any data for this URL
    var queryURL = "http://www.sec.gov/cgi-bin/browse-edgar?" +
        "action=getcompany&CIK=0001326801&type=10-K&dateb=&owner=exclude&count=100";

    // EDGAR 10-K filing detail page:
    // YQL fetches this URL successfully
    var filingURL = "http://www.sec.gov/Archives/edgar/data/1326801/" +
        "000132680114000007/0001326801-14-000007-index.htm";

    $.get(queryURL)
        .then(function () {
            // reports success, but the results array is empty
        })
        .then(function () {
            return $.get(filingURL).then(function () {
                // reports success and returns results
            });
        });
imonet

1 Answer

The /cgi-bin URL is restricted by robots.txt, so YQL will honour that and not crawl the page.

You can see this happening by enabling diagnostics for the YQL query.

  • Add diagnostics=true to the YQL URL, like /v1/public/yql?diagnostics=true&callback=?
  • Look for the diagnostics field in the results. This contains information about the query and any URLs it visited.
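The two steps above can be sketched roughly as follows. This is only an illustration: the host `query.yahooapis.com`, the `html` table query, and the helper names `buildYqlUrl` and `getDiagnostics` are assumptions that do not appear in the question, and the response is assumed to carry the diagnostics under `query.diagnostics`.

```javascript
// Assumed public YQL endpoint; the answer itself only gives the /v1/public/yql path.
var YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql";

// Build a YQL request URL for an HTML page with diagnostics switched on.
// (Hypothetical helper; the query uses the "html" table as an assumption.)
function buildYqlUrl(targetUrl) {
    var yql = 'select * from html where url="' + targetUrl + '"';
    return YQL_ENDPOINT +
        "?q=" + encodeURIComponent(yql) +
        "&format=json" +
        "&diagnostics=true";
}

// Return the diagnostics field of a parsed YQL JSON response, or null if absent.
// (Hypothetical helper; assumes the field lives at response.query.diagnostics.)
function getDiagnostics(response) {
    if (response && response.query && response.query.diagnostics) {
        return response.query.diagnostics;
    }
    return null;
}
```

With jQuery this could be used as `$.getJSON(buildYqlUrl(queryURL) + "&callback=?").then(function (data) { console.log(getDiagnostics(data)); });`, after which the logged object can be inspected for the URLs YQL tried to visit.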

[Screenshot: Firebug showing diagnostics]

salathe