
I made a simple program that uses the Google Finance API to grab stock data through HTTP requests and does some calculations on them.

The Google Finance API URL looks like this (it adds a new block of data every minute during trading hours):

https://www.google.com/finance/getprices?i=60&p=0d&f=d,o,h,l,c,v&df=cpct&q=AAPL

This works fine; however, I have a huge list of stock tickers I need to get data from. To loop through them without hitting a request limit, I wait 2 seconds between requests. There are over 5000 stocks, so at one request every 2 seconds a full pass takes over 10,000 seconds (almost 3 hours), and I need it to finish in under 5 minutes for the algorithm to be useful.

I was wondering whether there is a way to achieve this with HTTP requests, or whether I'm tackling this the wrong way. I can't download the data beforehand and process it on the client side, because I need the data as soon as the first quotes come out in the morning.

It's written in JavaScript (Node.js), but answers in any language are fine. Here's the function that I call at 2-second intervals:

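var http = require('http');

// Fetch 1-minute OHLCV bars for `ticker` covering `day` days and pass the
// parsed result to `cb`. cleanUp() is a helper not shown in this post; it
// returns -1 when the response contains no data.
var getStockData = function(ticker, day, cb){
    // Remove any whitespace from the symbol before building the query.
    ticker = ticker.replace(/\s+/g, '');

    var options = {
        host: "www.google.com",
        path: "/finance/getprices?i=60&p=" + day + "d&f=d,o,h,l,c,v&df=cpct&q=" +
              encodeURIComponent(ticker)
    };

    var req = http.request(options, function(response){
        var data = '';
        response.setEncoding('utf8');

        // Accumulate the response body chunk by chunk.
        response.on('data', function(chunk){
            data += chunk;
        });

        response.on('end', function(){
            var data_clean = cleanUp(data);
            if (data_clean === -1) console.log("we couldn't find anything for " + ticker);

            cb(data_clean);
        });
    });

    // Without an 'error' listener, a failed request would crash the process.
    req.on('error', function(err){
        console.log('request for ' + ticker + ' failed: ' + err.message);
        cb(-1);
    });

    req.end();
};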
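One common way to shorten a loop like this, sketched here as a hypothetical helper rather than anything from the original post, is to keep a small, fixed number of requests in flight instead of spacing them 2 seconds apart. Finishing 5000 tickers in 300 seconds needs roughly 17 completed requests per second on average. Whether the endpoint tolerates several concurrent requests is an assumption; the `limit` value would have to be tuned against whatever rate limit the server actually enforces.

```javascript
// Hypothetical helper (not from the original post): call an async `worker`
// for every item, keeping at most `limit` calls in flight at once.
// Results are returned in input order; the first failure rejects.
function mapWithConcurrency(items, limit, worker) {
  return new Promise(function (resolve, reject) {
    const results = new Array(items.length);
    let next = 0;      // index of the next item to start
    let active = 0;    // calls currently in flight
    let finished = 0;  // calls that have completed successfully
    let failed = false;

    if (!(limit >= 1)) {
      reject(new RangeError('limit must be at least 1'));
      return;
    }
    if (items.length === 0) {
      resolve(results);
      return;
    }

    function launch() {
      // Top the pool up until it is full or every item has been started.
      while (!failed && active < limit && next < items.length) {
        const i = next++;
        active++;
        Promise.resolve()
          .then(function () { return worker(items[i], i); })
          .then(function (value) {
            results[i] = value;
            active--;
            finished++;
            if (finished === items.length) resolve(results);
            else launch();
          }, function (err) {
            failed = true;
            reject(err);
          });
      }
    }

    launch();
  });
}
```

To use it with the callback-style function above, wrap each call in a Promise, e.g. `mapWithConcurrency(tickers, 10, t => new Promise(res => getStockData(t, 0, res)))`, where `10` is an arbitrary placeholder. Concurrency only helps if the server's limit is on request rate per client rather than strictly one request at a time.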

Any help would be greatly appreciated.

brzei

1 Answer


When designing against an API with a policy threshold (refresh-rate ceiling, bandwidth limit, etc.):

  1. avoid refetching data the node has already received

Using the URL above as-is, a huge block of data is (re-)fetched, and most of its rows, if not all, were already known from an identical-URL call made just 2 seconds before:

EXCHANGE%3DNASDAQ
MARKET_OPEN_MINUTE=570
MARKET_CLOSE_MINUTE=960
INTERVAL=60
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=-300
a1482330600,116.84,116.84,116.8,116.8,225329
1,116.99,117,116.8,116.84,81304
2,117.26,117.28,116.99,117,225262
3,117.32,117.35,117.205,117.28,153225
4,117.28,117.33,117.22,117.32,104072
...
149,116.98,117,116.98,116.98,8175
150,116.994,117,116.98,116.99,2751
151,117,117.005,116.9901,116.9937,7774
152,117.01,117.02,116.99,116.995,13011
153,117.0199,117.02,117.005,117.02,9313
  2. review the API specification carefully to send smarter requests that return minimum-footprint data
  3. watch for API end-of-life signals, so you can find another source before the API stops providing data

(cit.:) The Google Finance APIs are no longer available. Thank you for your interest.
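To tell which rows are new, a client first has to turn the compact response format shown in the sample above into timestamped bars. The sketch below is a minimal parser built on assumptions drawn from that sample, not from official documentation: header lines are `KEY=VALUE` (the `EXCHANGE` line arrives URL-encoded), data rows follow the `COLUMNS` order, a value like `a1482330600` is an absolute Unix timestamp in seconds, and a bare integer is an offset counted in `INTERVAL`-second steps from the last absolute timestamp. The function name `parseGetPrices` is hypothetical.

```javascript
// Hypothetical parser for the getprices text format. Assumptions (inferred
// from the sample response, not from official docs): header lines are
// KEY=VALUE (possibly URL-encoded), data rows follow the COLUMNS order,
// "a<digits>" is an absolute Unix timestamp in seconds, and a bare integer is
// an offset in INTERVAL-second steps from the last absolute timestamp.
function parseGetPrices(text) {
  const header = {};
  const bars = [];
  let base = null; // most recent absolute timestamp (seconds)

  text.split('\n').forEach(function (raw) {
    const line = raw.trim();
    if (line === '') return;

    const row = /^(a?)(\d+),(.*)$/.exec(line);
    if (!row) {
      // Header line, e.g. "INTERVAL=60" or "EXCHANGE%3DNASDAQ".
      const decoded = decodeURIComponent(line);
      const eq = decoded.indexOf('=');
      if (eq !== -1) header[decoded.slice(0, eq)] = decoded.slice(eq + 1);
      return;
    }

    const interval = Number(header.INTERVAL || 60);
    if (row[1] === 'a') {
      base = Number(row[2]);
    } else if (base === null) {
      return; // offset row before any absolute timestamp: cannot place it
    }
    const time = row[1] === 'a' ? base : base + Number(row[2]) * interval;

    const columns = (header.COLUMNS || 'DATE,CLOSE,HIGH,LOW,OPEN,VOLUME').split(',');
    const values = row[3].split(',').map(Number);
    const bar = { time: time };
    // columns[0] is DATE (handled above); the rest line up with `values`.
    for (let k = 1; k < columns.length; k++) {
      bar[columns[k].toLowerCase()] = values[k - 1];
    }
    bars.push(bar);
  });

  return { header: header, bars: bars };
}
```

With the sample above, the row `1,116.99,...` lands one `INTERVAL` (60 s) after `a1482330600`, i.e. at `1482330660`.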

As noted in the second comment below, repeatedly re-fetching a growing block of already-known data is inherently inefficient and should be avoided.

A professional data-pump design ought to use API details to do this:

  • add ts=1482330600 (a Unix timestamp in seconds) to define where the "new" data to be retrieved starts, leaving the rows already seen before that timestamp out of the transmitted block.
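Following that suggestion, the request path could carry the timestamp of the newest bar already held for each ticker. This is only a sketch: the endpoint is no longer available, the `ts` semantics are taken from the description above rather than verified, and `buildIncrementalPath` is a hypothetical name. The other query parameters are copied from the question's URL.

```javascript
// Hypothetical helper: build the getprices path, optionally asking only for
// data from `lastSeenTs` (Unix seconds) onward, per the answer's description
// of the `ts` parameter (unverified; the API has been shut down).
function buildIncrementalPath(ticker, days, lastSeenTs) {
  const params = [
    'i=60',
    'p=' + days + 'd',
    'f=d,o,h,l,c,v',
    'df=cpct',
    'q=' + encodeURIComponent(ticker.replace(/\s+/g, ''))
  ];
  // Start from the newest bar already held: it may still have been forming
  // when it was fetched, so it is worth receiving again.
  if (typeof lastSeenTs === 'number') params.push('ts=' + lastSeenTs);
  return '/finance/getprices?' + params.join('&');
}
```

On the first call for a ticker `lastSeenTs` is omitted and the path matches the question's URL; later calls pass the newest cached bar's timestamp.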
user3666197
  • Thanks for the tip! But it's only the "INTERVAL" part that is identical on each request. The rows of numbers are the stock-price data throughout the day for that individual stock. – brzei Dec 21 '16 at 23:38
  • O,H,L,C,V data are "frozen" in time: these values never change once the bar's time window has closed. This means that re-fetching "old", already-seen data in subsequent URL calls is inefficient. Continuous data pumps work on incremental OHLCV updates, bringing in just the last (current) bar's data, which avoids congestion without touching the policy ceilings. Re-fetching the current bar's data just for the Close is also poor practice; for really hot data (live updates, DoM, etc.) there are streaming quote interfaces for the stock and other options (FIX-protocol gateways et al.). – user3666197 Dec 22 '16 at 03:58
  • Sorry for the inconvenience, but I'm not sure I fully understand. I'm only fetching the block of data once per stock symbol, so how am I re-fetching already known data? E.g., the O,H,L,C,V for 'GOOGL' is different from 'AAPL'. I don't need to stream it or listen for a new row of data; I only need to grab the data once per stock symbol. – brzei Dec 23 '16 at 00:44
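The second comment's point, that closed bars never change, suggests keeping a per-ticker cache and folding in only the rows an incremental request returns. A minimal sketch, assuming bars are objects with a numeric `time` field sorted oldest-first (a hypothetical shape chosen for illustration) and that only the newest cached bar can still change; `mergeBars` is a hypothetical name.

```javascript
// Hypothetical helper: fold an incremental update into a cached bar array.
// Closed bars never change, so only bars at or after the newest cached
// timestamp are taken from `update`. If the update re-sends the newest
// cached bar, the cached copy (possibly incomplete) is replaced.
// Both arrays are assumed to be sorted by `time`, oldest first.
function mergeBars(cached, update) {
  const lastTime = cached.length ? cached[cached.length - 1].time : -Infinity;
  const fresh = update.filter(function (bar) { return bar.time >= lastTime; });
  if (fresh.length === 0) return cached.slice();

  // Drop the last cached bar only when the update contains a newer copy of it.
  const keep = fresh[0].time === lastTime ? cached.slice(0, -1) : cached.slice();
  return keep.concat(fresh);
}
```

Each pass then transfers and processes only the newest bars per ticker instead of the whole day's block, which is the inefficiency the comments discuss.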