I recently began working on a project that uses Stocker (an API built on fbprophet for modeling stock data). I love the simplicity of the API, but it has a fatal flaw: it uses quandl to fetch its stock data. Quandl stopped updating that data sometime in 2018, and it is impossible to build accurate models from stale data. I looked into the Stocker code and, to my knowledge, it only uses quandl in one line:

stock = quandl.get('%s/%s' % (exchange, ticker))

This call returns the stock's data as a pandas DataFrame. Since that is all quandl is used for, I figured I could write my own replacement that gets the data from a different source (IEX) and returns it as a DataFrame. I wrote the code (attached below) but keep getting this error when creating models in Stocker:

  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'Date'

I am pretty lost on this one and am not that familiar with pandas. Any help is much appreciated!

Relevant portion of Stocker that shows where quandl was used to obtain stock data (now swapped for my stockdata module)

# Quandl for financial analysis, pandas and numpy for data manipulation
# fbprophet for additive models, #pytrends for Google trend data
#import quandl
import stockdata
import pandas as pd
import numpy as np
import fbprophet
import pytrends
from pytrends.request import TrendReq

# matplotlib pyplot for plotting
import matplotlib.pyplot as plt

import matplotlib

# Class for analyzing and (attempting) to predict future prices
# Contains a number of visualizations and analysis methods
class Stocker():

    # Initialization requires a ticker symbol
    def __init__(self, ticker, exchange='IEX'):

        # Enforce capitalization
        ticker = ticker.upper()

        # Symbol is used for labeling plots
        self.symbol = ticker

        # Use Personal Api Key
        # quandl.ApiConfig.api_key = 'YourKeyHere'

        # Retrieve the financial data
        try:
            stock = stockdata.get(ticker)
            print(stock)

        except Exception as e:
            print('Error Retrieving Data.')
            print(e)
            return

        # Set the index to a column called Date
        stock = stock.reset_index(level=0)

        # Columns required for prophet
        stock['ds'] = stock['Date']

        if ('Adj. Close' not in stock.columns):
            stock['Adj. Close'] = stock['Close']
            stock['Adj. Open'] = stock['Open']

        stock['y'] = stock['Adj. Close']
        stock['Daily Change'] = stock['Adj. Close'] - stock['Adj. Open']

        # Data assigned as class attribute
        self.stock = stock.copy()

        # Minimum and maximum date in range
        self.min_date = min(stock['Date'])
        self.max_date = max(stock['Date'])

        # Find max and min prices and dates on which they occurred
        self.max_price = np.max(self.stock['y'])
        self.min_price = np.min(self.stock['y'])

        self.min_price_date = self.stock[self.stock['y'] == self.min_price]['Date']
        self.min_price_date = self.min_price_date[self.min_price_date.index[0]]
        self.max_price_date = self.stock[self.stock['y'] == self.max_price]['Date']
        self.max_price_date = self.max_price_date[self.max_price_date.index[0]]

        # The starting price (starting with the opening price)
        self.starting_price = float(self.stock.ix[0, 'Adj. Open'])

        # The most recent price
        self.most_recent_price = float(self.stock.ix[len(self.stock) - 1, 'y'])

        # Whether or not to round dates
        self.round_dates = True

        # Number of years of data to train on
        self.training_years = 3

        # Prophet parameters
        # Default prior from library
        self.changepoint_prior_scale = 0.05 
        self.weekly_seasonality = False
        self.daily_seasonality = False
        self.monthly_seasonality = True
        self.yearly_seasonality = True
        self.changepoints = None

        print('{} Stocker Initialized. Data covers {} to {}.'.format(self.symbol,
                                                                     self.min_date.date(),
                                                                     self.max_date.date()))

Quandl's get function

def get(dataset, **kwargs):
    """Return dataframe of requested dataset from Quandl.
    :param dataset: str or list, depending on single dataset usage or multiset usage
            Dataset codes are available on the Quandl website
    :param str api_key: Downloads are limited to 50 unless api_key is specified
    :param str start_date, end_date: Optional date filters, otherwise entire
           dataset is returned
    :param str collapse: Options are daily, weekly, monthly, quarterly, annual
    :param str transform: options are diff, rdiff, cumul, and normalize
    :param int rows: Number of rows which will be returned
    :param str order: options are asc, desc. Default: `asc`
    :param str returns: specify what format you wish your dataset returned as,
        either `numpy` for a numpy ndarray or `pandas`. Default: `pandas`
    :returns: :class:`pandas.DataFrame` or :class:`numpy.ndarray`
    Note that Pandas expects timeseries data to be sorted ascending for most
    timeseries functionality to work.
    Any other `kwargs` passed to `get` are sent as field/value params to Quandl
    with no interference.
    """

    _convert_params_to_v3(kwargs)

    data_format = kwargs.pop('returns', 'pandas')

    ApiKeyUtil.init_api_key_from_args(kwargs)

    # Check whether dataset is given as a string
    # (for a single dataset) or an array (for a multiset call)

    # Unicode String
    if isinstance(dataset, string_types):
        dataset_args = _parse_dataset_code(dataset)
        if dataset_args['column_index'] is not None:
            kwargs.update({'column_index': dataset_args['column_index']})
        data = Dataset(dataset_args['code']).data(params=kwargs, handle_column_not_found=True)
    # Array
    elif isinstance(dataset, list):
        args = _build_merged_dataset_args(dataset)
        # handle_not_found_error if set to True will add an empty DataFrame
        # for a non-existent dataset instead of raising an error
        data = MergedDataset(args).data(params=kwargs,
                                        handle_not_found_error=True,
                                        handle_column_not_found=True)
    # If wrong format
    else:
        raise InvalidRequestError(Message.ERROR_DATASET_FORMAT)

    if data_format == 'numpy':
        return data.to_numpy()
    return data.to_pandas()


def _parse_dataset_code(dataset):
    if '.' not in dataset:
        return {'code': dataset, 'column_index': None}
    dataset_temp = dataset.split('.')
    if not dataset_temp[1].isdigit():
        raise ValueError(Message.ERROR_COLUMN_INDEX_TYPE % dataset)
    return {'code': dataset_temp[0], 'column_index': int(dataset_temp[1])}

My makeshift get function (stockdata.py)

import pandas_datareader.data as web
from datetime import date, timedelta

start = date.today()-timedelta(days=1080)
end = date.today()

def get(ticker):
    df = web.DataReader(name=ticker.upper(), data_source='iex', start=start, end=end)
    return df

1 Answer

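The problem comes from the difference between the columns and column names returned by Quandl and by IEX. Stocker looks up stock['Date'], but the IEX DataFrame has no column with that exact name, hence the KeyError: 'Date'.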

Quandl returns:

Date, Open, High, Low, Close, Volume, Ex-Dividend, Split Ratio, Adj. Open, Adj. High, Adj. Low, Adj. Close, Adj. Volume

while IEX returns:

date, open, high, low, close, volume

IEX returns adjusted prices, so you can map, for instance, the IEX 'close' column to the Quandl 'Adj. Close' column.

So, if you want to go with Stocker format (Quandl format), you could create needed columns like this:

# >- Quandl format -<   >--- IEX ---<
stock['Adj. Close']   = stock['close']
stock['Date']         = stock['date']

etc...
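
The same mapping can be applied to every column in one step. This is only a minimal sketch: it assumes the IEX frame has exactly the columns listed above and is indexed by date, the helper name to_quandl_format is made up for illustration, and the sample values are invented so the snippet runs without a network call.

```python
import pandas as pd

# Hypothetical mapping from the IEX column names to the Quandl-style
# names that Stocker reads. IEX prices are treated as already adjusted.
IEX_TO_QUANDL = {
    'open': 'Adj. Open',
    'high': 'Adj. High',
    'low': 'Adj. Low',
    'close': 'Adj. Close',
    'volume': 'Adj. Volume',
}

def to_quandl_format(df):
    """Return a copy of df with Quandl-style column names and a 'Date' index."""
    out = df.rename(columns=IEX_TO_QUANDL)
    # Stocker calls reset_index() and then reads stock['Date'],
    # so the index itself has to be named 'Date'.
    return out.rename_axis('Date')

# Hand-made frame standing in for an IEX response (values are invented).
sample = pd.DataFrame(
    {'open': [10.0, 11.0], 'high': [12.0, 12.5], 'low': [9.5, 10.5],
     'close': [11.0, 12.0], 'volume': [1000, 1500]},
    index=pd.Index(['2019-01-02', '2019-01-03'], name='date'),
)

converted = to_quandl_format(sample)
print(list(converted.columns))
```

Calling such a helper inside your own get function, just before returning the DataFrame, would leave the rest of Stocker untouched.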

Beware that you'll probably need to convert the date strings from IEX to datetime format, since Stocker calls methods such as .date() on those values.
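
A minimal sketch of that conversion, assuming the dates arrive as strings such as '2019-01-02' in the index of the IEX frame (the sample frame below is invented for illustration):

```python
import pandas as pd

# Hand-made stand-in for an IEX frame whose index holds date strings.
sample = pd.DataFrame(
    {'close': [11.0, 12.0]},
    index=pd.Index(['2019-01-02', '2019-01-03'], name='date'),
)

# Parse the strings into timestamps so that Stocker's calls such as
# self.min_date.date() receive datetime-like objects.
sample.index = pd.to_datetime(sample.index)

# Name the index 'Date' and turn it into a regular column,
# mirroring what Stocker does with reset_index().
stock = sample.rename_axis('Date').reset_index()
print(stock['Date'].dtype)
```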

GDN