Pandas can't find columns, ValueError

Question

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

symbols = ["AAPL", "GLD", "TSLA", "GBL", "GOOGL"]

def compare_security(symbols):
    start_date = "01-01-2019"
    end_date = "01-12-2020"
    dates = pd.date_range(start_date, end_date)

    df1 = pd.DataFrame(index=dates)
    df_SPY = pd.read_csv(
        "https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=SPY&apikey=XXXX&datatype=csv",
        index_col="timestamp", usecols=["timestamp", "adjusted_close"], parse_dates=True, na_values=['nan'])
    df_SPY = df_SPY.rename(columns={"adjusted_close": "SPY"})
    df1 = df1.join(df_SPY, how="inner")

    for symbol in symbols:
        df_temp= pd.read_csv("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol={}&apikey=XXXX&datatype=csv".format(symbol),
            index_col = "timestamp", usecols = ["timestamp", "adjusted_close"], parse_dates=True, na_values=['nan'])
        df_temp = df_temp.rename(columns={"adjusted_close":symbol})
        df1 = df1.join(df_temp)
    return df1

def test_run():
    df = compare_security(symbols)
    print(df)
    df.plot()
    plt.title(symbols)
    plt.show()

if __name__ == "__main__":
    test_run()

It reads the error "ValueError: Usecols do not match columns, columns expected but not found: ['timestamp', 'adjusted_close']"

However, I checked all the files the code would retrieve and all of them have the respective columns. Any clarification as to where I went wrong would be greatly appreciated.

usual suspect: whitespaces. Check if the columns from the csv are being read with spaces. Easy check is to remove the usecols parameter and then do `list(df_SPY)` to see how your column headers are being parsed — Yuca, Jan 13 '20 at 18:18
@Yuca print(list(df_SPY)) yields ['open', 'high', 'low', 'close', 'adjusted_close', 'volume', 'dividend_amount', 'split_coefficient'] — VEQ, Jan 13 '20 at 18:37
as you can see, there's no timestamp being read. So it makes sense that it fails, remove 'timestamp' from `usecols` and it should work — Yuca, Jan 13 '20 at 18:40
I ran your code as is and it worked fine. Could you run: ```pip freeze | grep pandas``` and post here what you get? — Patrick Collins, Jan 13 '20 at 20:51
@Patrick Collins Oddly it has worked for me before as well, it just stopped so I assumed I messed something up and didn't realize it. Here is the error message when I put `pip freeze | grep pandas` into the terminal `>pip freeze|grep pandas 'grep' is not recognized as an internal or external command, operable program or batch file. >Exception ignored in: <_io.TextIOWrapper name='' mode='w' encoding='cp1252'> OSError: [Errno 22] Invalid argument` — VEQ, Jan 13 '20 at 21:01
@Yuca Unfortunately, I have tried removing 'timestamp' and it did not fix it — VEQ, Jan 13 '20 at 21:05
what's the error then? remove all references to timestamp, including `index_col` — Yuca, Jan 13 '20 at 21:08
Ah thanks for the information, just did a few more tests. You're hitting the API limit, I'll respond below — Patrick Collins, Jan 13 '20 at 21:12
it can't be the same as before since you're no longer asking for `timestamp` — Yuca, Jan 13 '20 at 21:24
That's what I was thinking too, however, I think the fact that I was hitting the API limit is the real fault. Thank you for the help though! — VEQ, Jan 13 '20 at 21:27

score 0 · Accepted Answer · answered Jan 13 '20 at 21:20

You're hitting the API limit with a standard key. The standard key is allowed 5 API calls / minute and 500 / day, that's why it works sometimes.

You can see that if you paste your URL into your browser and refresh it 5 - 10 times in 60 seconds you'll manually hit the limit.

You can either:

Upgrade to a premium key.
Space out your API calls (wait 60 seconds after you run this to run it again)

A note on privacy that may also relate to your API threshold hitting. You have publicly shared your API key.

Place your API key in an environment variable
When you post, use "XXXX" or something to that affect as an API key substitute.

If you publicly share your API key others can use it and means someone else could be using your 5 API calls / minute.

Sample:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os

symbols = ["AAPL", "GLD", "TSLA", "GBL", "GOOGL"]


def compare_security(symbols):
    start_date = "01-01-2019"
    end_date = "01-12-2020"
    dates = pd.date_range(start_date, end_date)

    df1 = pd.DataFrame(index=dates)
    df_SPY = pd.read_csv(
        "https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=SPY&apikey={}&datatype=csv".format(
            os.getenv("ALPHAVANTAGE_API_KEY")),
        index_col="timestamp", usecols=["timestamp", "adjusted_close"], parse_dates=True, na_values=['nan'])
    df_SPY = df_SPY.rename(columns={"adjusted_close": "SPY"})
    df1 = df1.join(df_SPY, how="inner")

    for symbol in symbols:

        df_temp = pd.read_csv("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol={}&apikey={}&datatype=csv".format(symbol, os.getenv("ALPHAVANTAGE_API_KEY")),
                              index_col="timestamp", usecols=["timestamp", "adjusted_close"], parse_dates=True, na_values=['nan'])
        df_temp = df_temp.rename(columns={"adjusted_close": symbol})
        df1 = df1.join(df_temp)
    return df1


def test_run():
    df = compare_security(symbols)
    print(df)
    df.plot()
    plt.title(symbols)
    plt.show()


if __name__ == "__main__":
    test_run()

I may have spoke too soon, I got a working key and spaced out my calls but the same error stills occurs. It does however work with local files, so the API is definitely the issue. — VEQ, Jan 13 '20 at 21:38
Try running: https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=SPY&apikey=XXXX in your browser (but replace your api key with the XXXX). What does it tell you in your browser? You might have hit the 500 / day threshold. If you get data back your code is wrong. If you get an error back, what's the error say? — Patrick Collins, Jan 13 '20 at 22:39
I got data back so I went and tried the code again and it worked. I then waited and tried a second time and it failed like before, so I must be hitting the limit one way or another. Thanks again! — VEQ, Jan 14 '20 at 00:14

Lav · Answer 2 · 2020-01-13T19:12:37.247

Please run the below commands in IDLE and you will notice that parse_dates=True is the culprit. Why is it doing so? I don't know but at least you now know whats causing the error.

for symbol in symbols:
    df_temp= pd.read_csv("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol={}&apikey=XX3U125BWQLTU2VZ&datatype=csv".format(symbol))
    print(df_temp.columns)

and

for symbol in symbols:
    df_temp= pd.read_csv("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol={}&apikey=XX3U125BWQLTU2VZ&datatype=csv".format(symbol), parse_dates=True)
    print(df_temp.columns)

I see what you mean, however, I tried running it with parse_dates =True removed and it still results in the same error — VEQ, Jan 13 '20 at 20:44

Pandas can't find columns, ValueError

2 Answers2