ValueError: Index DATE invalid with pandas.read_csv on header row

Question

I am trying to create a dictionary whose keys are the entries in the first row of the CSV file (the header) and whose values are dictionaries of the form {first-column value: corresponding value in that column}:

import pandas as pd

df = pd.read_csv('~/StockMachine/data_stocks.csv', index_col=['DATE'], sep=',\s+')

data = df.to_dict()

print(data)

However, I get this error "ValueError: Index DATE invalid".

Traceback:

  File "/Users/cs/StockMachine/stockmachine.py", line 4, in <module>
    df = pd.read_csv('~/StockMachine/data_stocks.csv', index_col=['DATE'], sep=',\s+')
  File "/Users/cs/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/cs/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 446, in _read
    data = parser.read(nrows)
  File "/Users/cs/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1036, in read
    ret = self._engine.read(nrows)
  File "/Users/cs/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 2273, in read
    index, columns = self._make_index(data, alldata, columns, indexnamerow)
  File "/Users/cs/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1425, in _make_index
    index = self._get_simple_index(alldata, columns)
  File "/Users/cs/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1457, in _get_simple_index
    i = ix(idx)
  File "/Users/cs/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1452, in ix
    raise ValueError('Index %s invalid' % col)

data_stocks.csv:

are you sure this `sep=',\s+'` is correct? If the csv is Comma separated, use `sep=','` so the read_csv() function can parse the file properly — Theo, Jul 30 '18 at 07:43
Could you provide a subset of your csv that reproduces the error? — Luca Cappelletti, Sep 29 '19 at 07:06
`sep=',\s+'` is wrong. The separator(/delimiter) is either comma or whitespace - not both. If you specify comma, pandas will correctly ignore whitespace. — smci, Feb 11 '20 at 00:39
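
To make the separator comments concrete, here is a minimal sketch of one way this error can arise and one way to avoid it. The file contents are invented for illustration (the original CSV is not shown), and the column names OPEN and CLOSE are hypothetical. If the header has no whitespace after its commas, the regex `,\s+` never matches, the whole header line becomes a single column name, and `index_col=['DATE']` has nothing called `DATE` to find. Note that with a plain `sep=','`, a space after a comma stays part of the next field unless `skipinitialspace=True` is passed.

```python
import io

import pandas as pd

# Hypothetical file contents; the real data_stocks.csv is not shown in the question.
no_spaces = "DATE,OPEN,CLOSE\n20170403,143,144\n20170404,144,145\n"
with_spaces = "DATE, OPEN, CLOSE\n20170403, 143, 144\n20170404, 144, 145\n"

# The regex ',\s+' requires whitespace after each comma. Without it, nothing is
# split and the header collapses into one column name.
collapsed = pd.read_csv(io.StringIO(no_spaces), sep=r",\s+", engine="python")
print(list(collapsed.columns))

# A plain comma separator handles both layouts; skipinitialspace=True drops
# the spaces that follow a comma.
frames = [
    pd.read_csv(io.StringIO(text), sep=",", skipinitialspace=True, index_col="DATE")
    for text in (no_spaces, with_spaces)
]

# to_dict() returns {column name: {index value: cell value}}.
data = frames[1].to_dict()
print(data)
```

The last step is the structure the question asks for: each header name maps to a dictionary keyed by the DATE values.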

score 2 · Answer 1 · answered Jan 22 '20 at 19:20

A similar thing happened to me; in my case some of the values in ['DATE'] were strings containing stray spaces. You could try something like:

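import pandas as pd

# Read without index_col first so the DATE values can be cleaned up.
# A regex separator needs a raw string and the Python parsing engine.
df = pd.read_csv('~/StockMachine/data_stocks.csv', sep=r',\s+', engine='python')

# Convert to str before stripping so non-string values do not raise.
df['DATE'] = df['DATE'].astype(str).str.strip()

df.set_index('DATE', inplace=True)

print(df.head())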
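
The same cleanup can be applied to the header, since the error message concerns the column name rather than the values. A minimal sketch, assuming stray spaces around both the column names and the DATE values; the sample data is made up and the CLOSE column is hypothetical:

```python
import io

import pandas as pd

# Made-up sample with stray spaces around the header names and the DATE values.
csv_text = "DATE ,CLOSE\n 2017-04-03 ,144\n 2017-04-04 ,145\n"

df = pd.read_csv(io.StringIO(csv_text), sep=",")

# Strip whitespace from the column names so that 'DATE' can be found.
df.columns = df.columns.str.strip()

# Strip whitespace from the DATE values, then use them as the index.
df["DATE"] = df["DATE"].astype(str).str.strip()
df = df.set_index("DATE")

print(df.head())
```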

score 0 · Answer 2 · answered Feb 02 '22 at 11:39

I had the same issue, then realized that the column I had selected for `index_col` was not part of the row selected by `header=1` in my script.

bassil Aleter
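
A minimal sketch of the situation this answer describes, with made-up file contents: when the row chosen by `header` does not contain the name passed to `index_col`, pandas cannot build the index. Checking the parsed column names for each candidate header row shows which one holds DATE. The helper `columns_at` is hypothetical, not a pandas function.

```python
import io

import pandas as pd

# Made-up file with a title row above the real header row.
csv_text = "Stock export,v1\nDATE,CLOSE\n20170403,144\n20170404,145\n"


def columns_at(header_row):
    """Return the column names pandas sees when `header_row` is used as header."""
    # nrows=0 parses the header only, without reading any data rows.
    return list(pd.read_csv(io.StringIO(csv_text), header=header_row, nrows=0).columns)


print(columns_at(0))  # names taken from the title row
print(columns_at(1))  # names taken from the real header row

# index_col must name a column from the selected header row.
df = pd.read_csv(io.StringIO(csv_text), header=1, index_col="DATE")
print(df)
```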


This does not really answer the question. – taylor.2317, Feb 06 '22 at 21:33
