I have loaded my CSV file with pandas, displayed the DataFrame, and inspected its information. I can access the first column by name, but every other column raises an error.


import pandas as pd
from matplotlib import pyplot as plt
df = pd.read_csv('cali_avocados.csv')
df.head()
   Year  Commodity Code     Crop Name  County Code           County  Harvested Acres  Yield  Production  Price P/U  Unit      Value
0  2020          221999  AVOCADOS ALL           53         Monterey              223   5.56        1240    2379.84  Tons    2951000
1  2020          221999  AVOCADOS ALL           65        Riverside             3020                                Tons   88697000
2  2020          221999  AVOCADOS ALL           71   San Bernardino              370   2.16         799    2617.02  Tons    2091000
3  2020          221999  AVOCADOS ALL           73        San Diego            14400   3.51       50500    3028.87  Tons  152958000
4  2020          221999  AVOCADOS ALL           79  San Luis Obispo             4240    5.9       25000    1886.76  Tons   47169000
df['Year']
0      2020
1      2020
2      2020
3      2020
4      2020
       ... 
415    1980
416    1980
417    1980
418    1980
419    1980
Name: Year, Length: 420, dtype: int64

When I try to access any other column in the dataset, I get a KeyError:

df['Production']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
C:\JupyterLab\resources\jlab_server\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:


pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()


KeyError: 'Production'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)

C:\JupyterLab\resources\jlab_server\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 'Production'
coja56
  • What does `df.columns` return? – ozacha Nov 07 '22 at 10:30
  • I would like to be able to pull columns by name ```df['production']``` – coja56 Nov 07 '22 at 10:49
  • There may be whitespace in the column names. Please report what `df.columns` returns, as @ozacha suggested. – Alessandro Nov 07 '22 at 11:03
  • ```df.columns Index(['Year', ' Commodity Code', ' Crop Name', ' County Code', ' County', ' Harvested Acres', ' Yield', ' Production', ' Price P/U', ' Unit', ' Value'], dtype='object')``` – coja56 Nov 07 '22 at 11:13
  • Indeed, there is a leading space before `Production`, so `df[' Production']` should work. Or better, you could [remove all the whitespace from the column names](https://stackoverflow.com/q/21606987/17203221) – Alessandro Nov 07 '22 at 11:15
  • Thanks, it worked. It was the spaces. I used `strip`: ```df.columns = df.columns.str.strip()``` – coja56 Nov 07 '22 at 11:35
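
For anyone hitting the same KeyError, here is a minimal sketch of the fix discussed in the comments. The inline CSV text is a made-up two-row sample (not the real `cali_avocados.csv`) that only imitates the header layout reported by `df.columns`, where every name after the first starts with a space. The second option, `skipinitialspace=True`, was not mentioned in the thread; it is a `read_csv` parameter that may be worth trying if the spaces come right after the delimiters.

```python
import io

import pandas as pd

# Hypothetical sample data: every field after the first has a leading space,
# imitating the column names shown in the df.columns output above.
csv_text = "Year, County, Production\n2020, Monterey, 1240\n2020, San Diego, 50500\n"

df = pd.read_csv(io.StringIO(csv_text))

# Printing the names as a list makes stray spaces visible inside the quotes.
print(df.columns.tolist())

# Option 1: strip surrounding whitespace from every column name after loading.
df.columns = df.columns.str.strip()
production = df["Production"]  # no KeyError once the names are clean

# Option 2: ask the parser to skip spaces that follow each delimiter.
df2 = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(df2.columns.tolist())
```

Option 1 only touches the column names, so string values in the cells may still carry leading spaces. Option 2 should also clean the cell values, assuming the spaces always follow the comma.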
