Why am I getting this error message?

Here are the variables that are included in my code. The columns they include are all dummy variables:

country_cols = wine_dummies.loc[:, 'country_Chile':'country_US']
variety_cols = wine_dummies.loc[:, 'variety_Cabernet Sauvignon':'variety_Zinfandel']
pricecat_cols = wine_dummies.loc[:, 'price_category_low':]

Here is the code that is throwing the error (it is thrown at "X = wine[feature_cols_1]"):

feature_cols_1 = ['price', country_cols, variety_cols, 'year']
feature_cols_2 = [pricecat_cols, country_cols, variety_cols, 'year']

X = wine[feature_cols_1]  # <--- ERROR
y = wine['points']

Here is the head of my dataframe:

country designation points  price   province    variety      year   ... variety_Riesling    variety_Rosé    variety_Sangiovese  variety_Sauvignon Blanc variety_Syrah   variety_Tempranillo variety_White Blend variety_Zinfandel   price_category_low  price_category_med
Portugal    Avidagos    87  15.0    Douro   Portuguese Red  2011.0  ... 0  0    0   0   0   0   0   0   1 0    

^ each dummy variable (0s and 1s) after "..." corresponds to each column after "..."

GuyGuyGuy
  • Did you look at the output of `country_cols = wine_dummies.loc[:, 'country_Chile':'country_US']`? – roganjosh Mar 13 '19 at 21:53
  • yes, it outputs a dataframe that includes those columns with a uint8 dtype – GuyGuyGuy Mar 13 '19 at 21:55
  • columns: 'country_Chile country_France country_Germany country_Italy country_Portugal country_Spain country_US' Row 1: '1 0 0 0 0 1 0 0' – GuyGuyGuy Mar 13 '19 at 21:57
  • Exactly, so when you do `X = wine[feature_cols_1]` you're actually doing `X = wine[['price', a_full_df, a_full_df, 'year']]`. That doesn't make sense, and that's what the error is telling you about. – roganjosh Mar 13 '19 at 21:58
  • thanks, do you know how I can adjust my code so that "feature_cols_1" includes all of the columns in "country_cols, variety_cols, and pricecat_cols?" I'm really just doing this so I don't have to type each column name into the "feature_cols_1" variable – GuyGuyGuy Mar 13 '19 at 22:01
  • It would be easier to help you if you provided a [mcve]... – Serge Ballesta Mar 13 '19 at 22:28

1 Answer

This is fairly cumbersome, so it's only worth doing if you have lots of columns between 'country_Chile' and 'country_US'. In the example below, I deliberately leave out column 'a': middle_columns covers only 'b' through 'd', selected by column position.

This uses pandas.Index.get_loc to find the integer positions of the start and end columns, which are then used to slice the full Index of dataframe columns (the +1 makes the end column inclusive). The result is then unpacked with * into the final list of column names.

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [3, 4, 5], 
                   'd': [4, 5, 6], 'wine': ['happy', 'drunk', 'sad'],
                   'year': [2002, 2003, 2019]})

middle_columns = df.columns[df.columns.get_loc('b'):df.columns.get_loc('d')+1]
all_cols = ['wine', *middle_columns, 'year']
X = df[all_cols]
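As a side note, here is a rough sketch of what that positional slice evaluates to, next to a label-based .loc slice (the style used in the question) that should select the same columns. The frame is a cut-down copy of the example above, and the names start, stop, by_position and by_label are only for illustration.

```python
import pandas as pd

# Cut-down copy of the example frame (one row is enough here)
df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3], 'd': [4],
                   'wine': ['happy'], 'year': [2002]})

start = df.columns.get_loc('b')      # integer position of 'b'
stop = df.columns.get_loc('d') + 1   # +1 because positional slices exclude the end
by_position = list(df.columns[start:stop])

# .loc slices by label and includes both endpoints
by_label = list(df.loc[:, 'b':'d'].columns)
```

Either way you end up with plain column names, which is what the final list needs.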

The reason your current approach doesn't work is that feature_cols_1 = ['price', country_cols, variety_cols, 'year'] builds a list that mixes strings with whole dataframes, which you then try to use as column labels on a second dataframe.
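If you already have the sliced dataframes, one possible fix is to unpack their .columns into the list rather than the dataframes themselves. This is only a sketch: the wine frame below is a made-up stand-in with a handful of invented rows and dummy columns, and it assumes the dummy columns live in the same frame you index.

```python
import pandas as pd

# Hypothetical stand-in for the real data; values and column set are invented
wine = pd.DataFrame({
    'price': [15.0, 30.0],
    'year': [2011, 2013],
    'points': [87, 90],
    'country_Chile': [0, 1],
    'country_US': [1, 0],
    'variety_Syrah': [1, 0],
    'variety_Zinfandel': [0, 1],
})

# Label slices, as in the question (these are dataframes, not names)
country_cols = wine.loc[:, 'country_Chile':'country_US']
variety_cols = wine.loc[:, 'variety_Syrah':'variety_Zinfandel']

# Unpack the column *names* so the list holds only strings
feature_cols_1 = ['price', *country_cols.columns, *variety_cols.columns, 'year']

X = wine[feature_cols_1]
y = wine['points']
```

Without the *, the list would contain an Index object instead of its individual names, so the lookup fails.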

roganjosh
  • Thanks, I think you're leading me in the right direction. Almost there. I apologize for all of the follow-up questions: I used your code, but now I'm getting this error: "unhashable type: 'Index'". Am I doing something wrong? – GuyGuyGuy Mar 13 '19 at 22:36
  • That sounds almost certainly like a mistranslation of my approach. At a guess, you're missing a `get_loc` but I really can't know how that error comes about – roganjosh Mar 13 '19 at 22:39
  • This worked! Thank you! Yes, it was a mistranslation. I didn't unpack using *. – GuyGuyGuy Mar 13 '19 at 23:02