I have the following code:

    most_important = features_importance_chi(importance_score_tresh,
                                             df_user.drop(columns='CHURN'),
                                             churn)
    X = df_user.drop(columns = 'CHURN')
    churn[churn==2] = 1
    y = churn

    # handle undersample problem
    X,y = handle_undersampe(X,y)

    # train the model

    X=X.loc[:,X.columns.isin(most_important)].values
    y=y.values

    parameters = {
    'application': 'binary',
    'objective': 'binary',
    'metric': 'auc',
    'is_unbalance': 'true',
    'boosting': 'gbdt',
    'num_leaves': 31,
    'feature_fraction': 0.5,
    'bagging_fraction': 0.5,
    'bagging_freq': 20,
    'learning_rate': 0.05,
    'verbose': 0
    }

    # split data
    x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

    train_data = lightgbm.Dataset(x_train, label=y_train)
    test_data = lightgbm.Dataset(x_test, label=y_test)
    model = lightgbm.train(parameters,
                       train_data,
                       valid_sets=[train_data, test_data], 
                       feature_name=most_important,
                       num_boost_round=5000,
                       early_stopping_rounds=100) 

and the function that returns the `most_important` list:

    def features_importance_chi(importance_score_tresh, X, Y):
        model = ExtraTreesClassifier(n_estimators=10)
        model.fit(X, Y.values.ravel())
        feature_list = pd.Series(model.feature_importances_,
                                 index=X.columns)
        feature_list = feature_list[feature_list > importance_score_tresh]
        feature_list = feature_list.index.values.tolist()
        return feature_list

Oddly, running this code in Spyder raises the following error:

    LightGBMError: Do not support special JSON characters in feature name.

but in Jupyter it works fine. I am able to print the list of most important features.

Any idea what could be the reason for this error?

zdz
  • I think you are missing some code that applies the DataFrame column handling to both datasets `(train+test)`. Make sure you are not applying it to the test set only. – Kasim Ecer Apr 10 '20 at 06:47

3 Answers

33

This message comes up often with LGBMClassifier() models, i.e. LightGBM. Simply add this line right after you load the data into pandas and the problem goes away:

    import re
    df = df.rename(columns=lambda x: re.sub('[^A-Za-z0-9_]+', '', x))
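
Before renaming, it can help to see which column names would actually be changed. A minimal sketch, assuming the names are plain strings; the helper name `find_unsafe_names` and the sample names are made up for illustration, and the character class is the same one used in the regex above:

```python
import re

# Same character class as in the rename above: anything that is not
# a letter, digit or underscore is treated as unsafe.
UNSAFE = re.compile(r'[^A-Za-z0-9_]')

def find_unsafe_names(names):
    """Return {name: sorted unsafe characters} for every name the rename would change."""
    report = {}
    for name in names:
        bad = sorted(set(UNSAFE.findall(str(name))))
        if bad:
            report[name] = bad
    return report

# Hypothetical column names, e.g. what one-hot encoding might produce.
columns = ['age', 'plan_[basic]', 'city: NY', 'total_calls']
print(find_unsafe_names(columns))
```

With a DataFrame this would be called as `find_unsafe_names(df.columns)`.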
Wojciech Moszczyński
0

Here is an alternative answer, taken from "LightGBM error special JSON characters in feature name #399":

    import re

    # Change column names ([LightGBM] Do not support special JSON characters in feature name.)
    new_names = {col: re.sub(r'[^A-Za-z0-9_]+', '', col) for col in df.columns}
    new_n_list = list(new_names.values())
    # Make cleaned names unique ([LightGBM] Feature appears more than one time.)
    new_names = {col: f'{new_col}_{i}' if new_col in new_n_list[:i] else new_col
                 for i, (col, new_col) in enumerate(new_names.items())}
    df = df.rename(columns=new_names)
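
To see what the two steps do, here is a small worked sketch on plain strings. It is a variant, not the snippet above: the function name `sanitize_unique` and the sample names are hypothetical, and it keeps a set of names already in use so that an added suffix cannot itself collide with another column.

```python
import re

def sanitize_unique(names):
    """Map each original name to a cleaned name that is unique within the result."""
    mapping = {}
    used = set()
    for name in names:
        base = re.sub(r'[^A-Za-z0-9_]+', '', str(name))
        candidate = base
        suffix = 1
        # Append _1, _2, ... until the cleaned name is not taken yet.
        while candidate in used:
            candidate = f'{base}_{suffix}'
            suffix += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping

# Hypothetical names: 'a b' and 'a-b' both clean to 'ab'.
print(sanitize_unique(['a b', 'a-b', 'ab_1', 'x']))
```

The resulting dictionary can be passed to `df.rename(columns=...)` in the same way as `new_names` above.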
ah bon
0

After looking into the problem, I found that the feature column names had been generated automatically, because one-hot encoding was used when processing the categorical features.

Such generated names can contain characters that are special in JSON, for example `[`, `]`, `{`, `}`, `:`, `,` or `"`, and that is what triggers this error. (A plain underscore is fine.)

  1. One workaround is to install an older version of lightgbm, for example:

pip install lightgbm==2.2.3 -i https://pypi.tuna.tsinghua.edu.cn/simple

  2. Alternatively, rename the feature columns of the data before passing it to LightGBM.
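
If an older version does behave differently, as suggested above, it may be worth checking which lightgbm each environment (for example Spyder and Jupyter) actually uses. A small sketch using only the standard library; the helper name `installed_version` is made up for illustration. Run it in both environments and compare the output:

```python
import sys
from importlib import metadata

def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# The interpreter path shows which Python environment is running this code.
print('python  :', sys.executable)
print('lightgbm:', installed_version('lightgbm'))
```

Different interpreter paths or version strings would point to two separate installations rather than to a problem in the code itself.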
ah bon