LightGBMError: Do not support special JSON characters in feature name - The same code is working in jupyter but doesn't work in Spyder

Question

I have the following code:

    most_important = features_importance_chi(importance_score_tresh, 
    df_user.drop(columns = 'CHURN'),churn)
    X = df_user.drop(columns = 'CHURN')
    churn[churn==2] = 1
    y = churn

    # handle undersample problem
    X,y = handle_undersampe(X,y)

    # train the model

    X=X.loc[:,X.columns.isin(most_important)].values
    y=y.values

    parameters = {
    'application': 'binary',
    'objective': 'binary',
    'metric': 'auc',
    'is_unbalance': 'true',
    'boosting': 'gbdt',
    'num_leaves': 31,
    'feature_fraction': 0.5,
    'bagging_fraction': 0.5,
    'bagging_freq': 20,
    'learning_rate': 0.05,
    'verbose': 0
    }

    # split data
    x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

    train_data = lightgbm.Dataset(x_train, label=y_train)
    test_data = lightgbm.Dataset(x_test, label=y_test)
    model = lightgbm.train(parameters,
                       train_data,
                       valid_sets=[train_data, test_data], 
                       feature_name=most_important,
                       num_boost_round=5000,
                       early_stopping_rounds=100)

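and the function that returns `most_important`:

    import pandas as pd
    from sklearn.ensemble import ExtraTreesClassifier

    def features_importance_chi(importance_score_tresh, X, Y):
        # Fit a small tree ensemble and rank the columns of X by importance
        model = ExtraTreesClassifier(n_estimators=10)
        model.fit(X, Y.values.ravel())
        feature_list = pd.Series(model.feature_importances_,
                                 index=X.columns)
        # Keep only the column names whose importance exceeds the threshold
        feature_list = feature_list[feature_list > importance_score_tresh]
        feature_list = feature_list.index.values.tolist()
        return feature_list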

The odd thing is that in Spyder this code raises the following error:

    LightGBMError: Do not support special JSON characters in feature name.

but in Jupyter it works fine, and I am able to print the list of most important features.

Any idea what could be the reason for this error?

I think you are forgetting some code about data frames columns to use data to both datasets `(train+test)`. Be sure you are not using it just on the test-set — Kasim Ecer, Apr 10 '20 at 06:47

score 33 · Accepted Answer · answered Jun 13 '20 at 19:58

This error comes up often with LightGBM models such as LGBMClassifier(). Add the following lines right after you load the data into pandas and the problem goes away:

    import re
    df = df.rename(columns = lambda x:re.sub('[^A-Za-z0-9_]+', '', x))
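Before renaming, it can help to see which column names would actually change. The following is a minimal sketch, not part of the original answer: `find_offending_names` is a hypothetical helper that applies the same regular expression to a plain list of names (for a DataFrame you would pass `df.columns`), and the example names are made up.

```python
import re

# Same character class as in the answer above: anything other than
# letters, digits and underscore gets stripped.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_]+")

def find_offending_names(names):
    """Return {original_name: cleaned_name} for every name the regex would change."""
    changed = {}
    for name in names:
        cleaned = _DISALLOWED.sub("", str(name))
        if cleaned != str(name):
            changed[name] = cleaned
    return changed

# Hypothetical column names, e.g. as produced by one-hot encoding.
example = ["age", "plan_[basic]", "city: NY", "total_spend"]
print(find_offending_names(example))
```

An empty dictionary means the rename would be a no-op for those columns.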

score 0 · Answer 2 · answered Nov 08 '22 at 02:00

Here is an alternative answer from LightGBM error special JSON characters in feature name #399

    import re

    # Change columns names ([LightGBM] Do not support special JSON characters in feature name.)
    new_names = {col: re.sub(r'[^A-Za-z0-9_]+', '', col) for col in df.columns}
    new_n_list = list(new_names.values())
    # [LightGBM] Feature appears more than one time.
    new_names = {col: f'{new_col}_{i}' if new_col in new_n_list[:i] else new_col for i, (col, new_col) in enumerate(new_names.items())}
    df = df.rename(columns=new_names)
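The same idea can also be written as a loop that tracks the names already in use, which additionally covers the case where a suffixed name collides with an existing column. This is only a sketch: `sanitize_unique` is a hypothetical helper (not a LightGBM or pandas function), and it assumes the original column names are themselves unique.

```python
import re

def sanitize_unique(columns):
    """Map each original column name to a cleaned name that is unique."""
    seen = set()
    mapping = {}
    for col in columns:
        # Strip everything except letters, digits and underscore.
        base = re.sub(r"[^A-Za-z0-9_]+", "", str(col))
        candidate = base
        suffix = 1
        # Append _1, _2, ... until the name is not already taken.
        while candidate in seen:
            candidate = f"{base}_{suffix}"
            suffix += 1
        seen.add(candidate)
        mapping[col] = candidate
    return mapping

# Hypothetical names: "a b" and "a-b" both clean to "ab".
print(sanitize_unique(["a b", "a-b", "c", "ab_1"]))
```

With a DataFrame this would be applied as `df = df.rename(columns=sanitize_unique(df.columns))`.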

ah bon · Answer 3 · 2023-03-23T09:29:37.450

After looking into the problem, I found that the feature column names had been generated automatically, because one-hot encoding was applied to the categorical features.

Those generated names can contain special characters, which is what triggers this error.

One workaround is to install an older version of lightgbm:

pip install lightgbm==2.2.3 -i https://pypi.tuna.tsinghua.edu.cn/simple

You can also rename the feature columns of the input data instead.
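Since an older lightgbm release avoids the error, one possible explanation for the Jupyter/Spyder difference (an assumption, the question does not confirm it) is that the two environments use different Python interpreters with different lightgbm versions installed. A small sketch to compare them, using only the standard library (`installed_version` is a hypothetical helper); run it once in each environment:

```python
import sys
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Which interpreter is this environment actually running?
print("python:", sys.executable)
# Which lightgbm (if any) does that interpreter see?
print("lightgbm:", installed_version("lightgbm"))
```

If the two outputs differ, the renaming fixes above still apply; they simply are not needed on the older version.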
