I get the following error: ValueError: if 'bert' is selected model, then preprocess_mode='bert' should be used and vice versa. As far as I can tell, I am already using preprocess_mode='bert', so I do not see the problem. What is wrong with my code?

%%time
# Imports
import ktrain
from ktrain import text

(x_train_bert, y_train_bert), (x_val_bert, y_val_bert), preproc = text.texts_from_array(
    x_train=train_X.tolist(), y_train=train_y.tolist(),
    x_test=test_X.tolist(), y_test=test_y.tolist(),
    class_names=df_complete['forwarder_name'].unique(),
    preprocess_mode='bert',
    lang='en',
    maxlen=65,
    max_features=35000)


model = text.text_classifier(name='bert', train_data=(train_X, train_y), preproc=preproc)
learner = ktrain.get_learner(model,train_data=(train_X, train_y), val_data=(test_X, test_y), batch_size=6)

Complete error:


model = text.text_classifier(name='bert', train_data=(train_X, train_y), preproc=preproc)
#learner = ktrain.get_learner(model,train_data=(train_X, train_y), val_data=(test_X, test_y), batch_size=6)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [111], in <cell line: 1>()
----> 1 model = text.text_classifier(name='bert', train_data=(train_X, train_y), preproc=preproc)

File ~\AppData\Roaming\Python\Python39\site-packages\ktrain\text\models.py:589, in text_classifier(name, train_data, preproc, multilabel, metrics, verbose)
    585 if preproc is not None and not preproc.get_classes():
    586     raise ValueError(
    587         "preproc.get_classes() is empty, but required for text classification"
    588     )
--> 589 return _text_model(
    590     name,
    591     train_data,
    592     preproc=preproc,
    593     multilabel=multilabel,
    594     classification=True,
    595     metrics=metrics,
    596     verbose=verbose,
    597 )

File ~\AppData\Roaming\Python\Python39\site-packages\ktrain\text\models.py:109, in _text_model(name, train_data, preproc, multilabel, classification, metrics, verbose)
    107 is_bert = U.bert_data_tuple(train_data)
    108 if (is_bert and name != BERT) or (not is_bert and name == BERT):
--> 109     raise ValueError(
    110         "if '%s' is selected model, then preprocess_mode='%s' should be used and vice versa"
    111         % (BERT, BERT)
    112     )
    113 is_huggingface = U.is_huggingface(data=train_data)
    114 if (is_huggingface and name not in HUGGINGFACE_MODELS) or (
    115     not is_huggingface and name in HUGGINGFACE_MODELS
    116 ):

ValueError: if 'bert' is selected model, then preprocess_mode='bert' should be used and vice versa
Test
  • did you try with `model = text.text_classifier(name='bert', train_data=(train_X, train_y), preproc='bert')`? – David Jul 06 '22 at 01:12
  • alternatively you have to assign `name` perhaps dynamically too by replacing `*NAME*` with a dynamic value: `model = text.text_classifier(name=*NAME*, train_data=(train_X, train_y), preproc=preproc)` – David Jul 06 '22 at 01:26

1 Answer

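The problem is that you are passing train_X and train_y (the raw data, not preprocessed for BERT) to text_classifier and get_learner instead of x_train_bert and y_train_bert (the outputs of texts_from_array, which were preprocessed for BERT). As your traceback shows, _text_model checks whether train_data is in BERT format (is_bert = U.bert_data_tuple(train_data)). The raw data fails that check, so name='bert' raises the ValueError even though you set preprocess_mode='bert'.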

Use this instead:

model = text.text_classifier(name='bert', train_data=(x_train_bert, y_train_bert), preproc=preproc)
learner = ktrain.get_learner(model,train_data=(x_train_bert, y_train_bert), val_data=(x_val_bert, y_val_bert), batch_size=6)
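
If you want to catch this kind of mix-up before building the model, a small sanity check can help. The sketch below is only a rough heuristic and not ktrain's actual check (that lives in U.bert_data_tuple, as seen in the traceback). It assumes that BERT-preprocessed inputs arrive as a list of two arrays of the same shape (token ids and segment ids); the helper name looks_bert_preprocessed is hypothetical.

```python
def looks_bert_preprocessed(x):
    """Rough heuristic, NOT ktrain's own check.

    Assumption: BERT-preprocessed inputs are a list of two array-like
    objects with identical shapes, e.g. [token_ids, segment_ids].
    """
    # Raw texts (list of str, pandas Series, plain array) fail here.
    if not isinstance(x, list) or len(x) != 2:
        return False
    # Array-like objects expose .shape; plain strings do not.
    shapes = [getattr(part, "shape", None) for part in x]
    return shapes[0] is not None and shapes[0] == shapes[1]


# Hypothetical usage in the notebook from the question:
# looks_bert_preprocessed(x_train_bert)  # should pass if the assumption above holds
# looks_bert_preprocessed(train_X)       # raw data should fail
```

This only guards against passing the raw variables by accident. The real fix is still to pass the outputs of texts_from_array to both text_classifier and get_learner.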

blustax