
I'm working on a stance detection project on Brexit tweets using the Universal Sentence Encoder (USE). For my code I'm trying to adapt this notebook: https://www.kaggle.com/kshitijmohan/sentiment-analysis-universal-sentence-encoder-91

But I get this error:

input must be a vector, got shape: []

(t_stance is the stance, 0 = remain, 1 = leave, 2 = neutral)

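import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from tqdm import tqdm

RANDOM_SEED = 42

np.random.seed(RANDOM_SEED)
tf.random.set_seed(RANDOM_SEED)

train = pd.read_csv(base_dir + "k500_train.csv")

module_url = "https://tfhub.dev/google/universal-sentence-encoder/4"
use = hub.load(module_url)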

type_one_hot = OneHotEncoder(sparse=False).fit_transform(
  train.t_stance.to_numpy().reshape(-1,1)
)

train_reviews, test_reviews, y_train, y_test =\
  train_test_split(
    train.text, 
    type_one_hot, 
    test_size=.4, 
    random_state=42
  )


X_train = []
for r in tqdm(train_reviews):
  emb = use(r)
  review_emb = tf.reshape(emb, [-1]).numpy()
  X_train.append(review_emb)

X_train = np.array(X_train)
 0%|          | 0/210000 [00:01<?, ?it/s]

InvalidArgumentError:  input must be a vector, got shape: []
     [[{{node StatefulPartitionedCall/StatefulPartitionedCall/text_preprocessor/tokenize/StringSplit/StringSplit}}]] [Op:__inference_restored_function_body_10218]

Function call stack:
restored_function_body 
Red
Radix
There could be a problem with the preprocessing of the "k500_train.csv" dataset, since the Kaggle code runs successfully with the reviews.csv it mentions. If you can share the dataset (if it is shareable), I can try to fix the error. – Mar 10 '22 at 12:32
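
One more thing worth checking, going by the `StringSplit` node in the traceback: the tokenizer inside this USE module appears to expect a 1-D batch of strings, whereas `use(r)` passes a single scalar string (shape `[]`). The sketch below is plain Python and assumes nothing about the contents of k500_train.csv; `clean_text` and `iter_batches` are hypothetical helper names, not part of TensorFlow Hub. It groups the texts into lists so that every model call receives a vector, and maps missing CSV cells (None/NaN) to empty strings.

```python
import math


def clean_text(value):
    """Return value as a str, mapping None/NaN (e.g. empty CSV cells) to ''."""
    if value is None:
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""
    return str(value)


def iter_batches(texts, batch_size=256):
    """Yield lists of strings so each model call gets a rank-1 input."""
    batch = []
    for value in texts:
        batch.append(clean_text(value))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


# Assumed usage with the names from the question (not executed here):
# X_train = np.vstack(
#     [use(batch).numpy() for batch in iter_batches(train_reviews)]
# )
```

Calling the model once per batch rather than once per tweet should also be considerably faster over 210,000 rows. A single text can be embedded the same way by wrapping it in a list, e.g. `use([r])`.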
