
I'm fine-tuning a BERT model using the Hugging Face, Keras, and TensorFlow libraries.

Since yesterday I've been getting this error when running my code in Google Colab. The odd thing is that the code used to run without any problem and suddenly started throwing this error. What is even more suspicious is that the code runs without problems in my Apple M1 TensorFlow setup. Again, I did not change anything in my code, but now it won't run in Google Colab although it used to run with no problems whatsoever.

Both environments have TensorFlow 2.6.0.

[error screenshot]

I created the code below to reproduce the error. I hope you can shed some light on this.

!pip install transformers
!pip install datasets

import pandas as pd
import numpy as np
import tensorflow as tf
from transformers import AutoTokenizer
from datasets import Dataset

# dummy sentences
sentences = ['the house is blue and big', 'this is fun stuff', 'what a horrible thing to say']

# create a pandas dataframe and convert to a Hugging Face dataset
df = pd.DataFrame({'Text': sentences})
dataset = Dataset.from_pandas(df)

# download bert tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# tokenize each sentence in dataset
dataset_tok = dataset.map(lambda x: tokenizer(x['Text'], truncation=True, padding=True, max_length=10), batched=True)

# remove original text column and set format
dataset_tok = dataset_tok.remove_columns(['Text']).with_format('tensorflow')

# extract features
features = {x: dataset_tok[x].to_tensor() for x in tokenizer.model_input_names}
  • Are the TensorFlow versions the same in the two environments? – umitu Oct 14 '21 at 22:51
  • Yes. Both environments have TensorFlow 2.6.0 – ipietri Oct 15 '21 at 01:42
  • Thanks to `.with_format('tensorflow')` your dataset is already filled with tf tensors. If you expect to get tensors, just remove the `.to_tensor()`, or remove `.with_format('tensorflow')` and use `tf.convert_to_tensor(dataset_tok[x])`? – Harold G Oct 15 '21 at 08:36 (a sketch of this second alternative follows the comments)
  • 1
    Thanks @HaroldG. I removed `to_tensor()` and is running fine. I see now that the statement was redundant. Although that is the procedure suggested in the Hugging Face official documentation (https://huggingface.co/transformers/training.html) and TensorFlow wasn't throwing an error until now. Anyway, I'm glad that is running now. Thanks! – ipietri Oct 15 '21 at 15:27
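For completeness, here is a minimal sketch of the second alternative Harold G mentions (dropping `.with_format('tensorflow')` and converting manually). This is an assumption based on the comment, not code from the original post:

# keep the dataset in its default (Python) format instead of the tensorflow format
dataset_tok = dataset.map(lambda x: tokenizer(x['Text'], truncation=True, padding=True, max_length=10), batched=True)
dataset_tok = dataset_tok.remove_columns(['Text'])

# convert each tokenizer output column to a tensor explicitly
features = {x: tf.convert_to_tensor(dataset_tok[x]) for x in tokenizer.model_input_names}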

1 Answer


After removing `to_tensor()`, the given code works, as suggested by @Harold G.

!pip install transformers
!pip install datasets

import pandas as pd
import numpy as np
import tensorflow as tf
from transformers import AutoTokenizer
from datasets import Dataset

# dummy sentences
sentences = ['the house is blue and big', 'this is fun stuff', 'what a horrible thing to say']

# create a pandas dataframe and convert to a Hugging Face dataset
df = pd.DataFrame({'Text': sentences})
dataset = Dataset.from_pandas(df)

# download bert tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# tokenize each sentence in dataset
dataset_tok = dataset.map(lambda x: tokenizer(x['Text'], truncation=True, padding=True, max_length=10), batched=True)

# remove original text column and set format
dataset_tok = dataset_tok.remove_columns(['Text']).with_format('tensorflow')

# extract features
features = {x: dataset_tok[x] for x in tokenizer.model_input_names}
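As a usage note, the resulting `features` dict can then be fed into `tf.data.Dataset.from_tensor_slices` for fine-tuning, along the lines of the Hugging Face tutorial linked in the comments. A minimal sketch with hypothetical dummy labels (the labels are not part of the original question):

# hypothetical labels, one per sentence, just to illustrate the pipeline
labels = [0, 1, 0]

# pair the tokenized features with the labels, shuffle, and batch for training
train_ds = tf.data.Dataset.from_tensor_slices((features, labels)).shuffle(len(labels)).batch(2)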