Sometimes it is necessary to look at the code:
    if text is None and text_target is None:
        raise ValueError("You need to specify either `text` or `text_target`.")
    if text is not None:
        # The context manager will send the inputs as normal texts and not text_target, but we shouldn't change the
        # input mode in this case.
        if not self._in_target_context_manager:
            self._switch_to_input_mode()
        encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
    if text_target is not None:
        self._switch_to_target_mode()
        target_encodings = self._call_one(text=text_target, text_pair=text_pair_target, **all_kwargs)
    # Leave back tokenizer in input mode
    self._switch_to_input_mode()

    if text_target is None:
        return encodings
    elif text is None:
        return target_encodings
    else:
        encodings["labels"] = target_encodings["input_ids"]
        return encodings

As you can see in the above snippet, both `text` and `text_target` are passed to `self._call_one()` to encode them (note that `text_target` is passed as the `text` parameter). That means the encoding of the same string as `text` or `text_target` will be identical as long as `_switch_to_target_mode()` doesn't do anything special.
The conditions at the end of the function answer your question:
- When you only provide `text`, you will retrieve the encoding of it.
- When you only provide `text_target`, you will retrieve the encoding of it.
- When you provide `text` and `text_target`, you will retrieve the encoding of `text` and the token ids of `text_target` as the value of the `labels` key.
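
To make the three cases concrete without downloading a model, here is a minimal sketch that mirrors only the dispatch logic of the snippet above. `ToyTokenizer` and its whitespace-based `_call_one()` are hypothetical stand-ins, not the library's implementation, and both mode-switch hooks are assumed to be no-ops:

```python
class ToyTokenizer:
    """Hypothetical stand-in that mirrors only the dispatch logic quoted above."""

    def __init__(self):
        self._vocab = {}

    def _switch_to_input_mode(self):
        # Assumed no-op; a real tokenizer subclass may override this hook.
        pass

    def _switch_to_target_mode(self):
        # Assumed no-op; a real tokenizer subclass may override this hook.
        pass

    def _call_one(self, text):
        # Toy encoding: split on whitespace, assign ids in order of first appearance.
        ids = [self._vocab.setdefault(word, len(self._vocab)) for word in text.split()]
        return {"input_ids": ids, "attention_mask": [1] * len(ids)}

    def __call__(self, text=None, text_target=None):
        if text is None and text_target is None:
            raise ValueError("You need to specify either `text` or `text_target`.")
        if text is not None:
            self._switch_to_input_mode()
            encodings = self._call_one(text)
        if text_target is not None:
            self._switch_to_target_mode()
            target_encodings = self._call_one(text_target)
        # Leave the tokenizer in input mode, as the original does.
        self._switch_to_input_mode()

        if text_target is None:
            return encodings
        elif text is None:
            return target_encodings
        else:
            encodings["labels"] = target_encodings["input_ids"]
            return encodings


tok = ToyTokenizer()
only_text = tok(text="hello world")
only_target = tok(text_target="hello world")
both = tok(text="hello world", text_target="world hello")

print(only_text)    # full encoding of `text`
print(only_target)  # full encoding of `text_target`, same keys as above
print(both)         # encoding of `text` plus a `labels` key
```

With no-op hooks, `only_text` and `only_target` are equal, and `both` contains the encoding of `text` with the ids of `text_target` under `labels`. A tokenizer that overrides the target-mode hook could produce different ids for the same string.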
To be honest, I think the implementation is a bit unintuitive. I would expect that passing only `text_target` would return an object that contains only the `labels` key. I assume that they wanted to keep their output objects and the respective documentation simple and therefore went for this implementation. Or there may be a model, unknown to me, where this behavior actually makes sense.