
Large language models generate a list of tokens, but to show those tokens to a user we need to put spaces between some of them. How do we know where the spaces go so that the displayed text reads correctly?

Example: this text was tokenized using the OpenAI tokenizer: [image]

In the text "and (2) it doesn't support GPT-4", how do we know that a space should be added before "(" but not before "2" or before ")"?
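To make the question concrete, here is a minimal sketch of one way this could work, assuming the space is stored inside the token itself (as in byte-level BPE vocabularies, where many tokens begin with a space). The vocabulary, the token ids, and the exact split of the sentence below are all hypothetical and made up for illustration; they are not the real OpenAI vocabulary.

```python
# Hypothetical toy vocabulary: ids and token strings are invented for this sketch.
# Several tokens start with a space; the space is part of the token text.
TOY_VOCAB = {
    0: "and",
    1: " (",
    2: "2",
    3: ")",
    4: " it",
    5: " doesn",
    6: "'t",
    7: " support",
    8: " GPT",
    9: "-",
    10: "4",
}


def decode(token_ids):
    """Join token strings with no separator; any spacing comes from the tokens."""
    return "".join(TOY_VOCAB[i] for i in token_ids)


# Assumed split of the example sentence (the real tokenizer may split differently).
ids = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
text = decode(ids)
print(text)
```

Under that assumption nothing has to decide where the spaces go at display time: " (" carries its own leading space, while "2" and ")" do not, so plain concatenation gives back the original text.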

hakim47

0 Answers