Hi Hugging Face Community,
I have the following questions regarding special tokens:
- Why doesn't `tokenizer.all_special_tokens` include the `<image>` token? I'm using the LLaVA model, which has `<image>` as a special token (defined in the `added_tokens_decoder` of `tokenizer_config.json`). The tokenizer does encode and decode it as a special token. However, when I load its tokenizer and call `tokenizer.all_special_tokens` or `tokenizer.additional_special_tokens`, the `<image>` token is not included. (A minimal reproduction is sketched after this list.)
- Where is the `<image>` token loaded? I looked into the `tokenizer.from_pretrained` function, but there doesn't seem to be a place where `added_tokens_decoder`, the config-file field in which this special token is defined, is actually read in.
- Where does `tokenizer.decode` learn that `<image>` is a special token? I tried to set a breakpoint inside it to see how it skips `<image>`, but I seem to end up in a call loop between `tokenization_utils_base.py` and `tokenization_utils_fast.py`. (See the second snippet below for the behavior I mean.)
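
For reference, this is roughly what I'm seeing for the first question. I'm using `llava-hf/llava-1.5-7b-hf` here just as an example checkpoint; any LLaVA tokenizer where `<image>` is only listed in `added_tokens_decoder` should behave the same way:

```python
from transformers import AutoTokenizer

# Example checkpoint -- substitute the LLaVA tokenizer you are using.
tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-7b-hf")

# <image> does not show up in the special-token lists...
print(tokenizer.all_special_tokens)         # no <image> here
print(tokenizer.additional_special_tokens)  # no <image> here either

# ...but it is present in the added-tokens table, marked as special.
for token_id, added_token in tokenizer.added_tokens_decoder.items():
    if added_token.content == "<image>":
        print(token_id, added_token)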
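And this is the decode behavior the third question is about: `<image>` gets dropped when `skip_special_tokens=True`, even though it never appears in `all_special_tokens` (again assuming the same example checkpoint):

```python
from transformers import AutoTokenizer

# Example checkpoint -- substitute the LLaVA tokenizer you are using.
tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-7b-hf")

# <image> is encoded as a single added token...
ids = tokenizer.encode("a photo of <image>", add_special_tokens=False)
print(ids)

# ...and decode treats it like a special token when skipping.
print(tokenizer.decode(ids, skip_special_tokens=False))  # keeps <image>
print(tokenizer.decode(ids, skip_special_tokens=True))   # <image> removed
```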
It would be really helpful if you could answer any of these questions. Thank you very much!