That's certainly confusing…
You can understand what you're seeing if you separate three different layers:
1. The model's vocabulary (all token IDs, including your added_tokens list).
2. The low-level fast tokenizer (the Rust tokenizers library; this is what tokenizer.json stores).
3. The high-level Transformers wrapper (Python PreTrainedTokenizerFast / SpecialTokensMixin; this is driven by tokenizer_config.json and special_tokens_map.json, and is where all_special_tokens comes from). (Hugging Face)
Your confusion comes exactly from the fact that layers (2) and (3) use the word "special" differently.
1. First: what is a "special token" conceptually?
At the model/training level there are two broad kinds of tokens:
- Normal (regular) tokens: subword pieces of natural language ("Hello", "ing", etc.).
- Control / format tokens: tokens with special meaning in the training data, such as:
  - <|im_start|>, <|im_end|>: chat message boundaries.
  - <|vision_start|>, <|vision_end|>, <|vision_pad|>: multimodal boundaries/padding.
  - <tool_call>, </tool_call>, <tool_response>, </tool_response>: function-calling tags.
  - <think>, </think>: reasoning spans.
  - <|fim_prefix|>, <|fim_middle|>, <|fim_suffix|>: fill-in-the-middle tokens.
  - <|endoftext|>: end-of-document token used in pretraining.
Qwen's docs call these "control tokens": tokens that represent special functionality rather than natural language itself. (Qwen)
From the model's point of view, all of these are just token IDs. "Specialness" is about how the tokenizer and high-level library treat them.
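To make that concrete, here is a minimal sketch (the checkpoint path is a placeholder; substitute your own model) showing that a control token and an ordinary word both come out as plain integer IDs:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/your/model")  # placeholder path

print(tok("Hello", add_special_tokens=False)["input_ids"])       # e.g. [9707]
print(tok("<|im_end|>", add_special_tokens=False)["input_ids"])  # e.g. [151645]
# Both are just integers to the model; "specialness" lives in the tokenizer layers.
```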
2. What your added_tokens list in tokenizer.json actually is
The tokenizer.json file is the serialized fast tokenizer from the tokenizers library. It contains: vocabulary, merges, pre/post-processing, plus a list called added_tokens. (Hugging Face)
Your added_tokens snippet:
```text
id      token                  special
151643  <|endoftext|>          true
151644  <|im_start|>           true
151645  <|im_end|>             true
151646  <|object_ref_start|>   true
151647  <|object_ref_end|>     true
151648  <|box_start|>          true
151649  <|box_end|>            true
151650  <|quad_start|>         true
151651  <|quad_end|>           true
151652  <|vision_start|>       true
151653  <|vision_end|>         true
151654  <|vision_pad|>         true
151655  <|image_pad|>          true
151656  <|video_pad|>          true
151657  <tool_call>            false
151658  </tool_call>           false
151659  <|fim_prefix|>         false
151660  <|fim_middle|>         false
151661  <|fim_suffix|>         false
151662  <|fim_pad|>            false
151663  <|repo_name|>          false
151664  <|file_sep|>           false
151665  <tool_response>        false
151666  </tool_response>       false
151667  <think>                false
151668  </think>               false
```
Here:
- The first column is the token ID.
- The middle column is the string form of the token.
- The last true/false is the Rust-tokenizer-level special flag. (paddlenlp.readthedocs.io)
What that flag does in the fast tokenizer (see the sketch after this list):
- special = true
  - The token is treated as an indivisible "added token".
  - The pre-tokenizer will not split it into smaller pieces.
  - When you decode with skip_special_tokens=True, these tokens are removed. (Hugging Face)
- special = false
  - The token is just an extra vocab token. It may still be matched as one piece, but it gets no special handling in the tokenizer's decode / skip logic.
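A quick way to see the difference (same placeholder checkpoint as above; the exact IDs depend on your model):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/your/model")  # placeholder path

ids = tok("<|im_end|><tool_call>", add_special_tokens=False)["input_ids"]
print(ids)  # e.g. [151645, 151657]: each tag is matched as a single token

# special=true tokens are dropped on decode; special=false tokens survive
print(tok.decode(ids, skip_special_tokens=True))  # -> '<tool_call>'
```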
So:
What does the added_tokens list mean?
It is "all vocabulary items that were added on top of the base vocab", along with a low-level special flag that controls how the fast tokenizer tokenizes/decodes them.
It is not "the list of all special tokens from Transformers' point of view".
You can see this design in the Transformers code: the higher-level add_special_tokens() calls down into the fast tokenizer and creates AddedToken objects with special=True, but there can also be added tokens that are not special. (gemfury.com)
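You can also inspect this low-level view directly from Python: the wrapper's added_tokens_decoder property exposes the AddedToken objects, including their special flags (same placeholder checkpoint as above):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/your/model")  # placeholder path

# id -> AddedToken, mirroring the added_tokens list in tokenizer.json
for tid, added in sorted(tok.added_tokens_decoder.items()):
    print(tid, added.content, added.special)
# 151643 <|endoftext|> True
# ...
# 151657 <tool_call> False
```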
3. What tokenizer_config.json is doing
tokenizer_config.json is a wrapper configuration used by the Python transformers library. It does not contain the full vocab; it tells AutoTokenizer:
- Which tokenizer class to instantiate ("tokenizer_class": "Qwen2Tokenizer").
- Which tokens are bos_token, eos_token, pad_token, unk_token, etc., and which are additional_special_tokens (custom special tokens).
- Behavior flags like model_max_length, padding_side, add_prefix_space, etc. (Hugging Face)
Your tokenizer_config.json says:
"eos_token": "<|im_end|>",
"pad_token": "<|vision_pad|>",
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
]
So from Transformers' perspective:
- EOS = <|im_end|>
- PAD = <|vision_pad|>
- And these 13 tokens are "additional special tokens".
This information is also mirrored in special_tokens_map.json for many models, and both files are loaded by AutoTokenizer. (Hugging Face)
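After loading, those config entries surface as attributes on the Python tokenizer (same placeholder checkpoint as above):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/your/model")  # placeholder path

print(tok.eos_token, tok.eos_token_id)  # <|im_end|> 151645
print(tok.pad_token, tok.pad_token_id)  # <|vision_pad|> 151654
print(tok.additional_special_tokens)    # the 13 chat/vision tokens
```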
4. How tokenizer.all_special_tokens is computed
In the Transformers Python code, the SpecialTokensMixin class holds all the special-token attributes and exposes properties like all_special_tokens and all_special_ids. (Hugging Face)
Conceptually it does something like:
```python
specials = []
for v in tokenizer.special_tokens_map_extended.values():
    if isinstance(v, list):
        specials.extend(v)
    else:
        specials.append(v)

# deduplicate while preserving order
all_special_tokens = list(dict.fromkeys(specials))
```
Where special_tokens_map_extended is built from:
- bos_token, eos_token, pad_token, unk_token, etc.
- additional_special_tokens (and sometimes their legacy variants). (Hugging Face)
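You can verify this on a real tokenizer (same placeholder checkpoint; special_tokens_map returns plain strings, while the _extended variant returns AddedToken objects):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/your/model")  # placeholder path

print(tok.special_tokens_map)
# e.g. {'eos_token': '<|im_end|>',
#       'pad_token': '<|vision_pad|>',
#       'additional_special_tokens': ['<|im_start|>', '<|im_end|>', ...]}
print(tok.all_special_ids)  # the matching token IDs, deduplicated
```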
Crucially:
- all_special_tokens never looks at the raw added_tokens list in tokenizer.json.
- It only looks at the named special tokens (bos_token, eos_token, pad_token, etc.) and the additional_special_tokens stored in the config.
That is exactly why your all_special_tokens output is:
```python
[
    '<|im_end|>',
    '<|vision_pad|>',
    '<|im_start|>',
    '<|object_ref_start|>',
    '<|object_ref_end|>',
    '<|box_start|>',
    '<|box_end|>',
    '<|quad_start|>',
    '<|quad_end|>',
    '<|vision_start|>',
    '<|vision_end|>',
    '<|image_pad|>',
    '<|video_pad|>',
]
```
This is just:
- eos_token (<|im_end|>)
- pad_token (<|vision_pad|>)
- plus everything in additional_special_tokens (deduplicated).
Notice:
- <|endoftext|> is not in additional_special_tokens and is not declared as EOS in tokenizer_config.json.
- The tool / FIM / <think> tokens are also not in additional_special_tokens, and they have special=false at the tokenizer level.
Therefore they do not appear in all_special_tokens. This is normal and also shows up in other models (e.g. LLaVA's <image> token sometimes appears in added_tokens but not in all_special_tokens unless it was wired into additional_special_tokens). (Hugging Face Forums)
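A quick sanity check, under the same assumptions as the earlier snippets:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/your/model")  # placeholder path

tid = tok.convert_tokens_to_ids("<|endoftext|>")
print(tid)                                         # 151643: it exists in the vocab
print(tid in tok.all_special_ids)                  # False: not wired into the config
print("<|endoftext|>" in tok.all_special_tokens)   # False
```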
So:
Why is all_special_tokens different from the added_tokens list, and from its special=true subset?
Because all_special_tokens is a higher-level view built from tokenizer_config.json (the special-tokens map plus additional_special_tokens), while added_tokens is the raw added-vocabulary list (with a low-level special flag). They are related, but intentionally not the same set.
5. Relationship between the three things you see
Let's put your exact objects side by side.
5.1. added_tokens (fast tokenizer, low-level)
- Contains all tokens that were added after the base vocab, including:
  - Qwen control tokens: <|endoftext|>, <|im_start|>, <|im_end|>, <|vision_*|>, etc.
  - Tool tokens: <tool_call>, <tool_response>, <think>, etc.
  - FIM / repo tokens: <|fim_*|>, <|repo_name|>, <|file_sep|>.
- The trailing true/false is the Rust-layer "special" flag for tokenization behavior.
5.2. tokenizer_config.json (Transformers wrapper, high-level)
Defines:
- eos_token = "<|im_end|>"
- pad_token = "<|vision_pad|>"
- additional_special_tokens = the 13 multimodal/chat tokens.
These become:
- tokenizer.eos_token, tokenizer.pad_token
- tokenizer.additional_special_tokens
and then feed into:
- tokenizer.all_special_tokens
- tokenizer.all_special_ids
via SpecialTokensMixin. (Hugging Face)
5.3. tokenizer.all_special_tokens (Python view)
- Computed from special_tokens_map / special_tokens_map_extended (EOS, PAD, additional specials, etc.), not from the raw added_tokens list.
- Hence you only see:
  - <|im_end|>
  - <|vision_pad|>
  - and the 11 other additional special tokens.
- <|endoftext|> and <tool_call> are not in that config, so they don't appear even though they exist in added_tokens.
6. Difference in roles: tokenizer.json vs tokenizer_config.json
You can think of it like this:
6.1 tokenizer.json = "how to actually tokenize text"
If you change this file, you are changing how raw text is split into IDs.
6.2 tokenizer_config.json = "how Transformers should treat this tokenizer"
If you change this file, you are changing metadata and behavior inside Transformers, not the raw tokenization algorithm.
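For example, if you wanted <think> to behave like the other specials at the Transformers level, one hedged sketch (this mutates the loaded tokenizer; saving it rewrites the config files) would be:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/your/model")  # placeholder path

# Promote <think> / </think> to high-level specials; their token IDs stay the same
tok.add_special_tokens(
    {"additional_special_tokens": ["<think>", "</think>"]},
    replace_additional_special_tokens=False,  # extend the list instead of replacing it
)

print("<think>" in tok.all_special_tokens)  # now True
tok.save_pretrained("patched-tokenizer")    # writes the updated tokenizer_config.json
```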
6.3 Other ancillary files
Many HF model repos also contain:
- special_tokens_map.json: basically the same info as the special_tokens_map attribute, i.e. a mapping from names (eos_token, pad_token, additional_special_tokens) to actual strings. (Hugging Face)
- added_tokens.json: a separate, simpler listing of added tokens (often derived from tokenizer.json).
- config.json / generation_config.json: the model config and default generation parameters, including eos_token_id and pad_token_id, which must be consistent with the tokenizer side. (Hugging Face)
When these files get out of sync (e.g. the EOS ID in config.json vs the EOS string in tokenizer_config.json vs the tokenizer.json contents), you get classic bugs: generation not stopping, NaNs during training, etc. Real Qwen bugs like this have been discussed in the wild.
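A small consistency check you can run across these files (a sketch; the path is a placeholder, and generation_config.json may be absent in some repos):

```python
from transformers import AutoConfig, AutoTokenizer, GenerationConfig

path = "path/to/your/model"  # placeholder path
tok = AutoTokenizer.from_pretrained(path)
cfg = AutoConfig.from_pretrained(path)
gen = GenerationConfig.from_pretrained(path)

# All three should agree on EOS, or generation may not stop where you expect
print(tok.eos_token_id, cfg.eos_token_id, gen.eos_token_id)
```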
7. How to mentally understand special tokens in practice
A practical mental model that matches what you see:
1. Vocabulary-level view (tokenizer.json / added_tokens)
   - "Which strings exist as single tokens?"
   - "Does the fast tokenizer treat them as special (never split, removable on decode)?"
2. Transformers-level view (tokenizer_config.json / special_tokens_map.json)
   - "Which tokens does Transformers treat as EOS/PAD/BOS/CLS/SEP?"
   - "Which tokens are additional special tokens (additional_special_tokens)?"
   - This drives all_special_tokens, all_special_ids, skip_special_tokens=True, etc. (Hugging Face)
3. Model/training-level view (chat template, data format)
   - "Which control tokens actually appear in the training data, and what do they mean?"
   - Qwen-style control tokens:
     - <|im_start|>, <|im_end|>: chat roles.
     - <|vision_*|>, <|image_pad|>, <|video_pad|>: multimodal.
     - <tool_call>, <tool_response>, <think>: tool + reasoning. (Qwen)
These three layers do not have to use the same subset of tokens, but they must be coherent for your use case.
For your specific tokenizer:
- tokenizer.json lists all of those control tokens in added_tokens.
- tokenizer_config.json chooses a subset as EOS / PAD / additional special tokens (mostly chat + vision).
- tokenizer.all_special_tokens is the union of EOS/PAD plus additional_special_tokens, hence the 13-token list you're seeing.
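To tie the three layers together for a single token, a sketch with a hypothetical describe helper (same placeholder checkpoint as above; the printed values are what your files imply):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/your/model")  # placeholder path

def describe(token: str) -> None:
    """Print the vocab-level, fast-tokenizer-level, and Transformers-level view."""
    tid = tok.convert_tokens_to_ids(token)
    added = tok.added_tokens_decoder.get(tid)
    print(f"{token!r}: id={tid}, "
          f"rust_special={added.special if added else None}, "
          f"high_level_special={token in tok.all_special_tokens}")

describe("<|im_end|>")     # id=151645, rust_special=True,  high_level_special=True
describe("<|endoftext|>")  # id=151643, rust_special=True,  high_level_special=False
describe("<think>")        # id=151667, rust_special=False, high_level_special=False
```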
8. Summary
- added_tokens in tokenizer.json is the raw list of vocabulary items added on top of the base vocab, each with a low-level special flag that controls splitting and skip_special_tokens decoding.
- tokenizer_config.json (and special_tokens_map.json) declare which of those tokens Transformers treats as EOS / PAD / etc. and as additional_special_tokens.
- tokenizer.all_special_tokens is computed only from that high-level config, which is why <|endoftext|>, the tool tokens, and the FIM tokens exist in the vocab but do not appear in it.
9. Good reference links
A few high-quality references you can read in full:
- Hugging Face docs: Tokenizer (high-level) and Fast tokenizers (low-level internals). (Hugging Face)
- Transformers internals: PreTrainedTokenizerBase / SpecialTokensMixin (how special tokens and all_special_tokens are implemented). (Hugging Face)
- Qwen docs: Key Concepts (explains regular vs control tokens in the Qwen family). (Qwen)
- HF forum thread: "additional_special_tokens are not added" (LLaVA <image> token missing from all_special_tokens, the same pattern as your issue). (Hugging Face Forums)
- Example tokenizer configs: Qwen2-VL tokenizer_config.json (shows how Qwen actually declares EOS / PAD and additional special tokens). (Hugging Face)