Axon Collection (2 items)
A collection of memory-, parameter-, and compute-efficient models.
Axon-Pico-3M has been released! We encourage everyone to check it out: it delivers superior syntactic and semantic correctness with 2× fewer parameters.
Visit this Space to try out Axon and see how it chooses tokens and how sampling parameters change its behavior.
Axon-Nano-6M is a small-scale language model built on the novel Axon architecture: a recurrent neural network with multi-timescale diagonal memory and local convolution mixing. Unlike Transformers, Axon models use O(1) memory per token during inference, making them efficient for long-context generation while remaining highly expressive.
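The core idea can be illustrated with a minimal sketch (an illustration only, not the released implementation; all names and shapes here are assumptions): each memory channel carries its own scalar forget gate, so the state update is elementwise and the recurrent state stays a fixed-size vector.

```python
import torch

def diagonal_memory_step(h, x, forget, in_scale):
    """One recurrence step with a diagonal (per-channel) memory.

    h:        (batch, d_mem) running memory state
    x:        (batch, d_mem) projected input for the current token
    forget:   (d_mem,) per-channel forget gate in (0, 1)
    in_scale: (d_mem,) per-channel input scaling

    The update is purely elementwise, so the state is a fixed-size
    vector: O(1) memory per token, regardless of sequence length.
    """
    return forget * h + in_scale * x
```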
| Component | Configuration |
|---|---|
| Parameters | 6,204,032 |
| Layers | 4 |
| Model Dimension | 256 |
| Fast/Slow Split | 64/192 |
| Memory Dimension | 128 |
| Control Dimension | 128 |
| FFN Multiplier | 6× |
| Local Kernel | 5 |
| Vocab Size | 2,048 (BPE) |
| Max Sequence | 256 |
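For reference, the table above could be expressed as a config sketch like the one below. Field names are illustrative assumptions; the released `AxonConfig` (loaded via `AxonConfig.from_json("config.json")` in the usage example further down) may use different names.

```python
from dataclasses import dataclass

@dataclass
class AxonConfigSketch:
    """Hypothetical config mirroring the table; not the released AxonConfig."""
    n_layers: int = 4
    d_model: int = 256
    d_fast: int = 64          # fast half of the 64/192 fast/slow split
    d_slow: int = 192
    d_memory: int = 128
    d_control: int = 128
    ffn_multiplier: int = 6   # FFN hidden size = 6 * d_model
    local_kernel: int = 5     # kernel width for local convolution mixing
    vocab_size: int = 2048    # BPE vocabulary
    max_seq_len: int = 256
```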
| Group | Size | Forget Range | Input Max | Purpose |
|---|---|---|---|---|
| Fast | 43 | [0.05, 0.90] | 0.75 | Recent context |
| Medium | 43 | [0.01, 0.98] | 0.60 | Medium-term |
| Slow | 42 | [0.001, 0.995] | 0.45 | Long-term memory |
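The 43 + 43 + 42 channels sum to the 128-dimensional memory. Below is a hedged sketch of how the per-group forget gates might be initialized within the ranges above; the grouping and ranges come from the table, while the initialization scheme itself is an assumption.

```python
import torch

def init_forget_gates():
    """Spread initial forget values across each timescale group's range.

    Higher forget values mean slower decay, i.e. longer-lived memory.
    """
    groups = [
        (43, 0.05, 0.90),    # fast: recent context
        (43, 0.01, 0.98),    # medium: medium-term
        (42, 0.001, 0.995),  # slow: long-term memory
    ]
    gates = [torch.linspace(lo, hi, size) for size, lo, hi in groups]
    return torch.cat(gates)  # shape (128,) == memory dimension
```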
| Step | Train Loss | Val Loss | Val PPL |
|---|---|---|---|
| 500 | 2.82 | 2.74 | 15.51 |
| 1000 | 2.30 | 2.25 | 9.51 |
| 2000 | 2.03 | 1.97 | 7.14 |
| 3000 | 1.90 | 1.85 | 6.38 |
| 4000 | 1.82 | 1.77 | 5.87 |
| 5000 | 1.79 | 1.73 | 5.62 |
The model is slightly undertrained: validation loss was still decreasing at step 5000.
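Validation perplexity is simply the exponential of the validation cross-entropy loss:

```python
import math

# PPL = exp(cross-entropy loss); final checkpoint at step 5000:
print(math.exp(1.73))  # ≈ 5.64, matching the reported 5.62 up to loss rounding
```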
=== Sample 1 ===
[INST] I feel so angry about the public spaces we spent with my new dog after that event [/INST] That is disappointing; it can be frustrating when people do not think about the effort we put in. [INST] Yes, it seems like some people do not understand how their actions affect others [/INST] It does seem unkind, especially when they do not share their concerns to others. [INST] I just wish people could think more before they speak up for no good reason [/INST] That is a good wish, sometimes people do not realize the impact of their actions. [INST] It makes me wonder if they ever think about how their actions affect others [/INST] Caring for others is important, yet many still prefer
=== Sample 2 ===
[INST] How about we talk about the weather today because it is sunny and warm? [/INST] It is great to see you too, the warm sun makes everything bright. [INST] Yes, the air feels better; I love sunny days when it is sunny. [/INST] Sailing on the water really helps us feel refreshed and happy. [INST] What is your favorite part of sunny days to stay positive? [/INST] I enjoy reading books, especially when the sun is shining brightly outside. [INST] That sounds nice, I also like wearing my favorite spoon in my garden. [/INST] Warm milk always brings joy to everyone, do you enjoy spending time with nature?
<|endoftext
=== Sample 3 ===
[INST] Hello, I feel a bit sad today because I heard some false stories I heard about me. [/INST] I understand, those stories can sometimes make us feel fearful or sad. [INST] Yes, it hurts a lot, and I wonder if anyone will ever feel the same way. [/INST] Art can bring peace, but it often seems to come back slowly when we least expect it. [INST] I hope something can bring back some of that sadness and find peace again. [/INST] It is always possible for things to be valued; maybe you will find new ways to share.
<|endoftext|>
[INST] Why do people find it so disgusting when I just sneez
=== Sample 4 ===
[INST] Hello, I feel a bit sad today because everything seems so cold and gloomy outside. [/INST] It is understandable to feel that way when the days seem to pass so quickly. [INST] I worry about how bad the weather can make us feel gloomy sometimes. [/INST] Yes, the current weather can often impact our mood and energy levels. [INST] Do you think the rain will stop soon, or will it stay like this? [/INST] I hope it does, it always makes everything bright and cheerful for outdoor fun. [INST] I hope we can go outside again soon, it would be wonderful outside. [/INST] I also hope for brighter skies, it is important for outdoor fun
=== Sample 5 ===
[INST] I think we should talk about how disgusting the curtain looks for the clock. [/INST] That is true, the colors can look unappealing and gross to many people. [INST] I just cannot stand seeing all that trash like that; it really does not help. [/INST] Yes, it's like the heat is piling up there, making it a mess, which is frustrating. [INST] Do you think we can find a way to stop feeling so dirty and disilled about it? [/INST] It might be wise to find something like that, but it is hard to predict. [INST] True, maybe we can find a way to turn things tidy and clean next time.
```python
import torch

from model import AxonModel, AxonConfig
from tokenizer import BPETokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the config and weights from this repository.
config = AxonConfig.from_json("config.json")
model = AxonModel(config).to(device)
model.load_state_dict(torch.load("pytorch_model.pt", map_location=device))
model.eval()

tokenizer = BPETokenizer.from_json("tokenizer.json")

prompt = "[INST] "
tokens = tokenizer.encode(prompt)
input_ids = torch.tensor([tokens], device=device)

# Greedy decoding: append the highest-probability token, 100 times.
with torch.no_grad():
    for _ in range(100):
        logits = model(input_ids)["logits"][:, -1, :]
        next_token = torch.argmax(logits, dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=1)

output = tokenizer.decode(input_ids[0].tolist())
print(output)
```
```python
from generation import generate

# Sampled generation with temperature and nucleus (top-p) filtering.
output = generate(
    model=model,
    tokenizer=tokenizer,
    prompt="[INST] ",
    max_new_tokens=200,
    temperature=0.8,
    top_p=0.9,
    device=device,
)
print(output)
```
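The internals of `generate` are not shown here, but a typical temperature plus nucleus (top-p) sampling step looks roughly like the sketch below (an illustration of the standard technique, not necessarily how `generation.generate` implements it):

```python
import torch

def sample_next_token(logits, temperature=0.8, top_p=0.9):
    """Sample one token id per batch row with temperature + top-p filtering."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True, dim=-1)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Drop tokens once the probability mass before them already exceeds top_p.
    sorted_probs[cumulative - sorted_probs >= top_p] = 0.0
    sorted_probs /= sorted_probs.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_ids.gather(-1, choice)  # map back to vocabulary ids
```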
This model is intended for:

- Research and education on efficient recurrent sequence models
- Experimenting with sampling parameters and small-scale text generation

Not intended for:

- Production use or factual question answering
- Any application requiring reliable, coherent long-form text
```bibtex
@misc{axon2025,
  title={Axon: Multi-Timescale Diagonal Memory for Efficient Sequence Modeling},
  author={Oscar Lo},
  year={2025},
  url={https://huggingface.co/oscar128372/axon-tinychat-6M}
}
```
License: MIT