FlameF0X/i3-Series
Chat with the i3 model series
Note: The models are listed in the default order set by Hugging Face, so the latest model appears at the bottom.
Note: Our first usable i3 model (the first release with Transformers support and accompanying code).
Note: Smol, stable text generator that took over 14 hours to pre-train :) --- Changes --- Trained on over 1T tokens; LoRPt layers.
Note: Previous SOTA model. Pre-trained in around 2 to 4 hours, compared with over 14 hours for the previous version. --- Changes --- Trained on over 3T tokens; other details are available in the model card.
Note: Our newest SOTA model, with ~200M parameters. Trained on the same dataset as the previous model plus wikitext; training took ~1-2 hours. --- Training --- Loss: ~1, PPL: ~3.
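The reported loss and perplexity figures are consistent with each other if the loss is a mean cross-entropy in nats, since perplexity is then just the exponential of the loss (an assumption about how these numbers were computed, not stated in the model card). A quick sanity check:

```python
import math

def perplexity(mean_ce_loss: float) -> float:
    # Perplexity = exp(mean cross-entropy loss in nats).
    return math.exp(mean_ce_loss)

# A loss of ~1 gives a perplexity of ~2.7, matching the
# reported Loss ~1 / PPL ~3 figures above.
print(round(perplexity(1.0), 2))  # → 2.72
```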