I'll show you in a few weeks what a small model is capable of; you will be surprised.
I applaud you on your journey into the void with small models. I too am deeply fascinated by optimizing smaller models rather than asking for more parameters and terabytes of scraped internet data. I hope to see what you've come up with in a few weeks' time.
I just finished designing a sparsity training scheduler that trains, on average, 35% of a model's available weights, with almost no hidden dimensions between adjoined transformers and zero throughput, while randomizing the trainable locations. It cuts VRAM use and training time, and the models score higher on mathematics benchmarks than FFT models trained on the same corpus. I discovered this while fucking around for fun.
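A minimal sketch of the general idea of randomized sparse training, in plain Python. This is not the scheduler described above: the function names (`sample_mask`, `masked_sgd_step`, `train`), the flat weight list, and the toy quadratic loss in the usage note are all assumptions made for illustration. It only shows a fresh random subset of roughly 35% of the weights being updated each step while the rest stay frozen.

```python
import random


def sample_mask(num_weights, fraction, rng):
    """Pick a random subset of weight indices to train on this step (hypothetical helper)."""
    k = max(1, round(num_weights * fraction))
    return set(rng.sample(range(num_weights), k))


def masked_sgd_step(weights, grads, mask, lr):
    """Apply plain SGD only to indices in the mask; all other weights stay frozen."""
    return [w - lr * g if i in mask else w
            for i, (w, g) in enumerate(zip(weights, grads))]


def train(weights, grad_fn, steps, fraction=0.35, lr=0.1, seed=0):
    """Re-randomize the trainable locations every step (an assumed schedule)."""
    rng = random.Random(seed)
    for _ in range(steps):
        mask = sample_mask(len(weights), fraction, rng)
        grads = grad_fn(weights)
        weights = masked_sgd_step(weights, grads, mask, lr)
    return weights
```

As a toy check, `train([1.0] * 20, lambda w: [2 * x for x in w], steps=50)` minimizes the sum of squares while touching only 7 of the 20 weights per step. A real implementation would also have to skip gradient and optimizer state for the frozen weights to get any VRAM savings, which this sketch does not attempt.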
I don't doubt that training smaller architectures has many more surprises in store for us.