🤗 Currently Training

Banaxi Inc. PRO

Banaxi-Tech

AI & ML interests

SLMs, training from scratch, LoRA, TTS, Ternary models

Recent Activity

updated a model about 1 hour ago

BananaMind/BananaMind-KV1-8M-2Bit-Experimental

updated a model about 1 hour ago

BananaMind/BananaMind-KV1-8M-1Bit-Experimental

updated a model about 1 hour ago

BananaMind/BananaMind-KV1-8M-16Bit-Experimental

replied to their post about 4 hours ago

Or just send them here

replied to their post about 4 hours ago

No, it's fine, you can already put it on the leaderboard.

replied to their post about 8 hours ago

I have accepted.

replied to their post about 8 hours ago

Awesome, can't wait to see it (and add it to the leaderboard)!

My evaluations suggest the model is not that great, so I would like you to benchmark the final_model in https://huggingface.co/Banaxi-Tech/skarlet-alpha with your setup before I release it. I gated the model, so you need to request access and I will accept it; that way not everyone can get it before the release.

replied to their post about 17 hours ago

It's the Llama architecture.
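For a Llama-architecture model (the `LlamaForCausalLM` class in Hugging Face transformers), the parameter count follows directly from a handful of config values. This is a minimal sketch that assumes the standard Llama layout: no bias terms, a SwiGLU MLP with gate/up/down projections, RMSNorm layers that carry only a weight vector, and rotary position embeddings with no learned parameters. The config class and field names below are hypothetical stand-ins for the values in a model's config.json. They are not the actual settings of the 75M model.

```python
from dataclasses import dataclass


@dataclass
class LlamaLikeConfig:
    # Hypothetical fields mirroring typical config.json entries.
    vocab_size: int
    hidden_size: int
    intermediate_size: int
    num_layers: int
    num_heads: int
    num_kv_heads: int
    tie_embeddings: bool = True


def count_params(c: LlamaLikeConfig) -> int:
    """Estimate the parameter count of a Llama-style decoder (no biases, SwiGLU MLP, RMSNorm)."""
    head_dim = c.hidden_size // c.num_heads
    kv_dim = c.num_kv_heads * head_dim
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
    attn = 2 * c.hidden_size * c.hidden_size + 2 * c.hidden_size * kv_dim
    # gate_proj and up_proj are hidden x intermediate; down_proj is intermediate x hidden.
    mlp = 3 * c.hidden_size * c.intermediate_size
    # Two RMSNorm weight vectors per layer (before attention and before the MLP).
    norms = 2 * c.hidden_size
    per_layer = attn + mlp + norms
    embed = c.vocab_size * c.hidden_size
    lm_head = 0 if c.tie_embeddings else embed  # a tied head reuses the embedding matrix
    final_norm = c.hidden_size
    return embed + c.num_layers * per_layer + final_norm + lm_head


# Tiny example config (hypothetical values, small enough to check by hand).
tiny = LlamaLikeConfig(vocab_size=8, hidden_size=4, intermediate_size=8,
                       num_layers=1, num_heads=2, num_kv_heads=1)
print(count_params(tiny))
```

If you plug in the real hidden size, layer count, vocabulary size and head counts, the result should land close to the advertised figure. Whether the input embedding and output head are tied changes the total by vocab_size × hidden_size.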

replied to their post about 18 hours ago

So is the model you are training a "LlamaForCausalLM" or your own custom architecture?

Yes

replied to their post about 18 hours ago

Yeah, definitely not. Excited to see how it does when it's fully trained!

Would be great to add winogrande to the leaderboard!

replied to their post about 18 hours ago

Right now it would be #12 on the <100M leaderboard.

replied to their post about 18 hours ago

OK, then that's not terrible for my 75M model.

replied to their post about 18 hours ago

Yes, please

replied to their post about 18 hours ago

Benchmarks have improved at step 210K:

|    Tasks    |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|-------------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_challenge|      1|none  |     0|acc     |↑  |0.2073|±  |0.0118|
|             |       |none  |     0|acc_norm|↑  |0.2398|±  |0.0125|
|arc_easy     |      1|none  |     0|acc     |↑  |0.4928|±  |0.0103|
|             |       |none  |     0|acc_norm|↑  |0.4259|±  |0.0101|
|hellaswag    |      1|none  |     0|acc     |↑  |0.2892|±  |0.0045|
|             |       |none  |     0|acc_norm|↑  |0.3077|±  |0.0046|
|piqa         |      1|none  |     0|acc     |↑  |0.6219|±  |0.0113|
|             |       |none  |     0|acc_norm|↑  |0.6028|±  |0.0114|
|winogrande   |      1|none  |     0|acc     |↑  |0.4949|±  |0.0141|

=== ArithMark-2.0 checkpoint-210000 ===
Average: 0.2588 (647/2500)
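One quick way to judge whether "better at 210K" is more than noise is to compare each acc delta against the reported Stderr values. This rough sketch assumes the Stderr column is the standard error of each task's mean accuracy, and it treats the two checkpoints as independent samples. Both checkpoints are scored on the same questions, so the true uncertainty of the difference is probably smaller than this estimate. A paired test over per-question results would be tighter. The numbers are copied from this table and from the checkpoint-150000 table posted a day earlier, and `diff_z` is a hypothetical helper.

```python
import math


def diff_z(a: float, se_a: float, b: float, se_b: float) -> float:
    """z-score for the difference a - b, treating the two estimates as independent."""
    return (a - b) / math.sqrt(se_a ** 2 + se_b ** 2)


# (acc, stderr) pairs copied from the checkpoint-210000 and checkpoint-150000 tables.
ckpt_210k = {
    "arc_challenge": (0.2073, 0.0118),
    "arc_easy": (0.4928, 0.0103),
    "hellaswag": (0.2892, 0.0045),
    "piqa": (0.6219, 0.0113),
    "winogrande": (0.4949, 0.0141),
}
ckpt_150k = {
    "arc_challenge": (0.2073, 0.0118),
    "arc_easy": (0.4882, 0.0103),
    "hellaswag": (0.2880, 0.0045),
    "piqa": (0.6153, 0.0114),
    "winogrande": (0.5107, 0.0140),
}

for task, (new_acc, new_se) in ckpt_210k.items():
    old_acc, old_se = ckpt_150k[task]
    z = diff_z(new_acc, new_se, old_acc, old_se)
    print(f"{task:14s} delta={new_acc - old_acc:+.4f}  z={z:+.2f}")
```

With these numbers every |z| stays below 1. Under this approximation, the 150K to 210K changes are within one standard error of each other on every task.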

replied to their post about 18 hours ago

With LR=3e-4 and MIN_LR=3e-5.
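The reply gives only the peak and floor learning rates. Here is a minimal sketch of one common way to use such a pair: linear warmup followed by cosine decay from `LR` down to `MIN_LR`. The schedule shape, `warmup_steps` and `max_steps` are assumptions, not settings taken from the post.

```python
import math


def lr_at(step: int, max_steps: int, warmup_steps: int,
          max_lr: float = 3e-4, min_lr: float = 3e-5) -> float:
    """Linear warmup to max_lr, then cosine decay down to a floor of min_lr (assumed schedule)."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step >= max_steps:
        return min_lr
    progress = (step - warmup_steps) / (max_steps - warmup_steps)  # goes from 0 to 1
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))           # goes from 1 to 0
    return min_lr + (max_lr - min_lr) * cosine


for step in (0, 1_000, 100_000, 200_000):
    print(step, lr_at(step, max_steps=200_000, warmup_steps=1_000))
```

For example, `lr_at(100_000, max_steps=200_000, warmup_steps=1_000)` returns a value roughly halfway between 3e-4 and 3e-5. Every step from `max_steps` onward stays at the 3e-5 floor.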

posted an update 1 day ago

Give us a follow at BananaMind! We are currently training two new models, so don't miss them when they are released.

Here is what we are training:

BananaMind 1.5 Base

Our flagship 75M-parameter model.
4096-token context window.

MiniBananaMind V3

Our 9M-parameter model, targeting SOTA results for its size.

These will be released in the next few days.

replied to their post 1 day ago

|    Tasks    |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|-------------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_challenge|      1|none  |     0|acc     |↑  |0.2073|±  |0.0118|
|             |       |none  |     0|acc_norm|↑  |0.2381|±  |0.0124|
|arc_easy     |      1|none  |     0|acc     |↑  |0.4882|±  |0.0103|
|             |       |none  |     0|acc_norm|↑  |0.4234|±  |0.0101|
|hellaswag    |      1|none  |     0|acc     |↑  |0.2880|±  |0.0045|
|             |       |none  |     0|acc_norm|↑  |0.3038|±  |0.0046|
|piqa         |      1|none  |     0|acc     |↑  |0.6153|±  |0.0114|
|             |       |none  |     0|acc_norm|↑  |0.5968|±  |0.0114|
|winogrande   |      1|none  |     0|acc     |↑  |0.5107|±  |0.0140|

=== ArithMark-2.0 checkpoint-150000 ===
Average: 0.2540  (635/2500)
Random chance: 0.2500

By difficulty:
easy: 0.2528  (316/1250)
hard: 0.2320  (116/500)
medium: 0.2707  (203/750)

By operator_count:
1: 0.2528  (316/1250)
2: 0.2707  (203/750)
3: 0.2320  (116/500)

By topic:
addition: 0.1543  (83/538)
division: 0.5000  (65/130)
mixed_three_ops: 0.2479  (60/242)
mixed_two_ops: 0.2456  (97/395)
multiplication: 0.4375  (63/144)
parentheses_three_ops: 0.2171  (56/258)
parentheses_two_ops: 0.2986  (106/355)
subtraction: 0.2397  (105/438)
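The report lists a random-chance baseline of 0.25, so a one-sample z-test (normal approximation to the binomial) can show how far each score sits from chance. This sketch assumes items are scored independently and that 0.25 is the right baseline for every topic. `z_vs_chance` is a hypothetical helper, and the counts are copied from the ArithMark outputs for checkpoints 150000 and 210000.

```python
import math


def z_vs_chance(correct: int, total: int, chance: float = 0.25) -> float:
    """One-sample z-score of an accuracy against a chance rate (normal approximation)."""
    p_hat = correct / total
    se = math.sqrt(chance * (1.0 - chance) / total)  # binomial standard error under chance
    return (p_hat - chance) / se


# (correct, total) counts copied from the ArithMark-2.0 outputs.
results = {
    "overall @150K": (635, 2500),
    "overall @210K": (647, 2500),
    "addition @150K": (83, 538),
    "division @150K": (65, 130),
}

for name, (k, n) in results.items():
    print(f"{name}: acc={k / n:.4f}, z={z_vs_chance(k, n):+.2f}")
```

Neither overall average clears the usual |z| > 1.96 threshold. Some individual topics, however, land far from chance in both directions: addition is well below it and division well above. A topic-level breakdown can surface this even when the average looks like guessing.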

replied to their post 1 day ago

I'm at step 150K and the benchmarks aren't that good. Did this also happen to you with Supra 50M Base?

posted an update 2 days ago

Hello AI Community! 👋
We have a new AI model that we are currently training.
We are training it on 27B tokens and are currently 8% done.
Follow us to be notified when it is released 🚀
Some info:
Parameters: 75M
GPU: RTX Pro 6000
We expect to release it in the coming days.

EDIT: We are now at step 210K
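The figures in the post also give a feel for the data budget. This small arithmetic sketch uses only the numbers stated above: 27B planned tokens, 75M parameters, and 8% progress at posting time. The comparison to roughly 20 tokens per parameter is the commonly quoted Chinchilla compute-optimal heuristic, included only as a reference point.

```python
def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens divided by model parameters."""
    return tokens / params


TOTAL_TOKENS = 27e9   # planned training budget from the post
PARAMS = 75e6         # model size from the post
PROGRESS = 0.08       # "8% done" at the time of posting

tokens_seen = TOTAL_TOKENS * PROGRESS
print(f"tokens seen so far: {tokens_seen / 1e9:.2f}B")
print(f"tokens per parameter (full run): {tokens_per_param(TOTAL_TOKENS, PARAMS):.0f}")
print(f"~20 tokens/param budget for this size: {20 * PARAMS / 1e9:.1f}B tokens")
```

This gives about 2.16B tokens seen at 8%, and 360 tokens per parameter for the full run. That is well beyond the ~1.5B tokens a 20-tokens-per-parameter budget would suggest for a 75M model.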

posted an update 3 days ago

We are excited to release MiniBananaMind-v2-9M!

MiniBananaMind-v2-9M is a tiny causal language model trained fully from scratch on FineWeb-Edu.

It has only ~8.9M parameters but was trained on ~3.54B tokens (counted after retokenization with a custom 8k byte-level BPE tokenizer).
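A byte-level BPE tokenizer starts from the 256 possible byte values. It repeatedly merges the most frequent adjacent pair into a new token until the vocabulary reaches its target size (around 8k entries here). The sketch below is a toy illustration of that training loop in pure Python, not the tokenizer used for MiniBananaMind. Production tokenizers, such as GPT-2 style byte-level BPE, also pre-split text with a regex and use much faster data structures. All function names are hypothetical.

```python
from collections import Counter


def merge_pair(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace each non-overlapping occurrence of `pair` (scanning left to right) with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_byte_bpe(text: str, vocab_size: int):
    """Learn BPE merges on top of raw UTF-8 bytes until the vocabulary reaches vocab_size."""
    ids = list(text.encode("utf-8"))               # start from bytes: ids 0..255
    vocab = {i: bytes([i]) for i in range(256)}
    merges: dict[tuple[int, int], int] = {}
    while len(vocab) < vocab_size:
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break                                  # no repeated pair left to compress
        new_id = len(vocab)
        merges[pair] = new_id
        vocab[new_id] = vocab[pair[0]] + vocab[pair[1]]
        ids = merge_pair(ids, pair, new_id)
    return merges, vocab, ids


def encode(text: str, merges: dict[tuple[int, int], int]) -> list[int]:
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():            # apply merges in the order they were learned
        ids = merge_pair(ids, pair, new_id)
    return ids


def decode(ids: list[int], vocab: dict[int, bytes]) -> str:
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


merges, vocab, ids = train_byte_bpe("aaabdaaabac", vocab_size=259)
print(ids, decode(ids, vocab))
```

On the toy string "aaabdaaabac" with `vocab_size=259`, three merges are learned and the 11 input bytes compress to 5 tokens. `decode` reverses the process exactly.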
We trained it on an RTX 5070 Ti in just 4h 34m!
Go check it out at BananaMind/MiniBananaMind-v2-9M.
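The stated numbers imply a rough training throughput. Here is a back-of-the-envelope sketch. It assumes the ~3.54B tokens were processed over the full 4h 34m wall-clock time, so any evaluation or checkpointing overhead is folded in.

```python
def tokens_per_second(tokens: float, hours: int, minutes: int) -> float:
    """Average throughput over a wall-clock duration given in hours and minutes."""
    seconds = hours * 3600 + minutes * 60
    return tokens / seconds


tps = tokens_per_second(3.54e9, 4, 34)   # ~3.54B tokens in 4h 34m
print(f"~{tps:,.0f} tokens/s")
print(f"~{3.54e9 / 8.9e6:.0f} tokens per parameter")
```

That works out to roughly 215K tokens per second on the RTX 5070 Ti, and about 398 tokens per parameter for the ~8.9M-parameter model.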