Or just send them here
Banaxi Inc. PRO
No, it's fine, you can already put it on the leaderboard.
I have accepted.
Awesome, can't wait to see it (and add it to the leaderboard)!
My evaluations suggest the model isn't that great, so I'd like you to benchmark the final_model from https://huggingface.co/Banaxi-Tech/skarlet-alpha with your setup before I release it. I gated the model, so you need to request access and I will accept; that way it isn't open to everyone before release.
It's the Llama architecture.
So is the model you are training "LlamaForCausalLM" or your own custom architecture?
Yes
Yeah definitely not, excited to see how it does when its fully trained!
Would be great to add winogrande to the leaderboard!
Right now it would be #12 on the <100M leaderboard.
OK, then that's not terrible for my 75M model.
Yes, please
Benchmarks have gotten better at 210K
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | ↑ | 0.2073 | ± | 0.0118 |
| | | none | 0 | acc_norm | ↑ | 0.2398 | ± | 0.0125 |
| arc_easy | 1 | none | 0 | acc | ↑ | 0.4928 | ± | 0.0103 |
| | | none | 0 | acc_norm | ↑ | 0.4259 | ± | 0.0101 |
| hellaswag | 1 | none | 0 | acc | ↑ | 0.2892 | ± | 0.0045 |
| | | none | 0 | acc_norm | ↑ | 0.3077 | ± | 0.0046 |
| piqa | 1 | none | 0 | acc | ↑ | 0.6219 | ± | 0.0113 |
| | | none | 0 | acc_norm | ↑ | 0.6028 | ± | 0.0114 |
| winogrande | 1 | none | 0 | acc | ↑ | 0.4949 | ± | 0.0141 |

=== ArithMark-2.0 checkpoint-210000 ===
Average: 0.2588 (647/2500)
With a MIN_LR of 3e-5 and LR=3e-4
Here is what we are training:
BananaMind 1.5 Base
Our flagship 75M parameter model.
4096 token context window.
MiniBananaMind V3
Our 9M parameter model, targeting SOTA results for its size.
These will be released in the next few days.
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | ↑ | 0.2073 | ± | 0.0118 |
| | | none | 0 | acc_norm | ↑ | 0.2381 | ± | 0.0124 |
| arc_easy | 1 | none | 0 | acc | ↑ | 0.4882 | ± | 0.0103 |
| | | none | 0 | acc_norm | ↑ | 0.4234 | ± | 0.0101 |
| hellaswag | 1 | none | 0 | acc | ↑ | 0.2880 | ± | 0.0045 |
| | | none | 0 | acc_norm | ↑ | 0.3038 | ± | 0.0046 |
| piqa | 1 | none | 0 | acc | ↑ | 0.6153 | ± | 0.0114 |
| | | none | 0 | acc_norm | ↑ | 0.5968 | ± | 0.0114 |
| winogrande | 1 | none | 0 | acc | ↑ | 0.5107 | ± | 0.0140 |

=== ArithMark-2.0 checkpoint-150000 ===
Average: 0.2540 (635/2500)
Random chance: 0.2500
By difficulty:
easy: 0.2528 (316/1250)
hard: 0.2320 (116/500)
medium: 0.2707 (203/750)
By operator_count:
1: 0.2528 (316/1250)
2: 0.2707 (203/750)
3: 0.2320 (116/500)
By topic:
addition: 0.1543 (83/538)
division: 0.5000 (65/130)
mixed_three_ops: 0.2479 (60/242)
mixed_two_ops: 0.2456 (97/395)
multiplication: 0.4375 (63/144)
parentheses_three_ops: 0.2171 (56/258)
parentheses_two_ops: 0.2986 (106/355)
subtraction: 0.2397 (105/438)
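To put the ArithMark numbers above in context, here is a rough sketch that compares each accuracy against the reported random-chance level of 0.2500 using a simple binomial z-score. The helper name `chance_z_score` is made up for illustration, and applying the same 0.25 chance level to every topic subset is an assumption (the log only states it for the overall score).

```python
import math


def chance_z_score(correct: int, total: int, chance: float = 0.25) -> float:
    """z-score of the observed accuracy against a fixed chance level.

    Uses the binomial standard error under the null hypothesis that the
    model is guessing at the `chance` rate.
    """
    accuracy = correct / total
    stderr = math.sqrt(chance * (1.0 - chance) / total)
    return (accuracy - chance) / stderr


# (correct, total) pairs copied from the checkpoint-150000 log above.
# Assumption: the 0.25 chance level holds for each topic, not just overall.
results = {
    "overall": (635, 2500),
    "addition": (83, 538),
    "division": (65, 130),
    "multiplication": (63, 144),
}

for name, (correct, total) in results.items():
    z = chance_z_score(correct, total)
    print(f"{name}: acc={correct / total:.4f} z={z:+.2f}")
```

Under these assumptions the overall score sits well within one standard error of chance, while individual topics deviate in both directions (addition below, division and multiplication above). That pattern could come from answer-format or answer-position effects rather than real arithmetic skill, so it is probably worth checking before reading much into the per-topic numbers.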
I'm at 150K and the benchmarks aren't that good. Did this also happen to you with Supra 50M Base?
We have a new AI model and are currently training it.
We are training it on 27B tokens and are currently 8% done.
Follow us to be notified when it releases.
Some Info:
Parameters: 75M
GPU: RTX Pro 6000
We expect to be able to release it in the coming days
EDIT: We are now at step 210K
MiniBananaMind-v2-9M is a tiny causal language model trained fully from scratch on FineWeb-Edu.
It has only ~8.9M parameters, but was trained for ~3.54B tokens after retokenization using a custom 8k byte-level BPE tokenizer.
We trained it on an RTX 5070 Ti in just 4h 34m!
Go check it out at BananaMind/MiniBananaMind-v2-9M
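For a sense of scale, the figures in this post imply a rough throughput and tokens-per-parameter ratio. The sketch below works them out from the approximate values quoted above (~3.54B tokens, ~8.9M parameters, 4h 34m); the function name `training_stats` is hypothetical, and it assumes the quoted wall-clock time covers the whole training run.

```python
def training_stats(tokens: float, hours: int, minutes: int, params: float):
    """Return (tokens per second, tokens per parameter) for a training run."""
    seconds = hours * 3600 + minutes * 60
    return tokens / seconds, tokens / params


# Approximate values taken from the post; all of them are rounded ("~") there.
tokens_per_second, tokens_per_param = training_stats(
    tokens=3.54e9, hours=4, minutes=34, params=8.9e6
)

print(f"throughput: {tokens_per_second:,.0f} tokens/s")
print(f"tokens per parameter: {tokens_per_param:.0f}")
```

With these inputs the run works out to roughly 215K tokens per second and about 400 tokens per parameter, though both numbers inherit the rounding of the figures in the post.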