BananaMind/MiniBananaMind-v3-9M
Text Generation • 8.88M parameters
It's a Llama architecture.
So the model you are training is `LlamaForCausalLM`, or your own custom architecture?
Yes
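Since the model is a stock `LlamaForCausalLM`, its parameter count follows directly from the config. Below is a rough sketch, assuming the Hugging Face Llama layout with no attention or MLP biases and untied embeddings unless stated. `LlamaShape` and `count_llama_params` are hypothetical helpers, and the example config values are made up for illustration; they are not the real MiniBananaMind-v3-9M config.

```python
from dataclasses import dataclass


@dataclass
class LlamaShape:
    # Hypothetical helper; field names mirror LlamaConfig, values are illustrative only.
    vocab_size: int
    hidden_size: int
    intermediate_size: int
    num_hidden_layers: int
    num_attention_heads: int
    num_key_value_heads: int
    tie_word_embeddings: bool = False


def count_llama_params(c: LlamaShape) -> int:
    """Count weights in a bias-free Llama decoder (assumed HF LlamaForCausalLM layout)."""
    head_dim = c.hidden_size // c.num_attention_heads
    kv_dim = c.num_key_value_heads * head_dim

    # Attention: q_proj and o_proj are hidden x hidden; k_proj and v_proj map hidden -> kv_dim (GQA).
    attn = 2 * c.hidden_size * c.hidden_size + 2 * c.hidden_size * kv_dim
    # SwiGLU MLP: gate_proj and up_proj (hidden -> intermediate), down_proj (intermediate -> hidden).
    mlp = 3 * c.hidden_size * c.intermediate_size
    # Two RMSNorm weight vectors per layer (input and post-attention). RoPE has no learned weights.
    norms = 2 * c.hidden_size
    per_layer = attn + mlp + norms

    embed = c.vocab_size * c.hidden_size
    lm_head = 0 if c.tie_word_embeddings else c.vocab_size * c.hidden_size
    final_norm = c.hidden_size
    return embed + c.num_hidden_layers * per_layer + final_norm + lm_head


if __name__ == "__main__":
    # Made-up toy config, not the thread's model.
    tiny = LlamaShape(vocab_size=1000, hidden_size=64, intermediate_size=256,
                      num_hidden_layers=2, num_attention_heads=4, num_key_value_heads=4)
    print(count_llama_params(tiny))
```

At this scale the embedding and LM head matrices dominate, so tying them (or shrinking the vocabulary) is usually the biggest lever on the total.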
Yeah, definitely not. Excited to see how it does when it's fully trained!
Would be great to add winogrande to the leaderboard!
Right now it would be #12 on the leaderboard for models under 100M parameters.
OK, then mine isn't terrible for a 75M model.
Yes, please
Benchmarks have gotten better at 210K steps:
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | ↑ | 0.2073 | ± | 0.0118 |
| | | none | 0 | acc_norm | ↑ | 0.2398 | ± | 0.0125 |
| arc_easy | 1 | none | 0 | acc | ↑ | 0.4928 | ± | 0.0103 |
| | | none | 0 | acc_norm | ↑ | 0.4259 | ± | 0.0101 |
| hellaswag | 1 | none | 0 | acc | ↑ | 0.2892 | ± | 0.0045 |
| | | none | 0 | acc_norm | ↑ | 0.3077 | ± | 0.0046 |
| piqa | 1 | none | 0 | acc | ↑ | 0.6219 | ± | 0.0113 |
| | | none | 0 | acc_norm | ↑ | 0.6028 | ± | 0.0114 |
| winogrande | 1 | none | 0 | acc | ↑ | 0.4949 | ± | 0.0141 |

=== ArithMark-2.0 checkpoint-210000 ===
Average: 0.2588 (647/2500)
With MIN_LR = 3e-5 and LR = 3e-4
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | ↑ | 0.2073 | ± | 0.0118 |
| | | none | 0 | acc_norm | ↑ | 0.2381 | ± | 0.0124 |
| arc_easy | 1 | none | 0 | acc | ↑ | 0.4882 | ± | 0.0103 |
| | | none | 0 | acc_norm | ↑ | 0.4234 | ± | 0.0101 |
| hellaswag | 1 | none | 0 | acc | ↑ | 0.2880 | ± | 0.0045 |
| | | none | 0 | acc_norm | ↑ | 0.3038 | ± | 0.0046 |
| piqa | 1 | none | 0 | acc | ↑ | 0.6153 | ± | 0.0114 |
| | | none | 0 | acc_norm | ↑ | 0.5968 | ± | 0.0114 |
| winogrande | 1 | none | 0 | acc | ↑ | 0.5107 | ± | 0.0140 |

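One quick way to judge whether the differences between the two lm-eval tables are real is to compare each change against the reported standard errors. The sketch below copies the values and stderrs from both tables, assuming the second table belongs to the 150K checkpoint (it is posted alongside the checkpoint-150000 ArithMark results). `delta_z` is a hypothetical helper. It treats the two runs as independent even though they score the same items, so it likely overstates the noise; a paired test on per-item results would be sharper.

```python
import math

# (value, stderr) pairs copied from the 210K table in this thread.
CKPT_210K = {
    "arc_challenge/acc": (0.2073, 0.0118),
    "arc_challenge/acc_norm": (0.2398, 0.0125),
    "arc_easy/acc": (0.4928, 0.0103),
    "arc_easy/acc_norm": (0.4259, 0.0101),
    "hellaswag/acc": (0.2892, 0.0045),
    "hellaswag/acc_norm": (0.3077, 0.0046),
    "piqa/acc": (0.6219, 0.0113),
    "piqa/acc_norm": (0.6028, 0.0114),
    "winogrande/acc": (0.4949, 0.0141),
}
# Second table, assumed here to be the 150K checkpoint.
CKPT_150K = {
    "arc_challenge/acc": (0.2073, 0.0118),
    "arc_challenge/acc_norm": (0.2381, 0.0124),
    "arc_easy/acc": (0.4882, 0.0103),
    "arc_easy/acc_norm": (0.4234, 0.0101),
    "hellaswag/acc": (0.2880, 0.0045),
    "hellaswag/acc_norm": (0.3038, 0.0046),
    "piqa/acc": (0.6153, 0.0114),
    "piqa/acc_norm": (0.5968, 0.0114),
    "winogrande/acc": (0.5107, 0.0140),
}


def delta_z(new, old):
    """Return (difference, rough z-score), treating the two estimates as independent."""
    (v1, s1), (v0, s0) = new, old
    diff = v1 - v0
    return diff, diff / math.sqrt(s1 ** 2 + s0 ** 2)


for key in CKPT_210K:
    diff, z = delta_z(CKPT_210K[key], CKPT_150K[key])
    print(f"{key:24s} diff={diff:+.4f} z={z:+.2f}")
```

With these numbers every |z| stays below 1, so by this rough measure the 150K to 210K changes are within the reported noise.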
=== ArithMark-2.0 checkpoint-150000 ===
Average: 0.2540 (635/2500)
Random chance: 0.2500
By difficulty:
easy: 0.2528 (316/1250)
hard: 0.2320 (116/500)
medium: 0.2707 (203/750)
By operator_count:
1: 0.2528 (316/1250)
2: 0.2707 (203/750)
3: 0.2320 (116/500)
By topic:
addition: 0.1543 (83/538)
division: 0.5000 (65/130)
mixed_three_ops: 0.2479 (60/242)
mixed_two_ops: 0.2456 (97/395)
multiplication: 0.4375 (63/144)
parentheses_three_ops: 0.2171 (56/258)
parentheses_two_ops: 0.2986 (106/355)
subtraction: 0.2397 (105/438)
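The ArithMark report lists random chance as 0.25, so a natural check is how far each accuracy sits from chance given its sample size. The sketch below applies a one-sample z-test (normal approximation to the binomial) to the (correct, total) counts above. It assumes every item has the same 0.25 chance rate, which holds only if all topics use the same number of answer options. `z_vs_chance` is a hypothetical helper, not part of ArithMark.

```python
import math

CHANCE = 0.25  # "Random chance: 0.2500" from the ArithMark-2.0 output

# (correct, total) counts copied from the checkpoint-150000 breakdown.
BY_TOPIC = {
    "addition": (83, 538),
    "division": (65, 130),
    "mixed_three_ops": (60, 242),
    "mixed_two_ops": (97, 395),
    "multiplication": (63, 144),
    "parentheses_three_ops": (56, 258),
    "parentheses_two_ops": (106, 355),
    "subtraction": (105, 438),
}
OVERALL = (635, 2500)


def z_vs_chance(correct, total, p0=CHANCE):
    """Return (accuracy, z-score of accuracy against chance rate p0)."""
    acc = correct / total
    se = math.sqrt(p0 * (1 - p0) / total)  # binomial stderr under the null
    return acc, (acc - p0) / se


acc, z = z_vs_chance(*OVERALL)
print(f"{'overall':23s} acc={acc:.4f} z={z:+.2f}")
for topic, (correct, total) in BY_TOPIC.items():
    acc, z = z_vs_chance(correct, total)
    print(f"{topic:23s} acc={acc:.4f} z={z:+.2f}")
```

With these counts the overall z is below 1, while several topics sit far from chance in both directions (addition well below, division and multiplication well above). That pattern suggests the average is masking per-topic behavior rather than reflecting uniform guessing, which is worth inspecting at the item level.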
I'm at 150K steps and the benchmarks aren't that good. Did this also happen to you with Supra 50M Base?
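The MIN_LR = 3e-5 and LR = 3e-4 settings imply a schedule that decays the learning rate toward a floor. The thread doesn't say which schedule is used, so the sketch below assumes a common choice: linear warmup followed by cosine decay from LR to MIN_LR. `lr_at` is a hypothetical helper, and `warmup_steps` and `total_steps` are placeholder values, not the actual run length. It can help estimate how far the LR has decayed by the 150K and 210K checkpoints.

```python
import math


def lr_at(step, max_lr=3e-4, min_lr=3e-5, warmup_steps=2_000, total_steps=300_000):
    """Assumed schedule: linear warmup, then cosine decay from max_lr to min_lr.

    warmup_steps and total_steps are placeholders, not the thread's settings.
    """
    if step < warmup_steps:
        # Ramp linearly up to max_lr over the warmup period.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        # Hold at the floor once the schedule has finished.
        return min_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


for step in (0, 2_000, 150_000, 210_000, 300_000):
    print(f"step {step:>7,}: lr = {lr_at(step):.2e}")
```

Under a cosine schedule like this, the LR is still fairly high at the midpoint of training, and benchmark gains often arrive late as the LR approaches the floor, so mid-run scores are not necessarily indicative of the final model.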