calculator_model_test

This model is a fine-tuned version of an unspecified base model on an unspecified dataset (neither was recorded in the card). It achieves the following results on the evaluation set:

  • Loss: 0.4887
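If the reported loss is a mean per-token cross-entropy (the usual metric the Hugging Face Trainer logs as `eval_loss` — an assumption, since the card does not say), the corresponding perplexity is simply exp(loss). A minimal sketch of that conversion:

```python
import math

# Assumes the reported 0.4887 is a mean per-token cross-entropy loss;
# perplexity is then its exponential.
eval_loss = 0.4887
perplexity = math.exp(eval_loss)
print(f"{perplexity:.3f}")  # ~1.630
```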

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 512
  • eval_batch_size: 512
  • seed: 42
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 100
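With lr_scheduler_type: linear and no warmup (warmup is not listed, so the zero-warmup Trainer default is assumed), the learning rate decays linearly from 0.0001 to zero over the run's 400 optimizer steps (100 epochs × 4 steps per epoch, per the results table). A minimal sketch of that schedule:

```python
# Linear LR schedule: decays from initial_lr to 0 over total_steps,
# assuming zero warmup steps (the Trainer default when none is given).
def linear_lr(step, initial_lr=1e-4, total_steps=400):
    """Learning rate after `step` optimizer updates."""
    return initial_lr * max(0.0, (total_steps - step) / total_steps)

print(linear_lr(0))    # 0.0001 at the start
print(linear_lr(200))  # 5e-05 at the halfway point
print(linear_lr(400))  # 0.0 at the end
```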

Training results

Training Loss Epoch Step Validation Loss
3.7992 1.0 4 3.3759
3.2713 2.0 8 3.0747
3.0077 3.0 12 2.8470
2.8017 4.0 16 2.6776
2.6544 5.0 20 2.5598
2.5406 6.0 24 2.4680
2.4533 7.0 28 2.3877
2.3706 8.0 32 2.3072
2.2888 9.0 36 2.2281
2.2113 10.0 40 2.1635
2.1521 11.0 44 2.0968
2.0820 12.0 48 2.0234
2.0103 13.0 52 1.9459
1.9386 14.0 56 1.8604
1.8504 15.0 60 1.7755
1.7743 16.0 64 1.6988
1.7016 17.0 68 1.6309
1.6317 18.0 72 1.5733
1.5746 19.0 76 1.5232
1.5296 20.0 80 1.4786
1.4842 21.0 84 1.4378
1.4424 22.0 88 1.3982
1.4026 23.0 92 1.3584
1.3666 24.0 96 1.3200
1.3270 25.0 100 1.2837
1.2885 26.0 104 1.2460
1.2536 27.0 108 1.2117
1.2224 28.0 112 1.1795
1.1877 29.0 116 1.1505
1.1601 30.0 120 1.1205
1.1318 31.0 124 1.0939
1.1067 32.0 128 1.0691
1.0819 33.0 132 1.0422
1.0510 34.0 136 1.0184
1.0321 35.0 140 0.9936
1.0098 36.0 144 0.9706
0.9784 37.0 148 0.9458
0.9552 38.0 152 0.9205
0.9388 39.0 156 0.8972
0.9144 40.0 160 0.8710
0.8855 41.0 164 0.8509
0.8683 42.0 168 0.8277
0.8407 43.0 172 0.8134
0.8343 44.0 176 0.7972
0.8074 45.0 180 0.7759
0.7915 46.0 184 0.7621
0.7782 47.0 188 0.7481
0.7615 48.0 192 0.7329
0.7475 49.0 196 0.7170
0.7301 50.0 200 0.7028
0.7258 51.0 204 0.6922
0.7165 52.0 208 0.6794
0.7019 53.0 212 0.6723
0.6936 54.0 216 0.6623
0.6817 55.0 220 0.6507
0.6698 56.0 224 0.6422
0.6609 57.0 228 0.6360
0.6599 58.0 232 0.6297
0.6467 59.0 236 0.6223
0.6475 60.0 240 0.6133
0.6388 61.0 244 0.6070
0.6265 62.0 248 0.5988
0.6248 63.0 252 0.5900
0.6150 64.0 256 0.5890
0.6124 65.0 260 0.5820
0.6042 66.0 264 0.5772
0.6059 67.0 268 0.5699
0.5938 68.0 272 0.5640
0.5920 69.0 276 0.5596
0.5904 70.0 280 0.5550
0.5843 71.0 284 0.5520
0.5784 72.0 288 0.5443
0.5716 73.0 292 0.5434
0.5674 74.0 296 0.5381
0.5629 75.0 300 0.5369
0.5635 76.0 304 0.5301
0.5572 77.0 308 0.5274
0.5574 78.0 312 0.5248
0.5529 79.0 316 0.5207
0.5472 80.0 320 0.5171
0.5497 81.0 324 0.5148
0.5475 82.0 328 0.5108
0.5360 83.0 332 0.5090
0.5498 84.0 336 0.5069
0.5367 85.0 340 0.5039
0.5377 86.0 344 0.5025
0.5403 87.0 348 0.5025
0.5321 88.0 352 0.5008
0.5340 89.0 356 0.4972
0.5292 90.0 360 0.4955
0.5299 91.0 364 0.4958
0.5316 92.0 368 0.4942
0.5237 93.0 372 0.4927
0.5258 94.0 376 0.4924
0.5258 95.0 380 0.4916
0.5180 96.0 384 0.4909
0.5249 97.0 388 0.4901
0.5210 98.0 392 0.4895
0.5207 99.0 396 0.4889
0.5229 100.0 400 0.4887
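The step column advances by 4 per epoch, so even though the training data is not described, its size can be bounded from train_batch_size: 512. A back-of-the-envelope sketch (assuming the last batch of an epoch may be partial):

```python
# Each epoch takes 4 optimizer steps at batch size 512, so the training
# set holds at most 4 * 512 examples and more than 3 * 512.
steps_per_epoch = 4
train_batch_size = 512
upper_bound = steps_per_epoch * train_batch_size
lower_bound = (steps_per_epoch - 1) * train_batch_size
print(lower_bound, upper_bound)  # 1536 2048
```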

Framework versions

  • Transformers 5.0.0
  • Pytorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2
Model size

  • 7.8M params (F32, stored as Safetensors)