calculator_model_test

This model is a fine-tuned version of an unspecified base model on an unspecified dataset (neither was recorded in the card). It achieves the following results on the evaluation set:

  • Loss: 0.4887
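If the reported loss is a mean per-token cross-entropy (the usual metric the Hugging Face Trainer logs as `eval_loss` — an assumption, since the card does not say), the corresponding perplexity is simply exp(loss). A minimal sketch of that conversion:

```python
import math

# Assumes the reported 0.4887 is a mean per-token cross-entropy loss;
# perplexity is then its exponential.
eval_loss = 0.4887
perplexity = math.exp(eval_loss)
print(f"{perplexity:.3f}")  # ~1.630
```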

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 512
  • eval_batch_size: 512
  • seed: 42
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 100
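With lr_scheduler_type: linear and no warmup (warmup is not listed, so the zero-warmup Trainer default is assumed), the learning rate decays linearly from 0.0001 to zero over the run's 400 optimizer steps (100 epochs × 4 steps per epoch, per the results table). A minimal sketch of that schedule:

```python
# Linear LR schedule: decays from initial_lr to 0 over total_steps,
# assuming zero warmup steps (the Trainer default when none is given).
def linear_lr(step, initial_lr=1e-4, total_steps=400):
    """Learning rate after `step` optimizer updates."""
    return initial_lr * max(0.0, (total_steps - step) / total_steps)

print(linear_lr(0))    # 0.0001 at the start
print(linear_lr(200))  # 5e-05 at the halfway point
print(linear_lr(400))  # 0.0 at the end
```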

Training results

Training Loss Epoch Step Validation Loss
3.7992 1.0 4 3.3759
3.2713 2.0 8 3.0747
3.0077 3.0 12 2.8470
2.8017 4.0 16 2.6776
2.6544 5.0 20 2.5598
2.5406 6.0 24 2.4680
2.4533 7.0 28 2.3877
2.3706 8.0 32 2.3072
2.2888 9.0 36 2.2281
2.2113 10.0 40 2.1635
2.1521 11.0 44 2.0968
2.0820 12.0 48 2.0234
2.0103 13.0 52 1.9459
1.9386 14.0 56 1.8604
1.8504 15.0 60 1.7755
1.7743 16.0 64 1.6988
1.7016 17.0 68 1.6309
1.6317 18.0 72 1.5733
1.5746 19.0 76 1.5232
1.5296 20.0 80 1.4786
1.4842 21.0 84 1.4378
1.4424 22.0 88 1.3982
1.4026 23.0 92 1.3584
1.3666 24.0 96 1.3200
1.3270 25.0 100 1.2837
1.2885 26.0 104 1.2460
1.2536 27.0 108 1.2117
1.2224 28.0 112 1.1795
1.1877 29.0 116 1.1505
1.1601 30.0 120 1.1205
1.1318 31.0 124 1.0939
1.1067 32.0 128 1.0691
1.0819 33.0 132 1.0422
1.0510 34.0 136 1.0184
1.0321 35.0 140 0.9936
1.0098 36.0 144 0.9706
0.9784 37.0 148 0.9458
0.9552 38.0 152 0.9205
0.9388 39.0 156 0.8972
0.9144 40.0 160 0.8710
0.8855 41.0 164 0.8509
0.8683 42.0 168 0.8277
0.8407 43.0 172 0.8134
0.8343 44.0 176 0.7972
0.8074 45.0 180 0.7759
0.7915 46.0 184 0.7621
0.7782 47.0 188 0.7481
0.7615 48.0 192 0.7329
0.7475 49.0 196 0.7170
0.7301 50.0 200 0.7028
0.7258 51.0 204 0.6922
0.7165 52.0 208 0.6794
0.7019 53.0 212 0.6723
0.6936 54.0 216 0.6623
0.6817 55.0 220 0.6507
0.6698 56.0 224 0.6422
0.6609 57.0 228 0.6360
0.6599 58.0 232 0.6297
0.6467 59.0 236 0.6223
0.6475 60.0 240 0.6133
0.6388 61.0 244 0.6070
0.6265 62.0 248 0.5988
0.6248 63.0 252 0.5900
0.6150 64.0 256 0.5890
0.6124 65.0 260 0.5820
0.6042 66.0 264 0.5772
0.6059 67.0 268 0.5699
0.5938 68.0 272 0.5640
0.5920 69.0 276 0.5596
0.5904 70.0 280 0.5550
0.5843 71.0 284 0.5520
0.5784 72.0 288 0.5443
0.5716 73.0 292 0.5434
0.5674 74.0 296 0.5381
0.5629 75.0 300 0.5369
0.5635 76.0 304 0.5301
0.5572 77.0 308 0.5274
0.5574 78.0 312 0.5248
0.5529 79.0 316 0.5207
0.5472 80.0 320 0.5171
0.5497 81.0 324 0.5148
0.5475 82.0 328 0.5108
0.5360 83.0 332 0.5090
0.5498 84.0 336 0.5069
0.5367 85.0 340 0.5039
0.5377 86.0 344 0.5025
0.5403 87.0 348 0.5025
0.5321 88.0 352 0.5008
0.5340 89.0 356 0.4972
0.5292 90.0 360 0.4955
0.5299 91.0 364 0.4958
0.5316 92.0 368 0.4942
0.5237 93.0 372 0.4927
0.5258 94.0 376 0.4924
0.5258 95.0 380 0.4916
0.5180 96.0 384 0.4909
0.5249 97.0 388 0.4901
0.5210 98.0 392 0.4895
0.5207 99.0 396 0.4889
0.5229 100.0 400 0.4887
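The step column advances by 4 per epoch, so even though the training data is not described, its size can be bounded from train_batch_size: 512. A back-of-the-envelope sketch (assuming the last batch of an epoch may be partial):

```python
# Each epoch takes 4 optimizer steps at batch size 512, so the training
# set holds at most 4 * 512 examples and more than 3 * 512.
steps_per_epoch = 4
train_batch_size = 512
upper_bound = steps_per_epoch * train_batch_size
lower_bound = (steps_per_epoch - 1) * train_batch_size
print(lower_bound, upper_bound)  # 1536 2048
```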

Framework versions

  • Transformers 5.0.0
  • Pytorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2
Model size

  • 7.8M params (F32, stored as Safetensors)