mazesmazes committed
Commit 5a38762 · verified · 1 Parent(s): 2342eac

Model save

Files changed (1)
  1. README.md +70 -70
README.md CHANGED
@@ -14,7 +14,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [](https://huggingface.co/) on the None dataset.
 It achieves the following results on the evaluation set:
- - Loss: 0.9147
 
 ## Model description
 
@@ -42,83 +42,83 @@ The following hyperparameters were used during training:
 - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 2000
- - num_epochs: 2
- - label_smoothing_factor: 0.1
 
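Both configurations use lr_scheduler_type: cosine with lr_scheduler_warmup_steps: 2000. As a rough sketch of what that schedule shape looks like (illustrative only, not the exact Transformers implementation — the function and parameter names here are made up), the learning-rate multiplier rises linearly over the warmup steps and then decays along a cosine curve:

```python
import math

def lr_multiplier(step, warmup_steps=2000, total_steps=64816):
    """Linear warmup followed by cosine decay to zero.

    total_steps is taken from the final step of the first training table;
    the library's actual schedule may differ in edge-case details.
    """
    if step < warmup_steps:
        # Linear ramp from 0 to 1 over the warmup phase.
        return step / max(1, warmup_steps)
    # Fraction of the post-warmup training completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Cosine decay from 1 down to 0.
    return 0.5 * (1.0 + math.cos(math.pi * progress))
```

With these defaults the multiplier is 0.5 at step 1000, peaks at 1.0 at step 2000, and decays to 0 by the last step.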
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:-----:|:---------------:|
- | 2.4254 | 0.0309 | 1000 | 1.0182 |
- | 2.2814 | 0.0617 | 2000 | 0.9839 |
- | 2.2150 | 0.0926 | 3000 | 0.9564 |
- | 2.1892 | 0.1234 | 4000 | 0.9486 |
- | 2.1949 | 0.1543 | 5000 | 0.9442 |
- | 2.1758 | 0.1851 | 6000 | 0.9387 |
- | 2.1483 | 0.2160 | 7000 | 0.9367 |
- | 2.1435 | 0.2469 | 8000 | 0.9336 |
- | 2.1568 | 0.2777 | 9000 | 0.9336 |
- | 2.1350 | 0.3086 | 10000 | 0.9298 |
- | 2.1592 | 0.3394 | 11000 | 0.9312 |
- | 2.1291 | 0.3703 | 12000 | 0.9279 |
- | 2.1377 | 0.4011 | 13000 | 0.9271 |
- | 2.1456 | 0.4320 | 14000 | 0.9278 |
- | 2.1421 | 0.4628 | 15000 | 0.9254 |
- | 2.1457 | 0.4937 | 16000 | 0.9239 |
- | 2.1270 | 0.5246 | 17000 | 0.9247 |
- | 2.1311 | 0.5554 | 18000 | 0.9232 |
- | 2.1075 | 0.5863 | 19000 | 0.9220 |
- | 2.1268 | 0.6171 | 20000 | 0.9221 |
- | 2.1180 | 0.6480 | 21000 | 0.9210 |
- | 2.0914 | 0.6788 | 22000 | 0.9211 |
- | 2.1120 | 0.7097 | 23000 | 0.9214 |
- | 2.1319 | 0.7406 | 24000 | 0.9206 |
- | 2.1328 | 0.7714 | 25000 | 0.9203 |
- | 2.1336 | 0.8023 | 26000 | 0.9192 |
- | 2.0920 | 0.8331 | 27000 | 0.9193 |
- | 2.0895 | 0.8640 | 28000 | 0.9191 |
- | 2.1330 | 0.8948 | 29000 | 0.9184 |
- | 2.1262 | 0.9257 | 30000 | 0.9179 |
- | 2.0950 | 0.9566 | 31000 | 0.9177 |
- | 2.1082 | 0.9874 | 32000 | 0.9177 |
- | 2.0877 | 1.0183 | 33000 | 0.9175 |
- | 2.1202 | 1.0491 | 34000 | 0.9170 |
- | 2.1147 | 1.0800 | 35000 | 0.9168 |
- | 2.0989 | 1.1108 | 36000 | 0.9165 |
- | 2.0941 | 1.1417 | 37000 | 0.9162 |
- | 2.1437 | 1.1725 | 38000 | 0.9163 |
- | 2.0914 | 1.2034 | 39000 | 0.9160 |
- | 2.0870 | 1.2343 | 40000 | 0.9160 |
- | 2.0900 | 1.2651 | 41000 | 0.9159 |
- | 2.1074 | 1.2960 | 42000 | 0.9158 |
- | 2.0863 | 1.3268 | 43000 | 0.9156 |
- | 2.0879 | 1.3577 | 44000 | 0.9155 |
- | 2.0966 | 1.3885 | 45000 | 0.9151 |
- | 2.0793 | 1.4194 | 46000 | 0.9151 |
- | 2.0587 | 1.4503 | 47000 | 0.9148 |
- | 2.0919 | 1.4811 | 48000 | 0.9148 |
- | 2.0917 | 1.5120 | 49000 | 0.9149 |
- | 2.0948 | 1.5428 | 50000 | 0.9148 |
- | 2.1051 | 1.5737 | 51000 | 0.9148 |
- | 2.1150 | 1.6045 | 52000 | 0.9148 |
- | 2.0989 | 1.6354 | 53000 | 0.9149 |
- | 2.0856 | 1.6663 | 54000 | 0.9147 |
- | 2.0850 | 1.6971 | 55000 | 0.9148 |
- | 2.0982 | 1.7280 | 56000 | 0.9147 |
- | 2.1025 | 1.7588 | 57000 | 0.9147 |
- | 2.0903 | 1.7897 | 58000 | 0.9148 |
- | 2.0694 | 1.8205 | 59000 | 0.9147 |
- | 2.1191 | 1.8514 | 60000 | 0.9148 |
- | 2.0871 | 1.8823 | 61000 | 0.9147 |
- | 2.0957 | 1.9131 | 62000 | 0.9147 |
- | 2.0817 | 1.9440 | 63000 | 0.9147 |
- | 2.1124 | 1.9748 | 64000 | 0.9147 |
- | 2.0836 | 2.0 | 64816 | 0.9147 |
 
 
 ### Framework versions
 
- - Transformers 5.6.1
- - Pytorch 2.11.0+cu130
 - Datasets 3.6.0
 - Tokenizers 0.22.2
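The configuration removed above also trained with label_smoothing_factor: 0.1, which is one reason its raw loss values sit well above the replacement run's. A minimal sketch of label-smoothed cross-entropy for a single example, assuming the common formulation that mixes the one-hot target with a uniform distribution over all classes (illustrative only, not the Transformers internals):

```python
import math

def smoothed_cross_entropy(probs, target, smoothing=0.1):
    """Cross-entropy against a smoothed target distribution.

    The target distribution puts (1 - smoothing) on the gold class and
    spreads smoothing uniformly over all classes; variants that spread
    only over non-target classes also exist.
    """
    n = len(probs)
    loss = 0.0
    for i, p in enumerate(probs):
        q = (1.0 - smoothing) if i == target else 0.0
        q += smoothing / n
        loss -= q * math.log(p)
    return loss
```

With smoothing=0.0 this reduces to ordinary cross-entropy; with smoothing > 0 the loss is strictly higher whenever the model concentrates mass on the gold class, so smoothed losses are not directly comparable to unsmoothed ones.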
 
 This model is a fine-tuned version of [](https://huggingface.co/) on the None dataset.
 It achieves the following results on the evaluation set:
+ - Loss: 0.2044
 
 ## Model description
 
 
 - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 2000
+ - num_epochs: 1
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:-----:|:---------------:|
+ | 0.9934 | 0.0153 | 1000 | 0.3840 |
+ | 0.9974 | 0.0306 | 2000 | 0.4156 |
+ | 1.0350 | 0.0459 | 3000 | 0.3944 |
+ | 0.9922 | 0.0612 | 4000 | 0.3625 |
+ | 1.0129 | 0.0765 | 5000 | 0.3386 |
+ | 0.8650 | 0.0918 | 6000 | 0.3348 |
+ | 0.9696 | 0.1071 | 7000 | 0.3241 |
+ | 0.9879 | 0.1224 | 8000 | 0.3174 |
+ | 0.9225 | 0.1377 | 9000 | 0.3154 |
+ | 0.8560 | 0.1530 | 10000 | 0.3139 |
+ | 0.8554 | 0.1683 | 11000 | 0.3062 |
+ | 0.9126 | 0.1836 | 12000 | 0.3000 |
+ | 0.9142 | 0.1989 | 13000 | 0.2994 |
+ | 0.8358 | 0.2142 | 14000 | 0.2943 |
+ | 0.8452 | 0.2295 | 15000 | 0.2916 |
+ | 0.8372 | 0.2449 | 16000 | 0.2822 |
+ | 0.8776 | 0.2602 | 17000 | 0.2783 |
+ | 0.8697 | 0.2755 | 18000 | 0.2809 |
+ | 0.8541 | 0.2908 | 19000 | 0.2765 |
+ | 0.8511 | 0.3061 | 20000 | 0.2728 |
+ | 0.8440 | 0.3214 | 21000 | 0.2739 |
+ | 0.7897 | 0.3367 | 22000 | 0.2648 |
+ | 0.8196 | 0.3520 | 23000 | 0.2608 |
+ | 0.8320 | 0.3673 | 24000 | 0.2614 |
+ | 0.8043 | 0.3826 | 25000 | 0.2636 |
+ | 0.7875 | 0.3979 | 26000 | 0.2551 |
+ | 0.8257 | 0.4132 | 27000 | 0.2501 |
+ | 0.7276 | 0.4285 | 28000 | 0.2519 |
+ | 0.8196 | 0.4438 | 29000 | 0.2482 |
+ | 0.7727 | 0.4591 | 30000 | 0.2497 |
+ | 0.8316 | 0.4744 | 31000 | 0.2467 |
+ | 0.7738 | 0.4897 | 32000 | 0.2404 |
+ | 0.8146 | 0.5050 | 33000 | 0.2410 |
+ | 0.7571 | 0.5203 | 34000 | 0.2370 |
+ | 0.7921 | 0.5356 | 35000 | 0.2344 |
+ | 0.7792 | 0.5509 | 36000 | 0.2319 |
+ | 0.7014 | 0.5662 | 37000 | 0.2322 |
+ | 0.7425 | 0.5815 | 38000 | 0.2281 |
+ | 0.7644 | 0.5968 | 39000 | 0.2265 |
+ | 0.7048 | 0.6121 | 40000 | 0.2251 |
+ | 0.6970 | 0.6274 | 41000 | 0.2229 |
+ | 0.7856 | 0.6427 | 42000 | 0.2214 |
+ | 0.7114 | 0.6580 | 43000 | 0.2194 |
+ | 0.7751 | 0.6733 | 44000 | 0.2183 |
+ | 0.6482 | 0.6886 | 45000 | 0.2169 |
+ | 0.6889 | 0.7040 | 46000 | 0.2154 |
+ | 0.7554 | 0.7193 | 47000 | 0.2147 |
+ | 0.7050 | 0.7346 | 48000 | 0.2124 |
+ | 0.7927 | 0.7499 | 49000 | 0.2118 |
+ | 0.7309 | 0.7652 | 50000 | 0.2108 |
+ | 0.7264 | 0.7805 | 51000 | 0.2108 |
+ | 0.7256 | 0.7958 | 52000 | 0.2087 |
+ | 0.7605 | 0.8111 | 53000 | 0.2078 |
+ | 0.7391 | 0.8264 | 54000 | 0.2082 |
+ | 0.6781 | 0.8417 | 55000 | 0.2065 |
+ | 0.7206 | 0.8570 | 56000 | 0.2060 |
+ | 0.7342 | 0.8723 | 57000 | 0.2051 |
+ | 0.7519 | 0.8876 | 58000 | 0.2055 |
+ | 0.7258 | 0.9029 | 59000 | 0.2051 |
+ | 0.7932 | 0.9182 | 60000 | 0.2047 |
+ | 0.7391 | 0.9335 | 61000 | 0.2047 |
+ | 0.7416 | 0.9488 | 62000 | 0.2046 |
+ | 0.7249 | 0.9641 | 63000 | 0.2045 |
+ | 0.7000 | 0.9794 | 64000 | 0.2044 |
+ | 0.6958 | 0.9947 | 65000 | 0.2044 |
+ | 0.6692 | 1.0 | 65346 | 0.2044 |
 
 
 ### Framework versions
 
+ - Transformers 5.7.0
+ - Pytorch 2.8.0+cu128
 - Datasets 3.6.0
 - Tokenizers 0.22.2
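If the reported losses are natural-log cross-entropies, they translate to perplexities via exp(loss): roughly 2.50 for the first run's final 0.9147 and roughly 1.23 for the replacement run's 0.2044. (The first run's label smoothing inflates its raw loss, so treat that conversion loosely.) A quick check:

```python
import math

def perplexity(cross_entropy_loss):
    """Perplexity implied by a natural-log cross-entropy loss."""
    return math.exp(cross_entropy_loss)

# Final validation losses from the two tables above.
old_ppl = perplexity(0.9147)  # first run (label-smoothed, so approximate)
new_ppl = perplexity(0.2044)  # replacement run
```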