training

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

More information needed

More information needed

The following hyperparameters were used during training:

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
No log	0.55	400	0.6593	1.0074	0.5904	0.7357	0.4170	-140.8990	-176.0949	-35.9356	-33.1922
0.7974	1.11	800	0.5807	1.1511	0.5902	0.7634	0.5610	-140.9016	-174.6575	-35.9192	-33.2655
0.5983	1.66	1200	0.5200	1.0697	0.4300	0.7979	0.6397	-142.5030	-175.4720	-35.5696	-33.0300
0.4982	2.21	1600	0.4807	1.1128	0.3733	0.8158	0.7395	-143.0704	-175.0409	-35.2967	-32.7791
0.4663	2.77	2000	0.4649	1.1097	0.3323	0.8186	0.7774	-143.4800	-175.0714	-35.2043	-32.7114

Safetensors

Model size

0.1B params

Tensor type

F32

Base model

Finetuned

(101)

this model