Reward Inside the Model: A Lightweight Hidden-State Reward Model for LLM's Best-of-N sampling
This repository contains the weights for a SWIFT reward head: a lightweight linear model that scores responses directly from the base LLM's hidden states.
Paper page: https://huggingface.co/papers/2505.12225
GitHub: https://github.com/aster2024/SWIFT/
Base model: mistralai/Ministral-8B-Instruct-2410 (https://huggingface.co/mistralai/Ministral-8B-Instruct-2410)
This checkpoint corresponds to the Generalization Test setup (DeepScaleR → generalization to other math reasoning datasets).
You can load this checkpoint with `torch.load` together with the `LinearRewardModel` architecture defined in this repository's `utils.py`:
```python
import torch
import torch.nn as nn


class LinearRewardModel(nn.Module):
    def __init__(self, feature_dim, disable_gate=False):
        super().__init__()
        self.disable_gate = disable_gate
        if not disable_gate:
            # Fused head: one output for the reward, one for a gate.
            self.fused_layer = nn.Linear(feature_dim, 2)
        else:
            self.reward_layer = nn.Linear(feature_dim, 1)

    def forward(self, x):
        if self.disable_gate:
            return self.reward_layer(x).squeeze(-1)
        out = self.fused_layer(x)
        # Sketch of the gated variant: scale the reward by a sigmoid gate.
        # See utils.py in the GitHub repository for the authoritative logic.
        return out[..., 0] * torch.sigmoid(out[..., 1])


model = LinearRewardModel(feature_dim=151552, disable_gate=False)
sd = torch.load("swift_reward_model.pt", map_location="cpu")
model.load_state_dict(sd)
```
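Once the head is loaded, Best-of-N sampling reduces to scoring one hidden-state feature vector per candidate completion and keeping the argmax. The sketch below illustrates only the selection step; the toy feature size, the randomly initialized stand-in head, and the random "features" are assumptions for illustration, not part of this checkpoint (in practice you would pool hidden states from Ministral-8B-Instruct-2410 and use the loaded SWIFT head):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

feature_dim = 16  # toy size; the real SWIFT head expects feature_dim=151552
reward_head = nn.Linear(feature_dim, 1)  # stand-in for the loaded SWIFT head

# One pooled hidden-state feature vector per candidate completion (N = 4).
features = torch.randn(4, feature_dim)

with torch.no_grad():
    scores = reward_head(features).squeeze(-1)  # one scalar reward per candidate

best = int(scores.argmax())  # index of the Best-of-N winner
```

The same pattern extends to batched scoring: stack the pooled features for all candidates of all prompts and take the argmax within each prompt's group.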