SWIFT Reward Model (Ministral-8B + DeepScaleR)

This repository contains the weights for a SWIFT (lightweight linear) reward head.

Paper page: https://huggingface.co/papers/2505.12225

GitHub: https://github.com/aster2024/SWIFT/

Base model: mistralai/Ministral-8B-Instruct-2410 (https://huggingface.co/mistralai/Ministral-8B-Instruct-2410)

Model Description

This checkpoint corresponds to the Generalization Test setup (DeepScaleR → generalization to other math reasoning datasets).

  • Base Model: mistralai/Ministral-8B-Instruct-2410
  • Training Data: 10,000 samples from DeepScaleR (rollouts generated by the base model).
    • Note: While we release the 10k-sample version here, scaling up the training set size (e.g., generating more rollouts) can further improve performance.
  • Capabilities:
    • Designed to score math reasoning trajectories.
    • Trained on DeepScaleR and intended to test generalization on datasets such as MATH, GSM8K, and AQuA-RAT.

Model Details

  • Architecture: A lightweight linear head (SWIFT) trained on top of the base model's frozen representations.
  • Feature Dimension: 151552
  • Gating: Enabled.
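To make the gated head concrete, here is a toy sketch with a tiny feature dimension (the released head uses 151552). The fused layer emits a reward logit and a gate logit per input; combining them via a sigmoid-scaled product is an assumption for illustration — the exact gating rule is defined in the repository's utils.py.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy gated reward head. Column 0 of the output is the reward logit,
# column 1 is the gate logit; the sigmoid gate (in (0, 1)) damps the
# reward. This combination rule is illustrative, not the exact one.
head = nn.Linear(4, 2)
features = torch.randn(3, 4)           # 3 trajectories, 4-dim features
out = head(features)
reward, gate = out[:, 0], out[:, 1]
scores = reward * torch.sigmoid(gate)  # one scalar score per trajectory
print(scores.shape)                    # torch.Size([3])
```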

Usage

You can load this checkpoint with torch.load and the LinearRewardModel architecture defined in this repository's utils.py:

```python
import torch
import torch.nn as nn

class LinearRewardModel(nn.Module):
    def __init__(self, feature_dim, disable_gate=False):
        super(LinearRewardModel, self).__init__()
        self.disable_gate = disable_gate
        if not disable_gate:
            # One fused layer producing a reward logit and a gate logit.
            self.fused_layer = nn.Linear(feature_dim, 2)
        else:
            self.reward_layer = nn.Linear(feature_dim, 1)

    def forward(self, x):
        # Sketch of the forward pass; see utils.py for the exact logic.
        if self.disable_gate:
            return self.reward_layer(x).squeeze(-1)
        out = self.fused_layer(x)
        reward, gate = out[..., 0], out[..., 1]
        # Assumed combination: sigmoid gate modulating the reward.
        return reward * torch.sigmoid(gate)

model = LinearRewardModel(feature_dim=151552, disable_gate=False)
sd = torch.load("swift_reward_model.pt", map_location="cpu")
model.load_state_dict(sd)
```
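A common way to use such a reward head is best-of-N selection: sample several reasoning trajectories for a problem, score each one, and keep the highest-scoring candidate. A minimal sketch (the candidate texts and scores below are placeholders standing in for real rollouts and for `model(features)` outputs):

```python
# Best-of-N reranking sketch: score each sampled rollout with the
# reward head, then keep the argmax. Scores here are placeholders.
candidates = ["solution A", "solution B", "solution C"]
scores = [0.12, 0.87, 0.45]  # stand-ins for SWIFT reward scores
best_idx = max(range(len(scores)), key=lambda i: scores[i])
print(candidates[best_idx])  # solution B
```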
