Zual/chess_char

Model Description

Zual/chess_char is a GPT-2-based model trained to generate chess games in PGN (Portable Game Notation) format. It treats chess move generation as a language modeling task: the model learns to predict the next character in a PGN sequence.
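To make the next-character framing concrete, here is a minimal sketch of how a PGN prefix could be turned into input/target pairs. The helper names (`build_vocab`, `next_char_pairs`) and the vocabulary built from a single string are illustrative assumptions, not the tokenizer shipped with this repository.

```python
from typing import Dict, List, Tuple


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map every distinct character to an integer id (sorted for determinism)."""
    chars = sorted(set("".join(texts)))
    return {ch: i for i, ch in enumerate(chars)}


def next_char_pairs(text: str, vocab: Dict[str, int]) -> Tuple[List[int], List[int]]:
    """Return (inputs, targets) where targets are the inputs shifted by one character."""
    ids = [vocab[ch] for ch in text]
    return ids[:-1], ids[1:]


# Hypothetical example prefix; the real vocabulary comes from the training corpus.
movetext = "1. e4 e5 2. Nf3 Nc6"
vocab = build_vocab([movetext])
inputs, targets = next_char_pairs(movetext, vocab)

# At every position the model sees inputs[: i + 1] and is trained to predict targets[i].
id_to_char = {i: ch for ch, i in vocab.items()}
print(repr(id_to_char[inputs[0]]), "->", repr(id_to_char[targets[0]]))
```

Because each position only predicts a single character, a move such as `Nf3` is produced over three generation steps rather than one.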

Intended Use

This model is intended for research on how well Transformer models can learn structured, rule-based systems (such as chess) purely from observational data.

Primary Use Case: Generating valid PGN chess game continuations from a given prefix.

Usage

You can use this model directly with the Hugging Face transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Zual/chess_char"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Note: The model uses a custom tokenizer which should be loaded via the repository scripts
# or by following the instructions in the GitHub repo.

For a complete inference example with the custom tokenizer, please refer to the GitHub repository.

Training Data

The model was trained on a dataset of standard chess games from Lichess (rated 2000+, September 2016 dump).

Source: Lichess Database
Filtering: Minimum 20 moves, no time-outs or abandonments.
Preprocessing: Games were converted to character-level tokens.
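The filtering criteria above could be sketched roughly as follows. This is not the actual preprocessing script. It assumes the header tags found in Lichess PGN exports (`WhiteElo`, `BlackElo`, `Termination`), and it makes three further assumptions that the card does not specify: "2000+" applies to both players, "20 moves" means 20 full moves, and time-outs and abandonments are marked as `Time forfeit` and `Abandoned`.

```python
import re
from typing import Dict

HEADER_RE = re.compile(r'^\[(\w+) "(.*)"\]$', re.MULTILINE)
COMMENT_RE = re.compile(r"\{[^}]*\}")            # clock/eval annotations
MOVE_NUMBER_RE = re.compile(r"\b(\d+)\.(?!\.)")  # "12." but not "12..."

# Assumed Termination values to exclude (time-outs and abandonments).
EXCLUDED_TERMINATIONS = {"Time forfeit", "Abandoned"}


def parse_headers(pgn: str) -> Dict[str, str]:
    """Collect the [Tag "value"] pairs of a single PGN game."""
    return dict(HEADER_RE.findall(pgn))


def count_full_moves(movetext: str) -> int:
    """Return the highest move number in the movetext, ignoring comments."""
    cleaned = COMMENT_RE.sub(" ", movetext)
    return max((int(n) for n in MOVE_NUMBER_RE.findall(cleaned)), default=0)


def keep_game(pgn: str, min_elo: int = 2000, min_moves: int = 20) -> bool:
    """Apply the rating, termination, and length filters to one game."""
    headers = parse_headers(pgn)
    try:
        white, black = int(headers["WhiteElo"]), int(headers["BlackElo"])
    except (KeyError, ValueError):
        return False  # missing or non-numeric rating such as "?"
    if min(white, black) < min_elo:
        return False
    if headers.get("Termination") in EXCLUDED_TERMINATIONS:
        return False
    movetext = "\n".join(
        line for line in pgn.splitlines() if not line.startswith("[")
    )
    return count_full_moves(movetext) >= min_moves
```

Games that pass `keep_game` would then be reduced to their movetext and encoded character by character.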

Training Procedure

Hyperparameters

The model was trained with the following configuration:

Architecture: GPT-2
Layers: 8
Heads: 8
Embedding Dim: 512
Context Size: 1024
Vocab Size: ~32 (Character-level PGN tokens)
Batch Size: 64
Learning Rate: 1e-3
Optimizer: AdamW
Epochs: 5
Mixed Precision: FP16
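As a sanity check on the configuration above, the parameter count can be estimated by hand. The sketch below assumes a standard GPT-2 block (fused QKV projection, 4x MLP expansion, biases on all linear layers, output head tied to the token embedding) and a vocabulary of exactly 32 entries. The card only states "~32", so the result is approximate.

```python
def gpt2_param_count(n_layer: int, n_embd: int, n_positions: int, vocab_size: int) -> int:
    """Count trainable parameters of a GPT-2 style decoder with tied embeddings."""
    d = n_embd
    embeddings = vocab_size * d + n_positions * d   # token + position tables
    attention = (d * 3 * d + 3 * d) + (d * d + d)   # QKV projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)     # up-projection + down-projection
    layer_norms = 2 * (2 * d)                       # two LayerNorms per block
    per_block = attention + mlp + layer_norms
    final_norm = 2 * d
    return embeddings + n_layer * per_block + final_norm


total = gpt2_param_count(n_layer=8, n_embd=512, n_positions=1024, vocab_size=32)
print(f"{total:,}")  # 25,760,768
```

Under these assumptions the estimate is about 25.8M parameters, which is consistent with the model size reported for the checkpoint. Almost all of the parameters sit in the Transformer blocks; with such a small vocabulary the token embedding contributes only about 16K.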

Evaluation

The model's performance is evaluated based on:

Legal Move Rate: Percentage of generated moves that are legal according to chess rules.
Move Quality: Comparison of move distributions against historical games and Stockfish evaluations (see paper).
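A legal move rate of this kind could be computed along the following lines. This is a sketch rather than the evaluation code used for the reported numbers. It relies on the third-party python-chess package (`chess.Board`, `Board.push_san`), and the counting convention (stop scoring a game at its first illegal move, since the position is undefined afterwards) is an assumption.

```python
import re
from typing import Iterable, List

_COMMENT_RE = re.compile(r"\{[^}]*\}")
_MOVE_NUMBER_RE = re.compile(r"^\d+\.+")
_RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def split_san(movetext: str) -> List[str]:
    """Strip comments, move numbers, and the result; return the SAN tokens."""
    moves = []
    for token in _COMMENT_RE.sub(" ", movetext).split():
        token = _MOVE_NUMBER_RE.sub("", token)  # handles "1.", "1.e4", and "1..."
        if token and token not in _RESULTS:
            moves.append(token)
    return moves


def legal_move_rate(games: Iterable[str]) -> float:
    """Fraction of generated moves that are legal, over all games."""
    import chess  # python-chess; imported here so split_san works without it

    legal = total = 0
    for movetext in games:
        board = chess.Board()
        for san in split_san(movetext):
            total += 1
            try:
                board.push_san(san)  # raises a ValueError subclass if illegal
            except ValueError:
                break  # assumption: ignore the rest of the game
            legal += 1
    return legal / total if total else 0.0
```

For example, `legal_move_rate(["1. e4 e5 2. Nf3 Nc6", "1. e4 e5 2. Ke3"])` would score six of the seven moves as legal, because `Ke3` is not playable from that position.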

Limitations

The model does not "know" the rules of chess explicitly; it only predicts the next character based on statistical patterns.
While it achieves a high rate of legal moves (~98%), it may occasionally generate illegal moves or invalid PGN syntax, especially in long sequences.
It is not a chess engine: it does not optimize for winning, but mimics the human playing style found in the training data.

Model size: 25.8M params (Safetensors, tensor type F32)