
Zual/chess_char

Model Description

Zual/chess_char is a GPT-2-based model trained to generate chess games in PGN (Portable Game Notation) format. It treats chess as a language modeling task: the model learns to predict the next character in a PGN sequence.

Intended Use

This model is intended for research purposes to study the capabilities of Transformer models in learning structured, rule-based systems (like Chess) purely from observational data.

Primary Use Case: Generating valid PGN chess game continuations from a given prefix.

Usage

You can use this model directly with the Hugging Face transformers library:

from transformers import AutoModelForCausalLM

model_name = "Zual/chess_char"
# trust_remote_code=True lets custom model code from the repository run locally.
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
model.eval()  # inference mode: disables dropout

# Note: The model uses a custom tokenizer, which should be loaded via the repository scripts
# or by following the instructions in the GitHub repo.

For a complete inference example with the custom tokenizer, please refer to the GitHub repository.

Training Data

The model was trained on a dataset of standard chess games from Lichess (rated 2000+, September 2016 dump).

  • Source: Lichess Database
  • Filtering: Minimum 20 moves, no time-outs or abandonments.
  • Preprocessing: Games were converted to character-level tokens.
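
The filtering and preprocessing steps above can be sketched roughly as follows. This is an illustration only, not the repository's actual pipeline: the helper names (`keep_game`, `build_vocab`, `encode`, `decode`) are hypothetical, "20 moves" is assumed to mean 20 full moves, and time-outs and abandonments are assumed to be marked by the `Termination` header values `Time forfeit` and `Abandoned`.

```python
import re

MIN_MOVES = 20  # from the model card; assumed here to mean full moves
# Assumption: dropped games carry one of these Termination header values.
EXCLUDED_TERMINATIONS = {"Time forfeit", "Abandoned"}


def parse_headers(pgn: str) -> dict:
    """Collect [Key "Value"] header pairs from a single PGN game."""
    return dict(re.findall(r'\[(\w+) "([^"]*)"\]', pgn))


def movetext(pgn: str) -> str:
    """Return the move section with headers and {comments} removed."""
    body = "\n".join(l for l in pgn.splitlines() if not l.startswith("["))
    body = re.sub(r"\{[^}]*\}", "", body)
    return " ".join(body.split())


def count_full_moves(text: str) -> int:
    """Count move numbers such as '12.' while ignoring '12...'."""
    return len(re.findall(r"\b\d+\.(?!\.)", text))


def keep_game(pgn: str) -> bool:
    """Apply the two filters listed above to one game."""
    if parse_headers(pgn).get("Termination") in EXCLUDED_TERMINATIONS:
        return False
    return count_full_moves(movetext(pgn)) >= MIN_MOVES


def build_vocab(texts):
    """Map every distinct character in the corpus to an integer id."""
    chars = sorted(set("".join(texts)))
    stoi = {c: i for i, c in enumerate(chars)}
    itos = {i: c for c, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    return [stoi[c] for c in text]


def decode(ids, itos):
    return "".join(itos[i] for i in ids)
```

Because each character is its own token, the vocabulary stays very small, at the cost of longer sequences per game.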

Training Procedure

Hyperparameters

The model was trained with the following configuration:

  • Architecture: GPT-2
  • Layers: 8
  • Heads: 8
  • Embedding Dim: 512
  • Context Size: 1024
  • Vocab Size: ~32 (Character-level PGN tokens)
  • Batch Size: 64
  • Learning Rate: 1e-3
  • Optimizer: AdamW
  • Epochs: 5
  • Mixed Precision: FP16
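
As a sanity check on the configuration above, the parameter count can be estimated by hand. The sketch below assumes the standard GPT-2 layout (learned position embeddings, a 4x-wide MLP, biases on all linear layers, output head tied to the token embedding) and a vocabulary of exactly 32; the function name `gpt2_param_count` is made up for this illustration, and the actual checkpoint may differ in these details.

```python
def gpt2_param_count(n_layer: int, d_model: int, n_ctx: int, vocab_size: int) -> int:
    """Estimate parameters of a standard GPT-2 stack with a tied output head."""
    tok_emb = vocab_size * d_model          # token embedding (shared with LM head)
    pos_emb = n_ctx * d_model               # learned position embedding
    # Attention: fused QKV projection plus output projection, each with bias.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP: expand to 4 * d_model, then project back, each with bias.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    layer_norms = 2 * (2 * d_model)         # two LayerNorms per block (scale + shift)
    final_ln = 2 * d_model
    return tok_emb + pos_emb + n_layer * (attn + mlp + layer_norms) + final_ln


total = gpt2_param_count(n_layer=8, d_model=512, n_ctx=1024, vocab_size=32)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
# prints "25,760,768 parameters (~25.8M)"
```

Under these assumptions the estimate agrees with the roughly 25.8M parameters reported for the checkpoint, and almost all of them sit in the Transformer blocks rather than the embeddings.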

Evaluation

The model's performance is evaluated based on:

  1. Legal Move Rate: Percentage of generated moves that are legal according to chess rules.
  2. Move Quality: Comparison of move distributions against historical games and Stockfish evaluations (see paper).

Limitations

  • The model does not "know" the rules of chess explicitly; it only predicts the next character based on statistical patterns.
  • While it achieves a high rate of legal moves (~98%), it may occasionally generate illegal moves or invalid PGN syntax, especially in long sequences.
  • It is not a chess engine and does not optimize for winning, but for mimicking human play style found in the training data.
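
Since generated text may contain invalid PGN syntax, a cheap first filter is to check each token against the shape of a SAN move before running a full legality check. The sketch below is a hypothetical helper, not part of the repository; it assumes move numbers are separated from moves by a space (`1. e4`, not `1.e4`) and it checks syntax only, so a well-formed but illegal move still passes.

```python
import re

# Shape of a SAN move: castling, piece move, or pawn move, with optional check/mate.
SAN_RE = re.compile(
    r"^(?:O-O(?:-O)?"                        # castling
    r"|[KQRBN][a-h]?[1-8]?x?[a-h][1-8]"      # piece move with optional disambiguation
    r"|(?:[a-h]x)?[a-h][1-8](?:=[QRBN])?"    # pawn move, capture, promotion
    r")[+#]?$"
)
MOVE_NUMBER_RE = re.compile(r"^\d+\.(?:\.\.)?$")  # "12." or "12..."
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def invalid_tokens(movetext: str) -> list:
    """Return the tokens that are not a move number, a result, or SAN-shaped."""
    bad = []
    for tok in movetext.split():
        if MOVE_NUMBER_RE.match(tok) or tok in RESULTS or SAN_RE.match(tok):
            continue
        bad.append(tok)
    return bad
```

An empty return value means the text is syntactically plausible; confirming legality still requires replaying the moves on a board.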

Model size: 25.8M params (Safetensors, F32 tensors)