Celo2-base

Paper · Code · License: MIT

Official pretrained weights for the Celo2-base learned update rule. This variant applies the learned update rule to all parameters, without any optimization harness. For better performance, see celo2, which uses Newton-Schulz orthogonalization and AdamW for biases/embeddings.

Quickstart

Install the package and download the checkpoint:

pip install git+https://github.com/amoudgl/celo2.git
hf download amoudgl/celo2-base --local-dir ./celo2-base

Use the load_checkpoint method to load the pretrained params from the checkpoint path:

from celo2_optax import load_checkpoint
pretrained_params = load_checkpoint('./celo2-base/theta.state')

Standard optax usage with the scale_by_celo2 method, which takes the pretrained params as input:

import optax
from celo2_optax import scale_by_celo2

optimizer = optax.chain(
    scale_by_celo2(pretrained_params, orthogonalize=False),
    optax.add_decayed_weights(weight_decay),    # weight_decay: user-chosen float, e.g. 1e-4
    optax.scale_by_learning_rate(lr_schedule),  # lr_schedule: float or optax schedule
)

Loading and inspecting MLP update rule weights

from celo2_optax import load_checkpoint
import jax

pretrained_params = load_checkpoint('./celo2-base/theta.state')  # dictionary containing weights
print(jax.tree.map(lambda x: x.shape, pretrained_params))

The checkpoint contains a small MLP stored under the ff_mod_stack key, with weight matrices (w0__*, w1, w2) and biases (b0, b1, b2). Each w0__* key contains the first-layer weights corresponding to a particular input feature, such as momentum, gradient, or parameter value.
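The layout described above can be mimicked with a mock pytree. The w0__* suffixes and array shapes below are illustrative assumptions, not the exact checkpoint schema:

```python
import numpy as np

# Hypothetical structure mirroring the description above; key suffixes
# and shapes are assumptions, not the real checkpoint contents.
mock_theta = {
    'ff_mod_stack': {
        'w0__grad': np.zeros((1, 8)),      # first-layer weights, one per input feature
        'w0__momentum': np.zeros((1, 8)),
        'w0__param': np.zeros((1, 8)),
        'b0': np.zeros(8),
        'w1': np.zeros((8, 8)),            # hidden layer
        'b1': np.zeros(8),
        'w2': np.zeros((8, 1)),            # output: scalar update per parameter
        'b2': np.zeros(1),
    }
}

# Shape inspection without jax: recursively walk the nested dict.
def tree_shapes(t):
    return {k: tree_shapes(v) if isinstance(v, dict) else v.shape
            for k, v in t.items()}

print(tree_shapes(mock_theta))
```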

Meta-training config

| Key | Value |
| --- | --- |
| Optimizer architecture | MLP, 2 hidden layers, 8 units each |
| Meta-training tasks | 4 image classification tasks (MNIST, FMNIST, CIFAR-10, SVHN) |
| Task architecture | MLP (64-32-10) |
| Meta-trainer | Persistent Evolution Strategies (PES) |
| Outer iterations | 100K |
| Truncation length | 50 |
| Min unroll length | 100 |
| Max unroll length | 2000 |
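The truncation entries above can be read as: each task's inner loop runs for a total length between the min and max unroll lengths, and PES estimates meta-gradients over 50-step truncations of that unroll. The sketch below illustrates such a schedule; how the total length is sampled is an assumption for illustration, since the table only gives the bounds:

```python
import random

TRUNCATION = 50   # inner steps per PES truncation (from table)
MIN_UNROLL = 100  # min total inner-loop length (from table)
MAX_UNROLL = 2000 # max total inner-loop length (from table)

def truncation_schedule(seed=0):
    """Sample a total unroll length and split it into truncations.

    Uniform sampling here is an illustrative assumption; the table
    only states the bounds and the truncation length.
    """
    rng = random.Random(seed)
    total = rng.randrange(MIN_UNROLL, MAX_UNROLL + 1)
    chunks = [TRUNCATION] * (total // TRUNCATION)
    if total % TRUNCATION:
        chunks.append(total % TRUNCATION)  # final partial truncation
    return total, chunks

total, chunks = truncation_schedule()
print(total, chunks[:3])
```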

For more details, see the config.json file included in the repo.

Files

| File | Description |
| --- | --- |
| theta.state | Pretrained MLP optimizer weights |
| config.json | Meta-training configuration |

Citation

@misc{moudgil2026celo2,
      title={Celo2: Towards Learned Optimization Free Lunch},
      author={Abhinav Moudgil and Boris Knyazev and Eugene Belilovsky},
      year={2026},
      eprint={2602.19142},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.19142},
}