# Celo2-base
Official pretrained weights for the Celo2-base learned update rule. This variant applies the learned update rule to all parameters, without any optimization harness. For better performance, see the celo2 variant, which uses Newton-Schulz orthogonalization and AdamW for biases/embeddings.
## Quickstart
Install the package and download the checkpoint:

```shell
pip install git+https://github.com/amoudgl/celo2.git
hf download amoudgl/celo2-base --local-dir ./celo2-base
```
Use the `load_checkpoint` method to fetch the pretrained params from the checkpoint path:

```python
from celo2_optax import load_checkpoint

pretrained_params = load_checkpoint('./celo2-base/theta.state')
```
Standard optax usage, with the `scale_by_celo2` method taking the pretrained params as input:

```python
import optax
from celo2_optax import scale_by_celo2

# weight_decay (float) and lr_schedule (float or optax schedule) are user-defined
optimizer = optax.chain(
    scale_by_celo2(pretrained_params, orthogonalize=False),
    optax.add_decayed_weights(weight_decay),
    optax.scale_by_learning_rate(lr_schedule),
)
```
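The transformations in the chain compose in order: `scale_by_celo2` produces the raw update, `optax.add_decayed_weights` adds `weight_decay * param` to it, and `optax.scale_by_learning_rate` multiplies by `-lr`. A minimal numpy sketch of that arithmetic, where `direction` is just a placeholder standing in for the learned update rule's output:

```python
import numpy as np

def chained_update(direction, param, weight_decay, lr):
    """Mimic the arithmetic of the optax chain above.

    `direction` stands in for the output of scale_by_celo2; the real
    transformation computes it from gradient/momentum features.
    """
    update = direction                      # scale_by_celo2 output (placeholder)
    update = update + weight_decay * param  # optax.add_decayed_weights
    update = -lr * update                   # optax.scale_by_learning_rate
    return param + update                   # optax.apply_updates

p = np.array([1.0, -2.0])
d = np.array([0.5, 0.5])
new_p = chained_update(d, p, weight_decay=0.01, lr=0.1)
# new_p == p - lr * (d + weight_decay * p)
```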
## Loading and inspecting MLP update rule weights
```python
import jax
from celo2_optax import load_checkpoint

pretrained_params = load_checkpoint('./celo2-base/theta.state')  # dictionary containing weights
print(jax.tree.map(lambda x: x.shape, pretrained_params))
```
The checkpoint contains a small MLP stored under the `ff_mod_stack` key, with weight matrices (`w0__*`, `w1`, `w2`) and biases (`b0`, `b1`, `b2`). Each `w0__*` key holds the input weights for a particular feature, such as momentum, gradient, parameter, etc.
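Conceptually, an MLP update rule with this layout projects each input feature through its own `w0__*` matrix, sums the projections, and passes the result through the hidden layers to produce a per-parameter update. A minimal numpy sketch under that assumption; the feature names, initialization, and output handling here are illustrative, not the exact Celo2 computation:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 8  # matches the 2x8 MLP in the meta-training config

# One w0__* input projection per feature (feature names are illustrative)
w0 = {name: rng.normal(size=(1, hidden)) * 0.1
      for name in ("grad", "momentum", "param")}
b0 = np.zeros(hidden)
w1, b1 = rng.normal(size=(hidden, hidden)) * 0.1, np.zeros(hidden)
w2, b2 = rng.normal(size=(hidden, 1)) * 0.1, np.zeros(1)

def mlp_update(features):
    """Apply the update-rule MLP elementwise to each parameter.

    features: dict of arrays, one per input feature, all the same shape.
    Returns an update of that same shape.
    """
    # Each scalar parameter is treated independently: flatten, project each
    # feature through its own input matrix, and sum the projections.
    h = sum(feat.reshape(-1, 1) @ w0[name] for name, feat in features.items()) + b0
    h = np.maximum(h, 0.0)            # hidden layer 1 (relu)
    h = np.maximum(h @ w1 + b1, 0.0)  # hidden layer 2 (relu)
    out = h @ w2 + b2                 # scalar update per parameter
    return out.reshape(features["grad"].shape)

g = np.ones((3, 2))
update = mlp_update({"grad": g, "momentum": 0.9 * g, "param": np.zeros_like(g)})
```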
## Meta-training config
| Key | Value |
|---|---|
| Optimizer architecture | MLP, 2 hidden layers, 8 units each |
| Meta-training tasks | 4 image classification tasks (MNIST, FMNIST, CIFAR-10, SVHN) |
| Task architecture | MLP (64-32-10) |
| Meta-trainer | Persistent Evolution Strategies (PES) |
| Outer iterations | 100K |
| Truncation length | 50 |
| Min unroll length | 100 |
| Max unroll length | 2000 |
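The unroll settings in the table interact as follows: each inner task runs for a total length between the min (100) and max (2000) unroll lengths, and the meta-trainer processes it in truncations of 50 steps. A small sketch of that schedule, assuming a log-uniform draw of episode length (the actual sampling distribution is an assumption, not stated in the config table):

```python
import math
import random

MIN_UNROLL, MAX_UNROLL, TRUNCATION = 100, 2000, 50

def sample_episode_length(rng):
    # Log-uniform draw between min and max unroll length (assumed
    # distribution), rounded down to a whole number of truncations.
    raw = math.exp(rng.uniform(math.log(MIN_UNROLL), math.log(MAX_UNROLL)))
    return max(MIN_UNROLL, int(raw // TRUNCATION) * TRUNCATION)

rng = random.Random(0)
lengths = [sample_episode_length(rng) for _ in range(5)]
# Each episode is consumed in TRUNCATION-step chunks by the meta-trainer
chunks = [length // TRUNCATION for length in lengths]
```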
For more details, see the `config.json` included in this repo.
## Files
| File | Description |
|---|---|
| `theta.state` | Pretrained MLP optimizer weights |
| `config.json` | Meta-training configuration |
## Citation
```bibtex
@misc{moudgil2026celo2,
  title={Celo2: Towards Learned Optimization Free Lunch},
  author={Abhinav Moudgil and Boris Knyazev and Eugene Belilovsky},
  year={2026},
  eprint={2602.19142},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2602.19142},
}
```