ComVo: Complex-Valued Neural Vocoder

Model description

ComVo is a complex-valued neural vocoder that generates waveforms through the inverse short-time Fourier transform (iSTFT).
Unlike conventional real-valued vocoders that process real and imaginary parts separately, ComVo operates directly in the complex domain using native complex arithmetic.

This enables:

  • Structured modeling of complex spectrograms
  • Adversarial training in the complex domain
  • Improved waveform synthesis quality

The model also introduces:

  • Phase quantization for structured phase modeling
  • Block-matrix computation for improved training efficiency
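The two ideas above can be illustrated with a small, standard-library-only sketch. It is not the authors' implementation. It assumes that phase quantization means snapping a complex value's angle to one of a fixed number of uniformly spaced levels while keeping its magnitude, and that the block-matrix computation refers to the usual real-valued form of a complex linear map, [[A, -B], [B, A]] applied to [x; y] for W = A + iB and z = x + iy. The function names and the `levels` parameter are hypothetical.

```python
import cmath
import math


def quantize_phase(z, levels=8):
    """Snap the phase of z to the nearest of `levels` uniform angles; keep |z|.

    Assumption: `levels` and uniform spacing are illustrative choices only.
    """
    step = 2 * math.pi / levels
    magnitude, phase = cmath.polar(z)
    return cmath.rect(magnitude, round(phase / step) * step)


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def complex_matvec_block(A, B, x, y):
    """Compute (A + iB)(x + iy) with one real product on a block matrix.

    The block matrix is [[A, -B], [B, A]] and the stacked input is [x; y].
    Returns (real_part, imag_part) as two lists.
    """
    top = [row_a + [-b for b in row_b] for row_a, row_b in zip(A, B)]
    bottom = [row_b + row_a for row_a, row_b in zip(A, B)]
    out = matvec(top + bottom, x + y)
    n = len(A)
    return out[:n], out[n:]


def complex_matvec_native(W, z):
    """Reference result using Python's built-in complex arithmetic."""
    return matvec(W, z)
```

Both matrix-vector paths give the same result. The block form needs only one real-valued product, which is the kind of restructuring that can help on hardware tuned for real matrix multiplication.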

Paper

Toward Complex-Valued Neural Networks for Waveform Generation
Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee
ICLR 2026

https://openreview.net/forum?id=U4GXPqm3Va

Intended use

This model is designed for:

  • Neural vocoding
  • Speech synthesis pipelines (e.g., TTS)
  • Audio waveform reconstruction from spectral features

Input

  • Raw waveform ([1, T]) or extracted features

Output

  • Generated waveform at 24 kHz

Usage

Load model

from hf_model import ComVoHF

model = ComVoHF.from_pretrained("hsoh/ComVo-base")
model.eval()

Inference from waveform

# wav: raw waveform of shape [1, T]
audio = model.from_waveform(wav)
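The card does not show how `wav` is prepared. Below is a minimal sketch using only the standard library. It assumes the input should be mono audio at the model's 24 kHz sampling rate, arranged as [1, T]. The helper name `load_mono_pcm16` and its `expected_sr` parameter are hypothetical and not part of the ComVo code.

```python
import struct
import wave


def load_mono_pcm16(path, expected_sr=24000):
    """Read a 16-bit mono WAV file into a [1, T] nested list of floats in [-1, 1).

    Assumption: the vocoder expects audio at `expected_sr` Hz; a mismatch
    raises instead of silently resampling.
    """
    with wave.open(path, "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        if f.getframerate() != expected_sr:
            raise ValueError(
                f"expected {expected_sr} Hz, got {f.getframerate()} Hz"
            )
        raw = f.readframes(f.getnframes())
    # Little-endian signed 16-bit samples, scaled to floating point.
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [[s / 32768.0 for s in samples]]
```

If the model takes PyTorch tensors (an assumption here), the nested list can be converted with `torch.tensor(wav)` before calling `model.from_waveform`.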

Inference from features

features = model.build_feature_extractor()(wav)
audio = model(features)

Model details

Model   Parameters   Sampling rate
Base    13.28M       24 kHz
Large   114.56M      24 kHz

Evaluation

Model   UTMOS ↑   PESQ (wb) ↑   PESQ (nb) ↑   MRSTFT ↓
Base    3.6744    3.8219        4.0727        0.8580
Large   3.7618    3.9993        4.1639        0.8227

Resources

Paper: https://openreview.net/forum?id=U4GXPqm3Va

Demo: https://hs-oh-prml.github.io/ComVo/

Code: https://github.com/hs-oh-prml/ComVo

Citation

@inproceedings{
  oh2026toward,
  title={Toward Complex-Valued Neural Networks for Waveform Generation},
  author={Hyung-Seok Oh and Deok-Hyeon Cho and Seung-Bin Kim and Seong-Whan Lee},
  booktitle={ICLR},
  year={2026}
}