Finetuning tips
Hi!
If you are just fine-tuning the pre-trained model to adapt to a new language, using the Phase 2 configuration should be sufficient.
However, since your target language likely wasn't in the original training data, you might achieve even better quality by training the entire model (including the content branch).
Note: If you need to maintain token compatibility with the current encoder (e.g., to use with existing TTS models), you must freeze the content branch during training.
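Freezing a branch in PyTorch just means disabling gradients on its parameters before building the optimizer. Here is a minimal sketch; `Codec`, `content_branch`, and `acoustic_branch` are placeholder names for illustration, and the real attribute names in the codec codebase will differ.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the codec; the real model has different
# modules, but the freezing pattern is the same.
class Codec(nn.Module):
    def __init__(self):
        super().__init__()
        self.content_branch = nn.Linear(8, 8)
        self.acoustic_branch = nn.Linear(8, 8)

def freeze_content_branch(model: nn.Module) -> None:
    # Disable gradients so the content branch keeps its original
    # weights (and hence token compatibility) during fine-tuning.
    for p in model.content_branch.parameters():
        p.requires_grad = False
    # Also switch the branch to eval mode so layers like BatchNorm
    # or Dropout behave as at inference time.
    model.content_branch.eval()

model = Codec()
freeze_content_branch(model)

# Hand only the trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```

One caveat: if the training loop calls `model.train()` each step, it will flip the frozen branch back to train mode, so re-apply `.eval()` on it (or override `train()`) after that call.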
Also, just a heads-up: since this v2 model is 44.1kHz, the training configurations for Mel Spec Loss and GAN Loss (specifically parameters like FFT size) are set to match Aratako/MioCodec-25Hz-44.1kHz.
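For intuition on why those parameters change with the sample rate: the model name encodes a 25 Hz token rate at 44.1 kHz audio, so the encoder hop is fixed by simple arithmetic, and STFT/mel window sizes in the losses are typically chosen around that scale. The hop computation below follows directly from the name; the loss FFT sizes themselves are whatever the released config specifies.

```python
# Frame-rate arithmetic implied by "25Hz-44.1kHz":
# one token per (sample_rate / frame_rate) audio samples.
sample_rate = 44_100  # Hz
frame_rate = 25       # tokens per second
hop_length = sample_rate // frame_rate
print(hop_length)  # 1764 samples per token
```

This is why a config tuned for a 16 kHz or 24 kHz model cannot be reused as-is at 44.1 kHz: the same FFT size covers a much shorter time span, so the loss would compare mismatched resolutions.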
Ok, noted. I'll try Phase 2 first and see the result, and noted on the training config for v2 as well.
Thanks 🙏
