Finetuning tips
Hi!
If you are just fine-tuning the pre-trained model to adapt to a new language, using the Phase 2 configuration should be sufficient.
However, since your target language likely wasn't in the original training data, you might achieve even better quality by training the entire model (including the content branch).
Note: If you need to maintain token compatibility with the current encoder (e.g., to use with existing TTS models), you must freeze the content branch during training.
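Freezing a branch in PyTorch just means disabling gradients on its parameters before building the optimizer. Here is a minimal sketch; `Codec`, `content_branch`, and `acoustic_branch` are placeholder names for illustration, and the real attribute names in the codec codebase will differ.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the codec; the real model has different
# modules, but the freezing pattern is the same.
class Codec(nn.Module):
    def __init__(self):
        super().__init__()
        self.content_branch = nn.Linear(8, 8)
        self.acoustic_branch = nn.Linear(8, 8)

def freeze_content_branch(model: nn.Module) -> None:
    # Disable gradients so the content branch keeps its original
    # weights (and hence token compatibility) during fine-tuning.
    for p in model.content_branch.parameters():
        p.requires_grad = False
    # Also switch the branch to eval mode so layers like BatchNorm
    # or Dropout behave as at inference time.
    model.content_branch.eval()

model = Codec()
freeze_content_branch(model)

# Hand only the trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```

One caveat: if the training loop calls `model.train()` each step, it will flip the frozen branch back to train mode, so re-apply `.eval()` on it (or override `train()`) after that call.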
Also, just a heads-up: since this v2 model is 44.1kHz, the training configurations for Mel Spec Loss and GAN Loss (specifically parameters like FFT size) are set to match Aratako/MioCodec-25Hz-44.1kHz.
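For intuition on why those parameters change with the sample rate: the model name encodes a 25 Hz token rate at 44.1 kHz audio, so the encoder hop is fixed by simple arithmetic, and STFT/mel window sizes in the losses are typically chosen around that scale. The hop computation below follows directly from the name; the loss FFT sizes themselves are whatever the released config specifies.

```python
# Frame-rate arithmetic implied by "25Hz-44.1kHz":
# one token per (sample_rate / frame_rate) audio samples.
sample_rate = 44_100  # Hz
frame_rate = 25       # tokens per second
hop_length = sample_rate // frame_rate
print(hop_length)  # 1764 samples per token
```

This is why a config tuned for a 16 kHz or 24 kHz model cannot be reused as-is at 44.1 kHz: the same FFT size covers a much shorter time span, so the loss would compare mismatched resolutions.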
Ok, noted. I'll try Phase 2 first and see the result, and noted on the training config for v2 as well.
Thanks 🙏
