Multi-Speaker FastSpeech 2 - PyTorch ⚡

A multi-speaker PyTorch implementation of FastSpeech 2: fast and high-quality end-to-end text to speech.



Datasets 🐘

This project supports four datasets, covering both multi-speaker and single-speaker corpora:

🔥 Multi-Speaker

  • LibriTTS

  • VCTK

🔥 Single-Speaker

  • LJSpeech

  • Blizzard2013

After downloading a dataset, extract the compressed files, then set hp.data_path and any other relevant parameters in hparams.py. The default parameters are for the LibriTTS dataset.
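For example, switching from the default to VCTK might look like the excerpt below. Only hp.data_path is named by this README; the dataset attribute is an assumption about how hparams.py is organized, so treat this as a sketch rather than the repository's verbatim file.

# hparams.py (excerpt) -- a sketch; only data_path is confirmed by this README
dataset = "VCTK"                    # hypothetical attribute; the defaults target LibriTTS
data_path = "/path/to/VCTK-Corpus"  # hp.data_path: root of the extracted dataset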

Quick Start ✊

  1. Download the pretrained model.
  2. Put checkpoint_600000.pth.tar in ./states/ckpt.
  3. Run:
python synthesize.py
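Internally, a .pth.tar checkpoint like this is typically restored with torch.load; the sketch below shows that generic pattern. The "model" dictionary key and the model constructor are assumptions, not this repository's confirmed API.

import torch

# Generic checkpoint-restore sketch; the "model" key and the model
# class below are assumptions, not this repo's confirmed layout.
ckpt = torch.load("./states/ckpt/checkpoint_600000.pth.tar", map_location="cpu")
# model = FastSpeech2()              # hypothetical constructor
# model.load_state_dict(ckpt["model"])
# model.eval()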

Preprocessing ✏️

Preprocessing consists of three stages:

  1. Preparing Alignment Data
  2. Montreal Forced Alignment (MFA)
  3. Creating Training Dataset

For stage 2, Montreal Forced Alignment (MFA), please refer to Montreal-Forced-Aligner.

Download and extract the tar.gz release, then specify the path to MFA in hparams.py.
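For instance (the attribute name below is hypothetical; check hparams.py for the real one):

# hparams.py (excerpt) -- hypothetical attribute name for the MFA location
mfa_path = "/path/to/montreal-forced-aligner"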

Then run:

python preprocess.py --prepare_align --mfa --create_dataset
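The three flags correspond to the three stages above, so a single stage can presumably be rerun on its own, e.g. python preprocess.py --mfa once the alignment data has already been prepared.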

After preprocessing, you will find a stat.txt file in hp.preprocessed_path/, recording the maximum and minimum fundamental frequency (f0) and energy values across the entire corpus. Update the f0 and energy parameters in data/dataset.yaml according to the contents of stat.txt.
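A minimal sketch of that step, assuming data/dataset.yaml is plain YAML; the key names and numbers below are placeholders, so copy the real values from your own stat.txt:

import yaml  # requires PyYAML

# Sketch: write corpus-wide f0/energy extrema into data/dataset.yaml.
# The key names and numeric values are placeholders -- take the real
# numbers from hp.preprocessed_path/stat.txt.
with open("data/dataset.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["f0_min"], cfg["f0_max"] = 0.0, 800.0          # placeholder values
cfg["energy_min"], cfg["energy_max"] = 0.0, 300.0  # placeholder values

with open("data/dataset.yaml", "w") as f:
    yaml.safe_dump(cfg, f)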

Training 🐍

Train your model with:

python train.py

The training output, including log messages, checkpoints, and synthesized audio samples, will be written to ./states.
