📢 [English Voice Samples](https://erogol.github.io/ddc-samples/) and [SoundCloud playlist](https://soundcloud.com/user-565970875/pocket-article-wavernn-and-tacotron2)
👩🏽‍🍳 [TTS training recipes](https://github.com/erogol/TTS_recipes)
📄 [Text-to-Speech paper collection](https://github.com/erogol/TTS-papers)
## 💬 Where to ask questions
Please use our dedicated channels for questions and discussion. Help is much more valuable if it's shared publicly so that more people can benefit from it.
pip install -e .[all,dev,notebooks,tf] # Select the relevant extras
```
We use `espeak-ng` to convert graphemes to phonemes. You might need to install it separately.
```bash
sudo apt-get install espeak-ng
```
If you are on Ubuntu (Debian), you can also run the following commands to install.
```bash
$ make install
```
If you are on Windows, 👑@GuyPaddock wrote installation instructions [here](https://stackoverflow.com/questions/66726331/how-can-i-run-mozilla-tts-coqui-tts-training-with-cuda-on-a-windows-system).
## Directory Structure
```
|- notebooks/ (Jupyter Notebooks for model evaluation, parameter selection and data analysis.)
|- distribute.py (train your TTS model using Multiple GPUs.)
|- compute_statistics.py (compute dataset statistics for normalization.)
|- convert*.py (convert target torch model to TF.)
|- ...
|- tts/ (text to speech models)
|- layers/ (model layer definitions)
|- models/ (model definitions)
|- (same)
|- vocoder/ (Vocoder models.)
|- (same)
```
## Sample Model Output
Below you can see the Tacotron model state after 16K iterations with batch size 32, trained on the LJSpeech dataset.
> "Recent research at Harvard has shown meditating for as little as 8 weeks can actually increase the grey matter in the parts of the brain responsible for emotional regulation and learning."
## Example: Synthesizing Speech on the Terminal Using the Released Models
<img src="images/tts_cli.gif"/>
After the installation, 🐸TTS provides a CLI interface for synthesizing speech using pre-trained models. You can use either your own model or one of the released models under 🐸TTS.
List the released 🐸TTS models:
```bash
tts --list_models
```
Run a TTS model from the released models list with its default vocoder. (Simply copy and paste the full model name from the list as the argument for the command below.)
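As a sketch of a typical invocation (the model name below is illustrative; copy the exact name from the `tts --list_models` output, and the output path is just an example):

```bash
# Synthesize a sentence with a released model and its default vocoder.
# The model name is an example; use one printed by `tts --list_models`.
tts --text "Text for TTS" \
    --model_name "tts_models/en/ljspeech/tacotron2-DDC" \
    --out_path output.wav
```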
## Example: Training and Fine-tuning on the LJSpeech Dataset
Here you can find a [Colab](https://gist.github.com/erogol/97516ad65b44dbddb8cd694953187c5b) notebook for a hands-on example of training LJSpeech. Or you can manually follow the guide below.
To start, split `metadata.csv` into train and validation subsets: `metadata_train.csv` and `metadata_val.csv`, respectively. Note that for text-to-speech, validation performance might be misleading, since the loss value does not directly measure voice quality to the human ear, nor does it measure the attention module's performance. Therefore, running the model with new sentences and listening to the results is the best way to evaluate it.
```bash
shuf metadata.csv > metadata_shuf.csv
head -n 12000 metadata_shuf.csv > metadata_train.csv
tail -n 1100 metadata_shuf.csv > metadata_val.csv
```
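If you prefer, the same shuffle-and-split can be scripted in Python. The helper below is a generic sketch (the function name is made up for illustration; the default `n_val=1100` matches the split above and should be adjusted for other datasets):

```python
import random

def split_metadata(in_path, train_path, val_path, n_val=1100, seed=0):
    """Shuffle metadata lines and split them into train/validation files."""
    with open(in_path, encoding="utf-8") as f:
        lines = f.readlines()
    # Seeded shuffle so the split is reproducible across runs.
    random.Random(seed).shuffle(lines)
    with open(val_path, "w", encoding="utf-8") as f:
        f.writelines(lines[:n_val])
    with open(train_path, "w", encoding="utf-8") as f:
        f.writelines(lines[n_val:])
```

Call it as `split_metadata("metadata.csv", "metadata_train.csv", "metadata_val.csv")` from the dataset folder.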
To train a new model, you need to define your own `config.json` specifying the model details, training configuration, and more (check the examples). Then call the corresponding train script.
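As a rough sketch of what such a call looks like (the script path below is an assumption based on the repository layout and may differ between versions; check the train scripts in your checkout):

```bash
# Train a Tacotron model using your own config.json.
# TTS/bin/train_tacotron.py is an assumed path; adjust to your version.
python TTS/bin/train_tacotron.py --config_path config.json
```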
For instance, to train a Tacotron or Tacotron2 model on the LJSpeech dataset, follow these steps.