Description
Checks
- This template is only for usage issues encountered.
- I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.
- I have searched for existing issues, including closed ones, and couldn't find a solution.
- I am using English to submit this issue to facilitate community communication.
Environment Details
Manjaro Linux
Python 3.13.3
Torch 2.7.1+cu128
Gradio 5.35.0
GPU: RTX5070
Steps to Reproduce
- create a virtual environment
- download latest release and install as pip package
- download checkpoints and vocos from Hugging Face, and modify infer_gradio.py, line 54:
DEFAULT_TTS_MODEL_CFG = [
"/home/ai/Documents/AI Voice Clone/F5-TTS/ckpts/F5TTS_v1_Base/model_1250000.safetensors",
"/home/ai/Documents/AI Voice Clone/F5-TTS/ckpts/F5TTS_v1_Base/vocab.txt",
# "hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors",
# "hf://SWivid/F5-TTS/F5TTS_v1_Base/vocab.txt",
json.dumps(dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4)),
]
and utils_infer.py, line 104 (shown below), so that everything loads locally. After that, running f5-tts_infer-gradio works fine.
def load_vocoder(vocoder_name="vocos", is_local=True, local_path="/home/ai/Documents/AI Voice Clone/F5-TTS/ckpts/vocos", device=device, hf_cache_dir=None):
    if vocoder_name == "vocos":
- then I followed the Gradio UI training workflow and tried to transcribe the data first. I uploaded the audio files and waited a long time, but every file failed to transcribe. It seems a Whisper model is used for transcription, and my network cannot access it directly. Failed.
- next I tried a custom dataset with this guide. I prepared metadata.csv and the wav audio files, then ran python scripts/prepare_csv_wavs.py, which generated the json, arrow, and vocab.txt files in the /home/ai/Documents/AI Voice Clone/F5-TTS/data/my_speech_pinyin folder. I couldn't find a place to modify dataset_name. I then ran python train.py, and the error was that no model_cfg was given. Failed.
- ok, then I used the prepared metadata.csv and wav audio files with the Gradio UI, went straight to the prepare data step, and got stuck there too:
Error: No audio files found in the specified path : /home/ai/Documents/AI Voice Clone/F5-TTS/src/f5_tts/../../data/my_speech_pinyin/wavs
I actually have wav audio files in both /home/ai/Documents/AI Voice Clone/F5-TTS/data/my_speech_pinyin/wavs and /home/ai/Documents/AI Voice Clone/F5-TTS/src/f5_tts/data/my_speech_pinyin/wavs. Failed.
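To narrow down the "No audio files found" error, it may help to check what the reported path resolves to and what it contains. The sketch below is a standalone diagnostic, not part of F5-TTS: the helper name inspect_audio_dir and the extension list AUDIO_EXTS are assumptions made for illustration, and the extensions the Gradio UI actually accepts may differ.

```python
import os
from pathlib import Path

# Assumed list of audio extensions; the project may accept a different set.
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def inspect_audio_dir(raw_path: str) -> dict:
    """Collapse '..' segments in raw_path and report the audio files found there.

    Hypothetical helper for debugging only; it does not call any F5-TTS code.
    """
    # normpath collapses '..' lexically; it does not follow symlinks.
    resolved = Path(os.path.normpath(raw_path))
    exists = resolved.is_dir()
    audio_files = []
    if exists:
        audio_files = sorted(
            p.name
            for p in resolved.iterdir()
            if p.is_file() and p.suffix.lower() in AUDIO_EXTS
        )
    return {"resolved": str(resolved), "exists": exists, "audio_files": audio_files}


if __name__ == "__main__":
    # Path copied from the error message above.
    raw = (
        "/home/ai/Documents/AI Voice Clone/F5-TTS/src/f5_tts/"
        "../../data/my_speech_pinyin/wavs"
    )
    report = inspect_audio_dir(raw)
    print("resolved path:", report["resolved"])
    print("directory exists:", report["exists"])
    print("audio files found:", len(report["audio_files"]))
```

On a POSIX system the path from the error message collapses to /home/ai/Documents/AI Voice Clone/F5-TTS/data/my_speech_pinyin/wavs, which is the first of the two folders listed above. If the script reports that the directory exists and lists files, the problem is more likely in how the UI filters files than in the location itself.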
I have been stuck here for 2 days. I checked the issues, the README, and multiple YouTube videos, but I still have no clue. Please help.
✔️ Expected Behavior
I hope I can complete this training step and train my model.
❌ Actual Behavior
As described in the steps above. orz