Unable to read the data I prepared #1131

### Checks

- [x] This template is only for usage issues encountered.
- [x] I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.
- [x] I have searched for existing issues, including closed ones, and couldn't find a solution.
- [x] I am using English to submit this issue to facilitate community communication.

### Environment Details

Manjaro Linux
Python 3.13.3
Torch 2.7.1+cu128
Gradio 5.35.0
GPU: RTX5070


### Steps to Reproduce

1. Create a virtual environment.
2. Download the latest release and install it as a pip package.
3. Download the checkpoints and Vocos from Hugging Face, then modify `infer_gradio.py` line 54:
```py
DEFAULT_TTS_MODEL_CFG = [
    "/home/ai/Documents/AI Voice Clone/F5-TTS/ckpts/F5TTS_v1_Base/model_1250000.safetensors",
    "/home/ai/Documents/AI Voice Clone/F5-TTS/ckpts/F5TTS_v1_Base/vocab.txt",
    # "hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors",
    # "hf://SWivid/F5-TTS/F5TTS_v1_Base/vocab.txt",
    json.dumps(dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4)),
]
```
and `utils_infer.py` line 104, so that everything runs locally. I then tried `f5-tts_infer-gradio`, which **works fine.**
```py
def load_vocoder(vocoder_name="vocos", is_local=True, local_path="/home/ai/Documents/AI Voice Clone/F5-TTS/ckpts/vocos", device=device, hf_cache_dir=None):
    if vocoder_name == "vocos":
```
4. Next I followed [Gradio UI Training](https://github.com/SWivid/F5-TTS/discussions/143) and tried to transcribe the data first. I uploaded the audio files and waited a long time, but every file failed to transcribe. It seems a Whisper model is used for transcription, and my network cannot reach it directly. **Failed.**
5. I then tried a custom dataset with this [guide](https://github.com/SWivid/F5-TTS/discussions/57). I prepared `metadata.csv` and the WAV audio files, then ran `python scripts/prepare_csv_wavs.py`, which generated the JSON, Arrow, and `vocab.txt` files in the `/home/ai/Documents/AI Voice Clone/F5-TTS/data/my_speech_pinyin` folder. I could not find where to set `dataset_name`. Running `python train.py` afterwards failed with an error saying that no `model_cfg` was given. **Failed.**
6. Finally I used the prepared `metadata.csv` and WAV audio files with the Gradio UI and went straight to the prepare-data step, which also got stuck: `Error: No audio files found in the specified path : /home/ai/Documents/AI Voice Clone/F5-TTS/src/f5_tts/../../data/my_speech_pinyin/wavs`. I do have WAV audio files in both `/home/ai/Documents/AI Voice Clone/F5-TTS/data/my_speech_pinyin/wavs` and `/home/ai/Documents/AI Voice Clone/F5-TTS/src/f5_tts/data/my_speech_pinyin/wavs`. **Failed.**
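
To help narrow down step 6, here is a minimal sketch of how the path in the error message could be checked outside the Gradio UI. It only normalizes the `..` segments lexically and lists audio files directly inside the resulting folder. The helper name `resolve_and_list` and the extension list `AUDIO_EXTS` are my own assumptions for illustration, not the project's actual lookup logic.

```python
import os
from pathlib import Path

# Assumption: illustrative extension list, not necessarily what the project accepts.
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def resolve_and_list(path_str):
    """Collapse '..' segments lexically and list audio files directly inside the folder."""
    resolved = Path(os.path.normpath(path_str))
    if not resolved.is_dir():
        # Folder missing (or not a directory): nothing to list.
        return resolved, []
    files = sorted(
        p.name
        for p in resolved.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTS
    )
    return resolved, files


if __name__ == "__main__":
    # Path copied from the error message in step 6.
    reported = (
        "/home/ai/Documents/AI Voice Clone/F5-TTS/"
        "src/f5_tts/../../data/my_speech_pinyin/wavs"
    )
    resolved, files = resolve_and_list(reported)
    print("resolved path:", resolved)
    print("is a directory:", resolved.is_dir())
    print("audio files found:", len(files))
```

If this reports a non-zero file count for the normalized path, the files are visible to Python at that location and the cause is probably elsewhere, for example in how the UI scans the folder or which extensions it accepts.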

I have been stuck here for two days. I checked the issues, the README, and multiple YouTube videos, but still have no clue. Please help.

### ✔️ Expected Behavior

I hope to complete this training step and train my model.

### ❌ Actual Behavior

As described in steps 4 to 6 above. orz
