Description
Checks
- This template is only for bug reports; usage problems go under 'Help Wanted'.
- I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.
- I have searched for existing issues, including closed ones, and couldn't find a solution.
- I am using English to submit this issue to facilitate community communication.
Environment Details
Ubuntu 22.04, NVIDIA Tesla V100

Steps to Reproduce
(f5-tts) datascience@dell-PowerEdge-T630:~/item/jack/ai_tts/E2-F5-TTS$ accelerate launch src/f5_tts/train/train.py
ipex flag is deprecated, will be removed in Accelerate v1.10. From 2.7.0, PyTorch has all needed optimizations for Intel CPU and XPU.
The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 3
        More than one GPU was found, enabling multi-GPU training.
        If this was unintended please pass in --num_processes=1.
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
Error executing job with overrides: []
Traceback (most recent call last):
File "/data1/home/datascience/item/jack/ai_tts/E2-F5-TTS/src/f5_tts/train/train.py", line 19, in main
model_cls = hydra.utils.get_class(f"f5_tts.model.{model_cfg.model.backbone}")
omegaconf.errors.ConfigAttributeError: Key 'model' is not in struct
full_key: model
object_type=dict
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Error executing job with overrides: []
Traceback (most recent call last):
File "/data1/home/datascience/item/jack/ai_tts/E2-F5-TTS/src/f5_tts/train/train.py", line 19, in main
model_cls = hydra.utils.get_class(f"f5_tts.model.{model_cfg.model.backbone}")
omegaconf.errors.ConfigAttributeError: Key 'model' is not in struct
full_key: model
object_type=dict
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Error executing job with overrides: []
Traceback (most recent call last):
File "/data1/home/datascience/item/jack/ai_tts/E2-F5-TTS/src/f5_tts/train/train.py", line 19, in main
model_cls = hydra.utils.get_class(f"f5_tts.model.{model_cfg.model.backbone}")
omegaconf.errors.ConfigAttributeError: Key 'model' is not in struct
full_key: model
object_type=dict
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
W0717 08:41:11.474000 135226227082304 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 1148465 closing signal SIGTERM
E0717 08:41:11.538000 135226227082304 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 1148463) of binary: /data1/home/datascience/anaconda3/envs/f5-tts/bin/python3.10
Traceback (most recent call last):
File "/data1/home/datascience/anaconda3/envs/f5-tts/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/data1/home/datascience/anaconda3/envs/f5-tts/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 50, in main
args.func(args)
File "/data1/home/datascience/anaconda3/envs/f5-tts/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1190, in launch_command
multi_gpu_launcher(args)
File "/data1/home/datascience/anaconda3/envs/f5-tts/lib/python3.10/site-packages/accelerate/commands/launch.py", line 815, in multi_gpu_launcher
distrib_run.run(args)
File "/data1/home/datascience/anaconda3/envs/f5-tts/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
elastic_launch(
File "/data1/home/datascience/anaconda3/envs/f5-tts/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/data1/home/datascience/anaconda3/envs/f5-tts/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
src/f5_tts/train/train.py FAILED
Failures:
[1]:
time : 2025-07-17_08:41:11
host : dell-PowerEdge-T630
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 1148464)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Root Cause (first observed failure):
[0]:
time : 2025-07-17_08:41:11
host : dell-PowerEdge-T630
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1148463)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
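A possible reading of the log, offered as an assumption rather than a confirmed diagnosis: the launch command selects no Hydra config, so the config object is empty when train.py reaches line 19 and the lookup of its 'model' key fails on every rank. The sketch below builds a launch command that spells out the four defaulted accelerate flags named in the warning, appends Hydra's --config-name flag, and sets HYDRA_FULL_ERROR=1 as the error message suggests. The helper names and the config file name F5TTS_v1_Base.yaml are assumptions for illustration; substitute whichever YAML file actually exists in the project's configs directory.

```python
import os
import shlex


def build_launch_command(config_name, num_processes=1, num_machines=1,
                         mixed_precision="no", dynamo_backend="no",
                         script="src/f5_tts/train/train.py"):
    """Return an accelerate launch argv with every defaulted flag spelled out.

    Hypothetical helper: the flag names come from the warning in the log,
    the config name is supplied by the caller and is not verified here.
    """
    return [
        "accelerate", "launch",
        f"--num_processes={num_processes}",
        f"--num_machines={num_machines}",
        f"--mixed_precision={mixed_precision}",
        f"--dynamo_backend={dynamo_backend}",
        script,
        # Hydra flag that selects which YAML config to load.
        "--config-name", config_name,
    ]


def build_env(base_env=None):
    """Copy the environment and request Hydra's complete stack trace."""
    env = dict(os.environ if base_env is None else base_env)
    env["HYDRA_FULL_ERROR"] = "1"
    return env


# Assumed config name; three processes to match the three GPUs in the log.
cmd = build_launch_command("F5TTS_v1_Base.yaml", num_processes=3)
print(shlex.join(cmd))
```

The printed line can be pasted into the shell, or the list can be handed to subprocess.run(cmd, env=build_env()) from the repository root. If the error persists with a config selected, the full Hydra trace should show where the config is being read.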
✔️ Expected Behavior
No response
❌ Actual Behavior
No response
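Since the behavior fields are empty, the log above is the only record of what happened. With multi-GPU launches each rank prints its own traceback, so the same exception appears once per process. A minimal sketch for collapsing such a log into distinct exceptions with counts follows; the function name and the regular expression are illustrative assumptions and only recognize final lines that end in Error or Exception before the colon.

```python
import re
from collections import Counter

# Final traceback lines look like "package.module.SomeError: message".
EXCEPTION_LINE = re.compile(
    r"^(?P<type>[A-Za-z_][\w.]*(?:Error|Exception)): ?(?P<msg>.*)$"
)


def summarize_exceptions(log_text):
    """Count each distinct (exception type, message) pair found in a log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EXCEPTION_LINE.match(line.strip())
        if match:
            counts[(match.group("type"), match.group("msg"))] += 1
    return counts


# One rank's output as it appears in the report, repeated for three ranks.
one_rank = "\n".join([
    "Error executing job with overrides: []",
    "Traceback (most recent call last):",
    "omegaconf.errors.ConfigAttributeError: Key 'model' is not in struct",
    "full_key: model",
    "object_type=dict",
])
for (exc_type, message), count in summarize_exceptions("\n".join([one_rank] * 3)).items():
    print(f"{count}x {exc_type}: {message}")
```

Run against the full paste, this makes it easier to see that all ranks fail for the same reason and that the later ChildFailedError from the launcher is a consequence, not a separate bug.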