Description
Describe the bug
uv run .\teste1.py
Arquivo WAV: Sample rate=24000, Channels=1
tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Using model: xtts
GPT2InferenceModel has generative capabilities, as prepare_inputs_for_generation is explicitly defined. However, it doesn't directly inherit from GenerationMixin. From 👉v4.50👈 onwards, PreTrainedModel will NOT inherit from GenerationMixin, and this model will lose the ability to call generate and other related functions.
- If you're using trust_remote_code=True, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
- If you are the owner of the model architecture code, please modify your model class such that it inherits from GenerationMixin (after PreTrainedModel, otherwise you'll get an exception).
- If you are not the owner of the model architecture class, please contact the model code owner to update it.
Text splitted to sentences.
['esse é um teste de clonagem de voz']
Traceback (most recent call last):
  File "D:\bkp_hd\Projetos\python\TTS_02\teste1.py", line 22, in <module>
    tts.tts_to_file(text=text, speaker_wav=wav_path, language="pt", file_path=output_path)
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\api.py", line 334, in tts_to_file
    wav = self.tts(
          ^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\api.py", line 276, in tts
    wav = self.synthesizer.tts(
          ^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\utils\synthesizer.py", line 386, in tts
    outputs = self.tts_model.synthesize(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\tts\models\xtts.py", line 419, in synthesize
    return self.full_inference(text, speaker_wav, language, **settings)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\tts\models\xtts.py", line 488, in full_inference
    return self.inference(
           ^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\tts\models\xtts.py", line 541, in inference
    gpt_codes = self.gpt.generate(
                ^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\tts\layers\xtts\gpt.py", line 590, in generate
    gen = self.gpt_inference.generate(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1940, in __getattr__
    raise AttributeError(
AttributeError: 'GPT2InferenceModel' object has no attribute 'generate'
The model needs to be updated to support transformers 4.50+.
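The mechanism described in the warning can be illustrated without loading any model. The sketch below uses stand-in classes (PreTrainedModelStub, GenerationMixinStub, BrokenModel, FixedModel, and try_generate are all hypothetical names, not part of transformers or TTS) to show why a class that inherits only from the base class loses generate, and how adding the mixin after the base class restores it. It is an illustration of the inheritance change under those assumptions, not the actual patch for GPT2InferenceModel.

```python
# Stand-in classes that mimic the inheritance change described in the warning.
# They are NOT the real transformers classes.


class PreTrainedModelStub:
    """Plays the role of PreTrainedModel from v4.50 onwards: no generate()."""


class GenerationMixinStub:
    """Plays the role of GenerationMixin: the class that provides generate()."""

    def generate(self, prompt):
        return f"generated from {prompt!r}"


class BrokenModel(PreTrainedModelStub):
    """Inherits only from the base class, so generate() is missing."""

    def prepare_inputs_for_generation(self, prompt):
        return {"prompt": prompt}


class FixedModel(PreTrainedModelStub, GenerationMixinStub):
    """Lists the mixin after the base class, as the warning recommends."""

    def prepare_inputs_for_generation(self, prompt):
        return {"prompt": prompt}


def try_generate(model, prompt):
    """Call generate() and report an AttributeError instead of raising it."""
    try:
        return model.generate(prompt)
    except AttributeError as exc:
        return f"AttributeError: {exc}"


broken_result = try_generate(BrokenModel(), "hello")
fixed_result = try_generate(FixedModel(), "hello")
print(broken_result)
print(fixed_result)
```

If the same pattern applies to the real class, the fix on the library side would presumably be to add GenerationMixin to the bases of GPT2InferenceModel in TTS\tts\layers\xtts\gpt.py, but that has not been verified here.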
To Reproduce
from TTS.api import TTS
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import XttsAudioConfig, XttsArgs
from TTS.config.shared_configs import BaseDatasetConfig
import torch
import soundfile as sf
# Add the globals to the allow-list
torch.serialization.add_safe_globals([XttsConfig, XttsAudioConfig, BaseDatasetConfig, XttsArgs])
# Check the reference WAV file
wav_path = "D:/bkp_hd/Projetos/python/TTS_02/temp/referencia.wav"
data, sample_rate = sf.read(wav_path)
print(f"Arquivo WAV: Sample rate={sample_rate}, Channels={data.shape[1] if data.ndim > 1 else 1}")
# Initialize the TTS model
tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2", progress_bar=True, gpu=False)
# Generate audio
text = "esse é um teste de clonagem de voz"
output_path = "D:/bkp_hd/Projetos/python/TTS_02/temp/output.wav"
tts.tts_to_file(text=text, speaker_wav=wav_path, language="pt", file_path=output_path)
print(f"Áudio gerado com sucesso: {output_path}")
Expected behavior
No response
Logs
Environment
uv pip show TTS transformers soundfile
Name: soundfile
Version: 0.13.1
Location: D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages
Requires: cffi, numpy
Required-by: librosa, trainer, tts
---
Name: transformers
Version: 4.52.1
Location: D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: tts
---
Name: tts
Version: 0.22.0
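To check quickly whether an environment falls in the range named by the warning, a small version test can help. The sketch below is only a diagnostic aid: is_affected and installed_transformers_version are hypothetical helper names, and the 4.50 threshold is taken from the warning text above rather than from the TTS or transformers changelogs.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def is_affected(transformers_version):
    """Return True if the version is 4.50 or newer (threshold from the warning)."""
    match = re.match(r"(\d+)\.(\d+)", transformers_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {transformers_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (4, 50)


def installed_transformers_version():
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


installed = installed_transformers_version()
if installed is None:
    print("transformers is not installed in this environment")
elif is_affected(installed):
    print(f"transformers {installed} is in the range named by the warning (>= 4.50)")
else:
    print(f"transformers {installed} is older than 4.50")
```

With the versions listed above, is_affected("4.52.1") returns True. One possible workaround, untested here, is to pin an older release with uv pip install "transformers<4.50" until the model class is updated.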
Additional context
No response