
[Bug] 'GPT2InferenceModel' object has no attribute 'generate' #4290

Open
@ErimatOesteRP


Describe the bug

uv run .\teste1.py
Arquivo WAV: Sample rate=24000, Channels=1

tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Using model: xtts
GPT2InferenceModel has generative capabilities, as prepare_inputs_for_generation is explicitly defined. However, it doesn't directly inherit from GenerationMixin. From 👉v4.50👈 onwards, PreTrainedModel will NOT inherit from GenerationMixin, and this model will lose the ability to call generate and other related functions.

  • If you're using trust_remote_code=True, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
  • If you are the owner of the model architecture code, please modify your model class such that it inherits from GenerationMixin (after PreTrainedModel, otherwise you'll get an exception).
  • If you are not the owner of the model architecture class, please contact the model code owner to update it.

Text splitted to sentences.
['esse é um teste de clonagem de voz']
Traceback (most recent call last):
  File "D:\bkp_hd\Projetos\python\TTS_02\teste1.py", line 22, in <module>
    tts.tts_to_file(text=text, speaker_wav=wav_path, language="pt", file_path=output_path)
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\api.py", line 334, in tts_to_file
    wav = self.tts(
          ^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\api.py", line 276, in tts
    wav = self.synthesizer.tts(
          ^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\utils\synthesizer.py", line 386, in tts
    outputs = self.tts_model.synthesize(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\tts\models\xtts.py", line 419, in synthesize
    return self.full_inference(text, speaker_wav, language, **settings)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\tts\models\xtts.py", line 488, in full_inference
    return self.inference(
           ^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\tts\models\xtts.py", line 541, in inference
    gpt_codes = self.gpt.generate(
                ^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\tts\layers\xtts\gpt.py", line 590, in generate
    gen = self.gpt_inference.generate(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1940, in __getattr__
    raise AttributeError(
AttributeError: 'GPT2InferenceModel' object has no attribute 'generate'

Update the model to support transformers 4.50+
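The warning and the AttributeError above look like two sides of the same change. Here is a minimal sketch of the mechanism, using plain-Python stand-ins rather than the real transformers or TTS classes. Every class name in it is hypothetical and only mirrors the inheritance relationships the warning describes.

```python
# Stand-ins only: these mimic the class relationships described in the
# warning. They are NOT the real transformers / TTS implementations.


class GenerationMixin:
    """Stand-in for the mixin that provides generate()."""

    def generate(self, prompt):
        return f"generated from {prompt!r}"


class PreTrainedModelOld(GenerationMixin):
    """Stand-in for the base class before v4.50 (still inherits the mixin)."""


class PreTrainedModelNew:
    """Stand-in for the base class from v4.50 onwards (no mixin)."""


class InferenceModelOld(PreTrainedModelOld):
    """Model code that gets generate() implicitly through its base class."""


class InferenceModelBroken(PreTrainedModelNew):
    """The same model code on the new base class: generate() is gone."""


class InferenceModelFixed(PreTrainedModelNew, GenerationMixin):
    """Fix suggested by the warning: inherit the mixin explicitly,
    listed after the base model class."""


def has_generate(model):
    """Return True if the object exposes a callable generate attribute."""
    return callable(getattr(model, "generate", None))


if __name__ == "__main__":
    for cls in (InferenceModelOld, InferenceModelBroken, InferenceModelFixed):
        print(cls.__name__, has_generate(cls()))
```

If this reading is right, the fix on the TTS side would be for GPT2InferenceModel to inherit from GenerationMixin explicitly, as the second bullet of the warning says. That is an assumption drawn from the warning text, not something confirmed against the TTS source.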

To Reproduce

from TTS.api import TTS
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import XttsAudioConfig, XttsArgs
from TTS.config.shared_configs import BaseDatasetConfig
import torch
import soundfile as sf

# Add the globals to the allowlist

torch.serialization.add_safe_globals([XttsConfig, XttsAudioConfig, BaseDatasetConfig, XttsArgs])

# Check the reference WAV file

wav_path = "D:/bkp_hd/Projetos/python/TTS_02/temp/referencia.wav"
data, sample_rate = sf.read(wav_path)
print(f"Arquivo WAV: Sample rate={sample_rate}, Channels={data.shape[1] if data.ndim > 1 else 1}")

# Initialize the TTS model

tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2", progress_bar=True, gpu=False)

# Generate audio

text = "esse é um teste de clonagem de voz"
output_path = "D:/bkp_hd/Projetos/python/TTS_02/temp/output.wav"
tts.tts_to_file(text=text, speaker_wav=wav_path, language="pt", file_path=output_path)

print(f"Áudio gerado com sucesso: {output_path}")

Expected behavior

No response

Logs

Environment

uv pip show TTS transformers soundfile
Name: soundfile
Version: 0.13.1
Location: D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages
Requires: cffi, numpy
Required-by: librosa, trainer, tts
---
Name: transformers
Version: 4.52.1
Location: D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: tts
---
Name: tts
Version: 0.22.0
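
The installed transformers (4.52.1) is past the v4.50 cutoff named in the warning. As a rough sketch, a check like the one below can tell whether an environment is on the affected side of that cutoff. The helper names and the `(4, 50)` default are illustrative choices based on the warning text, not part of TTS or transformers.

```python
import re
from importlib import metadata


def parse_major_minor(version):
    """Extract the leading (major, minor) integers from a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def loses_implicit_generate(version, cutoff=(4, 50)):
    """True if the version is at or past the cutoff named in the warning."""
    return parse_major_minor(version) >= cutoff


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("transformers")
    if found is None:
        print("transformers is not installed")
    else:
        print(found, "affected:", loses_implicit_generate(found))
```

With the versions listed above, `loses_implicit_generate("4.52.1")` evaluates to True, which matches the failure in the traceback.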

Additional context

No response

Labels: bug (Something isn't working)
