[Bug] Make hop_length and win_length in AudioProcessor prioritize passed sample point parameters #4289

Open
@zepeng-wan

Feature Description
Currently, in the __init__ method of the AudioProcessor class in the TTS project (specifically in TTS/TTS/utils/audio/processor.py), the millisec_to_length function is called when the hop_length parameter is None. It calculates hop_length and win_length from the millisecond-based parameters frame_length_ms and frame_shift_ms. However, in some practical scenarios we would prefer to use the passed sample-point parameters hop_length and win_length directly instead of deriving them from millisecond-based values.
Solution
Modify the __init__ method of the AudioProcessor class so that it first checks whether valid hop_length and win_length values are passed. If neither is None, the passed values should be used directly. Only when both hop_length and win_length are None should the millisec_to_length function be called to calculate them from the millisecond-based parameters.

The relevant original code is:
if hop_length is None:
    # compute stft parameters from given time values
    self.win_length, self.hop_length = millisec_to_length(
        frame_length_ms=self.frame_length_ms, frame_shift_ms=self.frame_shift_ms, sample_rate=self.sample_rate
    )
else:
    # use stft parameters from config file
    self.hop_length = hop_length
    self.win_length = win_length
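
To make the effect of the original branch concrete, here is a minimal standalone sketch. The function name current_lengths, the default values, and the body of millisec_to_length are assumptions made for illustration; they are not copied from the TTS source. It shows that passing only hop_length leaves win_length as None, because the branch tests hop_length alone.

```python
def millisec_to_length(*, frame_length_ms, frame_shift_ms, sample_rate):
    """Stand-in for the helper named in the issue (assumed behaviour)."""
    win_length = int(frame_length_ms / 1000.0 * sample_rate)
    hop_length = int(frame_shift_ms / 1000.0 * sample_rate)
    return win_length, hop_length


def current_lengths(hop_length=None, win_length=None, *,
                    frame_length_ms=50.0, frame_shift_ms=12.5, sample_rate=22050):
    """Mirror of the original branch; returns (win_length, hop_length)."""
    if hop_length is None:
        # compute stft parameters from given time values
        return millisec_to_length(
            frame_length_ms=frame_length_ms,
            frame_shift_ms=frame_shift_ms,
            sample_rate=sample_rate,
        )
    # use the passed values as-is, even if win_length is None
    return win_length, hop_length


# Only hop_length is passed, so win_length stays None.
print(current_lengths(hop_length=256))
```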

The expected modified code would be:
if hop_length is not None and win_length is not None:
    self.hop_length = hop_length
    self.win_length = win_length
else:
    # compute stft parameters from given time values
    self.win_length, self.hop_length = millisec_to_length(
        frame_length_ms=self.frame_length_ms, frame_shift_ms=self.frame_shift_ms, sample_rate=self.sample_rate
    )
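
The proposed selection logic can also be sketched as a standalone function, which makes it easy to try outside the class. This is only an illustration under stated assumptions: resolve_stft_lengths is a hypothetical name, the default values are arbitrary, and the body of millisec_to_length is a stand-in rather than the TTS implementation. Note that, as written in the proposed code, the fallback is also taken when only one of the two values is passed.

```python
def millisec_to_length(*, frame_length_ms, frame_shift_ms, sample_rate):
    """Stand-in for the helper named in the issue (assumed behaviour)."""
    win_length = int(frame_length_ms / 1000.0 * sample_rate)
    hop_length = int(frame_shift_ms / 1000.0 * sample_rate)
    return win_length, hop_length


def resolve_stft_lengths(hop_length=None, win_length=None, *,
                         frame_length_ms=50.0, frame_shift_ms=12.5, sample_rate=22050):
    """Return (win_length, hop_length), preferring passed sample-point values."""
    if hop_length is not None and win_length is not None:
        # both sample-point values were passed: use them directly
        return win_length, hop_length
    # otherwise compute stft parameters from the millisecond values
    return millisec_to_length(
        frame_length_ms=frame_length_ms,
        frame_shift_ms=frame_shift_ms,
        sample_rate=sample_rate,
    )


print(resolve_stft_lengths(hop_length=256, win_length=1024))  # passed values win
print(resolve_stft_lengths())                                 # computed from milliseconds
```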

Alternative Solutions
One alternative could be to add an additional flag parameter, such as use_sample_point_params (defaulting to False), which, when set to True, forces the use of the passed hop_length and win_length parameters over the millisecond-based calculation. However, this would add extra complexity to the API. Another option could be a separate function dedicated to calculating hop_length and win_length according to different priorities, but this might also overcomplicate the code structure.
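
For comparison, the flag-based alternative might look roughly like the sketch below. Only the flag name use_sample_point_params comes from this issue; the function name resolve_with_flag, the default values, the stand-in millisec_to_length, and the choice to raise ValueError when the flag is set but a value is missing are all assumptions made for illustration.

```python
def millisec_to_length(*, frame_length_ms, frame_shift_ms, sample_rate):
    """Stand-in for the helper named in the issue (assumed behaviour)."""
    win_length = int(frame_length_ms / 1000.0 * sample_rate)
    hop_length = int(frame_shift_ms / 1000.0 * sample_rate)
    return win_length, hop_length


def resolve_with_flag(hop_length=None, win_length=None, *,
                      use_sample_point_params=False,
                      frame_length_ms=50.0, frame_shift_ms=12.5, sample_rate=22050):
    """Return (win_length, hop_length); the flag decides which source is used."""
    if use_sample_point_params:
        # assumed policy: the flag requires both sample-point values
        if hop_length is None or win_length is None:
            raise ValueError("use_sample_point_params=True needs hop_length and win_length")
        return win_length, hop_length
    # flag is off: always use the millisecond-based calculation
    return millisec_to_length(
        frame_length_ms=frame_length_ms,
        frame_shift_ms=frame_shift_ms,
        sample_rate=sample_rate,
    )
```

The extra keyword and the additional error path illustrate the added API complexity mentioned above.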

Labels

feature request: feature requests for making TTS better.
wontfix: This will not be worked on but feel free to help.