This project is an all-in-one, easy-to-use voice conversion tool. Built with the goal of producing high-quality, high-performance voice conversion, it lets users change voices smoothly and naturally.
Feature | Description |
---|---|
Music Separation | Utilizes MDX-Net for separating audio tracks. |
Voice Conversion | Supports file conversion, batch conversion, conversion with Whisper, and text-to-speech conversion. |
Background Music Editing | Enables editing and manipulation of background music tracks. |
Apply Effects to Audio | Allows application of various effects to enhance or modify audio output. |
Generate Training Data | Creates training data from linked paths for model training. |
Model Training | Supports v1 and v2 models with high-quality encoders for training. |
Model Fusion | Facilitates combining multiple models for enhanced performance. |
Read Model Information | Provides functionality to access and display model metadata. |
Export Models to ONNX | Enables exporting trained models to ONNX format for compatibility. |
Download from Pre-existing Model Repositories | Allows downloading models from established repositories. |
Search for Models on the Web | Supports searching for models online for easy access. |
Pitch Extraction | Extracts pitch information from audio inputs. |
Support for Audio Conversion Inference Using ONNX Models | Enables inference for audio conversion using ONNX-compatible models. |
ONNX RVC Models with Indexing | Supports ONNX RVC models with indexing for efficient inference. |
Multiple Model Options | See the option lists below. |

- F0: pm, dio, mangio-crepe-tiny, mangio-crepe-small, mangio-crepe-medium, mangio-crepe-large, mangio-crepe-full, crepe-tiny, crepe-small, crepe-medium, crepe-large, crepe-full, fcpe, fcpe-legacy, rmvpe, rmvpe-legacy, harvest, yin, pyin, swipe
- F0_ONNX: Some models converted to ONNX for accelerated pitch extraction.
- F0_HYBRID: Combines multiple options, e.g., `hybrid[rmvpe+harvest]`, or all options together.
- EMBEDDERS: contentvec_base, hubert_base, japanese_hubert_base, korean_hubert_base, chinese_hubert_base, portuguese_hubert_base
- EMBEDDERS_ONNX: Pre-converted ONNX versions of embedding models for accelerated extraction.
- EMBEDDERS_TRANSFORMERS: Pre-converted Hugging Face versions of embedding models as an alternative to Fairseq.
- SPIN_EMBEDDERS: A new embedding extraction model offering potentially higher quality than older methods.
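The F0_HYBRID option combines several extractors into one estimate. As an illustration only (the project's actual implementation may differ), a hybrid string such as `hybrid[rmvpe+harvest]` could be parsed and the per-frame estimates averaged like this; `parse_hybrid` and `combine_f0` are hypothetical helpers:

```python
import re

def parse_hybrid(option):
    """Parse a hybrid F0 option string like 'hybrid[rmvpe+harvest]'
    into its component method names. Hypothetical helper, not the
    project's actual parser."""
    match = re.fullmatch(r"hybrid\[(.+)\]", option)
    if not match:
        raise ValueError(f"not a hybrid option: {option}")
    return match.group(1).split("+")

def combine_f0(frames):
    """Average per-frame F0 estimates from several methods,
    ignoring unvoiced frames (value 0)."""
    combined = []
    for values in zip(*frames):
        voiced = [v for v in values if v > 0]
        combined.append(sum(voiced) / len(voiced) if voiced else 0.0)
    return combined

print(parse_hybrid("hybrid[rmvpe+harvest]"))  # ['rmvpe', 'harvest']
```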
- Step 1: Install Python from the official website (REQUIRES PYTHON 3.10.x OR PYTHON 3.11.x)
- Step 2: Install FFmpeg from the official site, extract it, and add it to PATH
- Step 3: Download and extract the source code
- Step 4: Navigate to the source code directory and open Command Prompt or Terminal
- Step 5: Run the command to install the required libraries
Install: you can install URVC by running `run_install.bat`, or simply run `pip install -r requirements.txt`
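Because the project requires Python 3.10.x or 3.11.x, a quick interpreter check before installing can save a failed setup. This is a minimal sketch, not part of the project:

```python
import sys

# The README requires Python 3.10.x or 3.11.x.
SUPPORTED_MINORS = {10, 11}

def python_is_supported(version=sys.version_info):
    """Return True if the interpreter matches the README's requirement."""
    return version[0] == 3 and version[1] in SUPPORTED_MINORS

print(python_is_supported((3, 11, 4)))  # True
```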
Step 6:
Run the `run_app` file to open the user interface (note: keep the Command Prompt or Terminal open while the interface is running). Alternatively, start the interface from Command Prompt or Terminal in the source code directory:
`env\Scripts\python.exe main\app\app.py --open`
To allow the interface to access files outside the project, add `--allow_all_disk` to the command.
To use TensorBoard for training monitoring, run the `tensorboard` file, or use the command:
`env\Scripts\python.exe main\app\tensorboard.py`
To see the available command-line options, run:
`python main\app\parser.py --help`
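The exact flags of `main\app\parser.py` are project-specific; run `--help` to list them. As a generic illustration of how such a CLI entry point is typically wired with `argparse` (only `--open` and `--allow_all_disk` come from this README; the structure is otherwise an assumption):

```python
import argparse

def build_parser():
    """Sketch of a launcher CLI. The flags --open and --allow_all_disk
    mirror the README; everything else here is hypothetical."""
    parser = argparse.ArgumentParser(description="Voice conversion app launcher")
    parser.add_argument("--open", action="store_true",
                        help="open the web interface after starting")
    parser.add_argument("--allow_all_disk", action="store_true",
                        help="let the interface access files outside the project")
    return parser

args = build_parser().parse_args(["--open", "--allow_all_disk"])
print(args.open, args.allow_all_disk)  # True True
```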
- This project only supports NVIDIA GPUs
- Currently, new encoders like MRF HIFIGAN do not yet have complete pre-trained datasets
- MRF HIFIGAN and REFINEGAN encoders do not support training without pitch
- Models in the URVC repository are collected from AI Hub, HuggingFace, and other repositories. They may carry different licenses (For example, Audioldm2 has model weights with a "Non-Commercial" clause).
- This source code contains third-party software components licensed under non-commercial terms. Any commercial use, including soliciting donations or monetizing derivative works, may violate those licenses and incur legal liability.
- You must ensure that the audio content you upload and convert through this project does not violate the intellectual property rights of third parties.
- The project must not be used for any illegal activities, including but not limited to fraud, harassment, or causing harm to others.
- You are solely responsible for any damages arising from improper use of the product.
- I will not be responsible for any direct or indirect damages arising from the use of this project.
Project | Author/Organization | License |
---|---|---|
Vietnamese-RVC | Phạm Huỳnh Anh | MIT License |
Applio | IAHispano | MIT License |
Python-audio-separator | Nomad Karaoke | MIT License |
Retrieval-based-Voice-Conversion-WebUI | RVC Project | MIT License |
RVC-ONNX-INFER-BY-Anh | Phạm Huỳnh Anh | MIT License |
Torch-Onnx-Crepe-By-Anh | Phạm Huỳnh Anh | MIT License |
Hubert-No-Fairseq | Phạm Huỳnh Anh | MIT License |
Local-attention | Phil Wang | MIT License |
TorchFcpe | CN_ChiTu | MIT License |
FcpeONNX | Yury | MIT License |
ContentVec | Kaizhi Qian | MIT License |
Mediafiredl | Santiago Ariel Mansilla | MIT License |
Noisereduce | Tim Sainburg | MIT License |
World.py-By-Anh | Phạm Huỳnh Anh | MIT License |
Mega.py | Marco Trevisan | No License |
Gdown | Kentaro Wada | MIT License |
Whisper | OpenAI | MIT License |
PyannoteAudio | pyannote | MIT License |
AudioEditingCode | Hila Manor | MIT License |
StftPitchShift | Jürgen Hock | MIT License |
Codename-RVC-Fork-3 | Codename;0 | MIT License |
This section details the available pitch extraction methods, including their advantages, limitations, strength, and reliability, based on personal experience.
Method | Type | Advantages | Limitations | Strength | Reliability |
---|---|---|---|---|---|
pm | Praat | Fast | Less accurate | Low | Low |
dio | PYWORLD | Suitable for rap | Less accurate at high frequencies | Medium | Medium |
harvest | PYWORLD | More accurate than DIO | Slower processing | High | Very high |
crepe | Deep Learning | High accuracy | Requires GPU | Very high | Very high |
mangio-crepe | Crepe finetune | Optimized for RVC | Sometimes less accurate than original crepe | Medium to high | Medium to high |
fcpe | Deep Learning | Accurate, real-time | Requires powerful GPU | Good | Medium |
fcpe-legacy | Old | Accurate, real-time | Older | Good | Medium |
rmvpe | Deep Learning | Effective for singing voices | Resource-intensive | Very high | Excellent |
rmvpe-legacy | Old | Supports older systems | Older | High | Good |
yin | Librosa | Simple, efficient | Prone to octave errors | Medium | Low |
pyin | Librosa | More stable than YIN | More complex computation | Good | Good |
swipe | WORLD | High accuracy | Sensitive to noise | High | Good |
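To make the idea behind these F0 extractors concrete, here is a toy autocorrelation-based pitch estimate on a synthetic tone. It is an illustration only, not one of the project's methods, and assumes NumPy is available:

```python
import numpy as np

def estimate_f0(signal, sr, fmin=50.0, fmax=500.0):
    """Toy autocorrelation pitch estimator: pick the lag with the
    strongest self-similarity inside the plausible period range."""
    autocorr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lag_min = int(sr / fmax)   # shortest period to consider
    lag_max = int(sr / fmin)   # longest period to consider
    best_lag = lag_min + int(np.argmax(autocorr[lag_min:lag_max]))
    return sr / best_lag

sr = 16000
t = np.arange(int(0.5 * sr)) / sr
tone = np.sin(2 * np.pi * 220.0 * t)  # 220 Hz sine
print(round(estimate_f0(tone, sr), 1))
```

Real methods such as rmvpe or crepe are far more robust on noisy, polyphonic, or expressive vocals, which is why the table above matters when choosing one.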
- If you encounter an error while using this source code, I sincerely apologize for the poor experience. You can report the bug using the methods below.
- You can report bugs to us via ISSUE.