IIPL_Flitto is a comprehensive speech and text processing toolkit. This repository provides a collection of modules and scripts for advanced speaker diarization, speech enhancement modeling, speech-to-text (STT), text-to-speech (TTS), and text processing. The toolkit is designed to facilitate research and development in automatic speech recognition (ASR), speaker identification, and natural language processing (NLP) tasks.
- OS: Ubuntu 24.04.2
- CUDA Toolkit: 11.7
- GPU Driver: NVIDIA-SMI 570.144 (CUDA Version 12.8)
- Clone this repository and navigate to the IIPL_Flitto folder
git clone https://github.com/geuk-hub/IIPL_Flitto.git
cd IIPL_Flitto
- Install Packages
conda create --name IIPL_Flitto python=3.9
conda activate IIPL_Flitto
pip install --upgrade "pip<24.1"
cd DiarizeNet && pip install -r requirements.txt
- Install additional packages
pip install Cython librosa pesq pystoi pydub tqdm toml colorful mir_eval torch_complex "numpy<2" "accelerate<1.0.0" ffmpeg --no-deps jieba Mecab pkuseg
conda install -c conda-forge compilers
pip install pkuseg nlptutti transformers soynlp
pip install -U openai-whisper
pip install openai
Download the TTA Test Dataset (wer/cer/llm-based acc).
Download the DiarizeNet Model checkpoint.
Download the AdaptiVoice Model checkpoint.
Download the Crossview-AP Model checkpoint.
Download the crossview_ap_data.
Download the Machine Translation Model checkpoint.
Download the Error Correction Model checkpoint.
Before running the following script, make sure to configure the following environment variables:
- LANG: Choose one language from KR (Korean), EN (English), CN (Chinese), or JP (Japanese).
- ROOT: Set this to the full path of your IIPL_Flitto repository.
- DIARIZENET_CHECKPOINT: Put the downloaded DiarizeNet checkpoint into the IIPL_Flitto/checkpoints directory.
- OPENAI_API_KEY: Provide your OpenAI API key.
- TTA Test Dataset: Put your TTA Test Dataset files into the TTA_test/wer_cer_llm_based_acc_data folder.
  - For example:
    TTA_test/wer_cer_llm_based_acc_data/KR
    TTA_test/wer_cer_llm_based_acc_data/CN
    TTA_test/wer_cer_llm_based_acc_data/EN
    TTA_test/wer_cer_llm_based_acc_data/JP
- {LANG}_wav.scp: Ensure that the {LANG}_wav.scp file inside TTA_test/wer_cer_llm_based_acc_data/{LANG} contains correct audio paths.
  - If you need to update or replace the /path/to/your prefixes in any {LANG}_wav.scp file, you can run the change_path.py script located at IIPL_Flitto/TTA_test/wer_cer_llm_based_acc_data/change_path.py. This script will automatically replace /path/to/your with your specified root path.
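To make the path replacement concrete, here is a minimal sketch of the kind of rewrite described above. It is not the repository's change_path.py; the function name `rewrite_scp_prefix` and its arguments are hypothetical, and it assumes each line of a {LANG}_wav.scp file holds an utterance ID followed by an audio path that may start with /path/to/your.

```python
from pathlib import Path

# Placeholder prefix named in this README; everything else here is illustrative.
PLACEHOLDER = "/path/to/your"


def rewrite_scp_prefix(scp_path, new_root, placeholder=PLACEHOLDER):
    """Replace the placeholder prefix in a wav.scp file, in place.

    Returns the number of lines that were changed.
    """
    scp_path = Path(scp_path)
    # Drop a trailing slash so the joined path does not contain "//".
    new_root = str(new_root).rstrip("/")

    changed = 0
    rewritten = []
    for line in scp_path.read_text(encoding="utf-8").splitlines():
        new_line = line.replace(placeholder, new_root)
        if new_line != line:
            changed += 1
        rewritten.append(new_line)

    # Write every line back with a newline terminator.
    scp_path.write_text("".join(l + "\n" for l in rewritten), encoding="utf-8")
    return changed
```

Under these assumptions, calling `rewrite_scp_prefix("TTA_test/wer_cer_llm_based_acc_data/KR/KR_wav.scp", "/home/user")` would rewrite only the lines that still contain the placeholder and leave all other lines untouched.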
bash TTA_test/wer_cer_llm_based_acc.sh
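A quick pre-flight check can catch a misconfigured variable before the script runs. The sketch below is an assumption-laden illustration, not part of the repository: `check_config` is a hypothetical helper, and it assumes LANG, ROOT, and OPENAI_API_KEY are available as a mapping (for example `os.environ`) and that the scp file follows the TTA_test/wer_cer_llm_based_acc_data/{LANG}/{LANG}_wav.scp layout described above.

```python
import os
from pathlib import Path

# Language codes listed in this README.
VALID_LANGS = {"KR", "EN", "CN", "JP"}
# Dataset folder relative to ROOT, as described in this README.
DATA_SUBDIR = Path("TTA_test") / "wer_cer_llm_based_acc_data"


def check_config(env):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    lang = env.get("LANG", "")
    if lang not in VALID_LANGS:
        problems.append(f"LANG must be one of {sorted(VALID_LANGS)}, got {lang!r}")

    root = env.get("ROOT", "")
    if not root or not Path(root).is_dir():
        problems.append(f"ROOT is not an existing directory: {root!r}")
    elif lang in VALID_LANGS:
        # Only look for the scp file once both ROOT and LANG are usable.
        scp = Path(root) / DATA_SUBDIR / lang / f"{lang}_wav.scp"
        if not scp.is_file():
            problems.append(f"missing file: {scp}")

    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is empty")

    return problems
```

A typical call would be `check_config(os.environ)`. Note that LANG is also the standard POSIX locale variable, so this sketch expects the value to be set explicitly for the run rather than inherited from the shell's defaults.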
- Install Packages 1
1-1. Create ap_env environment
conda create -n ap_env python=3.9
conda activate ap_env
1-2. Install packages
conda install -c conda-forge gxx_linux-64
cd IIPL_Flitto/AdaptiVoice/TTS_engine
pip install -e .
cd IIPL_Flitto/AdaptiVoice/voice_engine
pip install -e .
1-3. Install additional packages
conda install -c conda-forge ffmpeg
pip install mecab-python3
python -m unidic download
pip install pkuseg janome konlpy h5py textgrid tgt opencc librosa
- Install Packages 2
2-1. Create mfa_env environment
conda create -n mfa_env -c conda-forge montreal-forced-aligner
conda activate mfa_env
2-2. Install packages
pip install python-mecab-ko jamo spacy-pkuseg dragonmapper hanziconv textgrid tgt
conda install -c conda-forge spacy sudachipy sudachidict-core
- Run
Before running the following script, make sure to configure the following environment variables:
- root: Set this to the full path of your IIPL_Flitto repository.
- lang: Choose one language from kr (Korean), en (English), cn (Chinese), or jp (Japanese).
- AdaptiVoice_ckpt: Put the downloaded AdaptiVoice model checkpoint into the IIPL_Flitto/checkpoints directory.
- Crossview_AP_ckpt: Put the downloaded Crossview-AP model checkpoint into the IIPL_Flitto/checkpoints directory.
- crossview_ap_data: Put the downloaded crossview_ap_data into the IIPL_Flitto/TTA_test/crossview_ap_data directory.
3-1. evaluation only
- input: align.hdf5, feats.hdf5, vocab
bash TTA_test/crossview_ap_eval_only.sh
3-2. evaluation from scratch
- input: rttm
- data generation: rttm -> tts -> timestamp -> vocab -> hdf5
bash TTA_test/crossview_ap_from_scratch.sh
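The from-scratch pipeline starts from an rttm file. As a rough illustration of that first stage, the sketch below parses speaker turns, assuming the input follows the standard space-separated RTTM layout (type, file ID, channel, onset, duration, two placeholder fields, speaker name, ...). The `Turn` type and `parse_rttm` function are hypothetical names; the repository's own loader may differ.

```python
from typing import List, NamedTuple


class Turn(NamedTuple):
    """One speaker turn taken from a SPEAKER line of an RTTM file."""
    file_id: str
    speaker: str
    onset: float     # start time in seconds
    duration: float  # length in seconds

    @property
    def offset(self) -> float:
        """End time in seconds."""
        return self.onset + self.duration


def parse_rttm(text: str) -> List[Turn]:
    """Parse RTTM text into speaker turns, skipping non-SPEAKER lines."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        # A SPEAKER record needs at least the first eight fields.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        turns.append(
            Turn(
                file_id=fields[1],
                speaker=fields[7],
                onset=float(fields[3]),
                duration=float(fields[4]),
            )
        )
    return turns
```

Each parsed turn carries the speaker label and time span that the later stages (tts, timestamp, vocab, hdf5) would presumably consume.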
- Install Packages
conda create -n mt python=3.9
conda activate mt
cd IIPL_Flitto/Text_Processing/Machine_Translation
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install numpy regex sacrebleu tensorboard matplotlib pandas cython setuptools pyarrow sacremoses tensorboardX unbabel-comet
pip install pip==23.3.1
conda install -c conda-forge gxx_linux-64
pip install --editable ./
- Run
Before running the following script, make sure to configure the following environment variables:
- root: Set this to the full path of your IIPL_Flitto repository.
- Machine_Translation_ckpt: Put the downloaded Machine Translation checkpoint into the IIPL_Flitto/checkpoints directory.
bash TTA_test/bleu_comet.sh
- Install Packages
conda create -n ec python=3.12
conda activate ec
pip install unsloth hgtk
- Run
Before running the following script, make sure to configure the following environment variables:
- ROOT: Set this to the full path of your IIPL_Flitto repository.
- MODEL_PATH: Set this to the full path of your Error Correction checkpoints folder.
python Text_Processing/Error_Correction/LLM_grammer_inference.py