Commit 416852e

Authored by yfyeung (Yifan Yang)

Add Zipformer recipe for GigaSpeech (k2-fsa#1254)

Co-authored-by: Yifan Yang <[email protected]>
Co-authored-by: yfy62 <[email protected]>

1 parent eef47ad · commit 416852e


43 files changed (+6036 −2 lines)
Lines changed: 94 additions & 0 deletions (new file)

```bash
#!/usr/bin/env bash

set -e

log() {
  # This function is from espnet
  local fname=${BASH_SOURCE[1]##*/}
  echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}

cd egs/gigaspeech/ASR

repo_url=https://huggingface.co/yfyeung/icefall-asr-gigaspeech-zipformer-2023-10-17

log "Downloading pre-trained model from $repo_url"
git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)

log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav

pushd $repo/exp
git lfs pull --include "data/lang_bpe_500/bpe.model"
git lfs pull --include "data/lang_bpe_500/tokens.txt"
git lfs pull --include "exp/jit_script.pt"
git lfs pull --include "exp/pretrained.pt"
ln -s pretrained.pt epoch-99.pt
ls -lh *.pt
popd

log "Export to torchscript model"
./zipformer/export.py \
  --exp-dir $repo/exp \
  --use-averaged-model false \
  --tokens $repo/data/lang_bpe_500/tokens.txt \
  --epoch 99 \
  --avg 1 \
  --jit 1

ls -lh $repo/exp/*.pt

log "Decode with models exported by torch.jit.script()"

./zipformer/jit_pretrained.py \
  --tokens $repo/data/lang_bpe_500/tokens.txt \
  --nn-model-filename $repo/exp/jit_script.pt \
  $repo/test_wavs/1089-134686-0001.wav \
  $repo/test_wavs/1221-135766-0001.wav \
  $repo/test_wavs/1221-135766-0002.wav

for method in greedy_search modified_beam_search fast_beam_search; do
  log "$method"

  ./zipformer/pretrained.py \
    --method $method \
    --beam-size 4 \
    --checkpoint $repo/exp/pretrained.pt \
    --tokens $repo/data/lang_bpe_500/tokens.txt \
    $repo/test_wavs/1089-134686-0001.wav \
    $repo/test_wavs/1221-135766-0001.wav \
    $repo/test_wavs/1221-135766-0002.wav
done

echo "GITHUB_EVENT_NAME: ${GITHUB_EVENT_NAME}"
echo "GITHUB_EVENT_LABEL_NAME: ${GITHUB_EVENT_LABEL_NAME}"
if [[ x"${GITHUB_EVENT_NAME}" == x"schedule" || x"${GITHUB_EVENT_LABEL_NAME}" == x"run-decode" ]]; then
  mkdir -p zipformer/exp
  ln -s $PWD/$repo/exp/pretrained.pt zipformer/exp/epoch-999.pt
  ln -s $PWD/$repo/data/lang_bpe_500 data/

  ls -lh data
  ls -lh zipformer/exp

  log "Decoding test-clean and test-other"

  # use a small value for decoding with CPU
  max_duration=100

  for method in greedy_search fast_beam_search modified_beam_search; do
    log "Decoding with $method"

    ./zipformer/decode.py \
      --decoding-method $method \
      --epoch 999 \
      --avg 1 \
      --use-averaged-model 0 \
      --max-duration $max_duration \
      --exp-dir zipformer/exp
  done

  rm zipformer/exp/*.pt
fi
```
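The `log` helper at the top of the script (borrowed from espnet) stamps every message with a timestamp plus the calling file, line, and function name, which makes CI logs traceable. A standalone sketch of the same pattern; the `download` wrapper here is a hypothetical caller added only to show the function-name field:

```shell
#!/usr/bin/env bash
# espnet-style logging helper, as used in the CI script above:
# prefix each message with date, source file, line, and caller name.
log() {
  local fname=${BASH_SOURCE[1]##*/}
  echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}

# Hypothetical caller for illustration only.
download() {
  log "Downloading pre-trained model"
}

download
```

Because `FUNCNAME[1]` names the caller, the printed line identifies `download` as the function that logged the message.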
Lines changed: 126 additions & 0 deletions (new file)

```yaml
# Copyright 2022 Fangjun Kuang ([email protected])
#
# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: run-gigaspeech-zipformer-2023-10-17
# zipformer

on:
  push:
    branches:
      - master
  pull_request:
    types: [labeled]

  schedule:
    # minute (0-59)
    # hour (0-23)
    # day of the month (1-31)
    # month (1-12)
    # day of the week (0-6)
    # nightly build at 15:50 UTC time every day
    - cron: "50 15 * * *"

concurrency:
  group: run_gigaspeech_2023_10_17_zipformer-${{ github.ref }}
  cancel-in-progress: true

jobs:
  run_gigaspeech_2023_10_17_zipformer:
    if: github.event.label.name == 'zipformer' || github.event.label.name == 'ready' || github.event.label.name == 'run-decode' || github.event_name == 'push' || github.event_name == 'schedule'
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest]
        python-version: [3.8]

      fail-fast: false

    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Setup Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
          cache-dependency-path: '**/requirements-ci.txt'

      - name: Install Python dependencies
        run: |
          grep -v '^#' ./requirements-ci.txt | xargs -n 1 -L 1 pip install
          pip uninstall -y protobuf
          pip install --no-binary protobuf protobuf==3.20.*

      - name: Cache kaldifeat
        id: my-cache
        uses: actions/cache@v2
        with:
          path: |
            ~/tmp/kaldifeat
          key: cache-tmp-${{ matrix.python-version }}-2023-05-22

      - name: Install kaldifeat
        if: steps.my-cache.outputs.cache-hit != 'true'
        shell: bash
        run: |
          .github/scripts/install-kaldifeat.sh

      - name: Inference with pre-trained model
        shell: bash
        env:
          GITHUB_EVENT_NAME: ${{ github.event_name }}
          GITHUB_EVENT_LABEL_NAME: ${{ github.event.label.name }}
        run: |
          mkdir -p egs/gigaspeech/ASR/data
          ln -sfv ~/tmp/fbank-libri egs/gigaspeech/ASR/data/fbank
          ls -lh egs/gigaspeech/ASR/data/*

          sudo apt-get -qq install git-lfs tree
          export PYTHONPATH=$PWD:$PYTHONPATH
          export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
          export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH

          .github/scripts/run-gigaspeech-zipformer-2023-10-17.sh

      - name: Display decoding results for gigaspeech zipformer
        if: github.event_name == 'schedule' || github.event.label.name == 'run-decode'
        shell: bash
        run: |
          cd egs/gigaspeech/ASR/
          tree ./zipformer/exp

          cd zipformer
          echo "results for zipformer"
          echo "===greedy search==="
          find exp/greedy_search -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
          find exp/greedy_search -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2

          echo "===fast_beam_search==="
          find exp/fast_beam_search -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
          find exp/fast_beam_search -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2

          echo "===modified beam search==="
          find exp/modified_beam_search -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
          find exp/modified_beam_search -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2

      - name: Upload decoding results for gigaspeech zipformer
        uses: actions/upload-artifact@v2
        if: github.event_name == 'schedule' || github.event.label.name == 'run-decode'
        with:
          name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-ubuntu-latest-cpu-zipformer-2022-11-11
          path: egs/gigaspeech/ASR/zipformer/exp/
```
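The `Install Python dependencies` step above filters `requirements-ci.txt` through `grep -v '^#'` so comment lines never reach pip, then hands each surviving requirement to its own `pip install` invocation via xargs. A dry-run sketch of just that filtering, using a throwaway file and `echo` standing in for `pip install`:

```shell
# Dry-run of the comment-filtering used in the workflow's dependency
# step: drop '#' comment lines, then pass each remaining requirement
# to a separate command invocation (echo here; pip install in CI).
cat > /tmp/requirements-demo.txt <<'EOF'
# CI-only pins; comment lines are skipped
numpy
sentencepiece==0.1.99
EOF

grep -v '^#' /tmp/requirements-demo.txt | xargs -n 1 echo would-install
```

This prints one `would-install` line per non-comment requirement, mirroring how the real step issues one pip invocation per line.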

README.md

Lines changed: 14 additions & 2 deletions

```diff
@@ -148,8 +148,11 @@ in the decoding.
 
 ### GigaSpeech
 
-We provide two models for this recipe: [Conformer CTC model][GigaSpeech_conformer_ctc]
-and [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][GigaSpeech_pruned_transducer_stateless2].
+We provide three models for this recipe:
+
+- [Conformer CTC model][GigaSpeech_conformer_ctc]
+- [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][GigaSpeech_pruned_transducer_stateless2].
+- [Transducer: Zipformer encoder + Embedding decoder][GigaSpeech_zipformer]
 
 #### Conformer CTC
 
@@ -165,6 +168,14 @@ and [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned R
 | fast beam search     | 10.50 | 10.69 |
 | modified beam search | 10.40 | 10.51 |
 
+#### Transducer: Zipformer encoder + Embedding decoder
+
+|                      | Dev   | Test  |
+|----------------------|-------|-------|
+| greedy search        | 10.31 | 10.50 |
+| fast beam search     | 10.26 | 10.48 |
+| modified beam search | 10.25 | 10.38 |
+
 ### Aishell
 
@@ -378,6 +389,7 @@ Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-bad
 [TED-LIUM3_pruned_transducer_stateless]: egs/tedlium3/ASR/pruned_transducer_stateless
 [GigaSpeech_conformer_ctc]: egs/gigaspeech/ASR/conformer_ctc
 [GigaSpeech_pruned_transducer_stateless2]: egs/gigaspeech/ASR/pruned_transducer_stateless2
+[GigaSpeech_zipformer]: egs/gigaspeech/ASR/zipformer
 [Aidatatang_200zh_pruned_transducer_stateless2]: egs/aidatatang_200zh/ASR/pruned_transducer_stateless2
 [WenetSpeech_pruned_transducer_stateless2]: egs/wenetspeech/ASR/pruned_transducer_stateless2
 [WenetSpeech_pruned_transducer_stateless5]: egs/wenetspeech/ASR/pruned_transducer_stateless5
```

egs/gigaspeech/ASR/README.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -15,6 +15,7 @@ ln -sfv /path/to/GigaSpeech download/GigaSpeech
 ## Performance Record
 |                                | Dev   | Test  |
 |--------------------------------|-------|-------|
+| `zipformer`                    | 10.25 | 10.38 |
 | `conformer_ctc`                | 10.47 | 10.58 |
 | `pruned_transducer_stateless2` | 10.40 | 10.51 |
```

egs/gigaspeech/ASR/RESULTS.md

Lines changed: 74 additions & 0 deletions

```diff
@@ -1,4 +1,78 @@
 ## Results
+### zipformer (zipformer + pruned stateless transducer)
+
+See <https://github.com/k2-fsa/icefall/pull/1254> for more details.
+
+[zipformer](./zipformer)
+
+- Non-streaming
+- normal-scaled model, number of model parameters: 65549011, i.e., 65.55 M
+
+You can find a pretrained model, training logs, decoding logs, and decoding results at:
+<https://huggingface.co/yfyeung/icefall-asr-gigaspeech-zipformer-2023-10-17>
+
+The tensorboard log for training is available at
+<https://wandb.ai/yifanyeung/icefall-asr-gigaspeech-zipformer-2023-10-20>
+
+You can use <https://github.com/k2-fsa/sherpa> to deploy it.
+
+| decoding method      | test-clean | test-other | comment            |
+|----------------------|------------|------------|--------------------|
+| greedy_search        | 10.31      | 10.50      | --epoch 30 --avg 9 |
+| modified_beam_search | 10.25      | 10.38      | --epoch 30 --avg 9 |
+| fast_beam_search     | 10.26      | 10.48      | --epoch 30 --avg 9 |
+
+The training command is:
+```bash
+export CUDA_VISIBLE_DEVICES="0,1,2,3"
+./zipformer/train.py \
+  --world-size 4 \
+  --num-epochs 30 \
+  --start-epoch 1 \
+  --use-fp16 1 \
+  --exp-dir zipformer/exp \
+  --causal 0 \
+  --subset XL \
+  --max-duration 700 \
+  --use-transducer 1 \
+  --use-ctc 0 \
+  --lr-epochs 1 \
+  --master-port 12345
+```
+
+The decoding command is:
+```bash
+export CUDA_VISIBLE_DEVICES=0
+
+# greedy search
+./zipformer/decode.py \
+  --epoch 30 \
+  --avg 9 \
+  --exp-dir ./zipformer/exp \
+  --max-duration 1000 \
+  --decoding-method greedy_search
+
+# modified beam search
+./zipformer/decode.py \
+  --epoch 30 \
+  --avg 9 \
+  --exp-dir ./zipformer/exp \
+  --max-duration 1000 \
+  --decoding-method modified_beam_search \
+  --beam-size 4
+
+# fast beam search (one best)
+./zipformer/decode.py \
+  --epoch 30 \
+  --avg 9 \
+  --exp-dir ./zipformer/exp \
+  --max-duration 1000 \
+  --decoding-method fast_beam_search \
+  --beam 20.0 \
+  --max-contexts 8 \
+  --max-states 64
+```
+
 ### GigaSpeech BPE training results (Pruned Transducer 2)
 
 #### 2022-05-12
```
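The three `decode.py` invocations in the RESULTS.md section above differ only in `--decoding-method` plus a few method-specific flags, so the sweep can be written as one loop. A sketch, using `echo` so the commands are printed rather than executed (drop the `echo` to run them for real inside `egs/gigaspeech/ASR`):

```shell
# Sweep the three decoding methods from RESULTS.md with one loop.
# Method-specific flags are appended per case; echo prints each
# assembled command instead of running it.
for method in greedy_search modified_beam_search fast_beam_search; do
  extra=""
  case $method in
    modified_beam_search) extra="--beam-size 4" ;;
    fast_beam_search)     extra="--beam 20.0 --max-contexts 8 --max-states 64" ;;
  esac
  echo ./zipformer/decode.py \
    --epoch 30 \
    --avg 9 \
    --exp-dir ./zipformer/exp \
    --max-duration 1000 \
    --decoding-method $method $extra
done
```

Each iteration prints one complete command line, matching the flags of the corresponding invocation above.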
