Spanish Gigaword text based POCOLM and RNNLM training recipe #3136

Open: wants to merge 58 commits into base: master

Commits (58)
801ab93
Merge pull request #1 from kaldi-asr/master
saikiranvalluri Oct 30, 2018
04c4a03
Merge remote-tracking branch 'upstream/master'
GoVivace Dec 8, 2018
ea699b0
Merge remote-tracking branch 'upstream/master'
GoVivace Jan 17, 2019
cbc8eeb
Spanish Gigaword LM recipe
Feb 19, 2019
e8aecbb
Some bug fixes
saikiranvalluri Feb 19, 2019
ece34bd
Update rnnlm.sh
saikiranvalluri Feb 19, 2019
0c4fe47
Combining lexicon words with pocolm wordslist for RNNLM training
Feb 19, 2019
92e241b
merge conflict resolved
Feb 19, 2019
1439b0d
Integrated the 2 stage scientific method POCOLM training for Gigaword…
saikiranvalluri Feb 24, 2019
8ad0e01
Update train_pocolm.sh
saikiranvalluri Feb 26, 2019
f856ac2
Update run.sh
saikiranvalluri Feb 27, 2019
684f029
Text cleaning script for splitting Abbreviation words added
saikiranvalluri Feb 28, 2019
185da3a
Update clean_txt_dir.sh
saikiranvalluri Feb 28, 2019
cb393c8
Update clean_txt_dir.sh
saikiranvalluri Feb 28, 2019
18a9cb6
Update train_pocolm.sh
saikiranvalluri Feb 28, 2019
b023638
Update pocolm_cust.sh
saikiranvalluri Feb 28, 2019
46550f0
Cosmetic fixes
saikiranvalluri Feb 28, 2019
ce3c7d7
Update path.sh
saikiranvalluri Feb 28, 2019
deeaaa7
Bug fix in text normalisation script for gigaword corpus
saikiranvalluri Mar 1, 2019
633f21d
small Fix path.sh
saikiranvalluri Mar 1, 2019
8d6b14d
Update clean_abbrevs_text.py
saikiranvalluri Mar 1, 2019
8c9c37b
Added sparrowhawk installation script for text normalisation
saikiranvalluri Mar 1, 2019
c6b05d1
G2P training stage added into Spanish gigaword recipe
saikiranvalluri Mar 2, 2019
8c226cc
G2P seq2seq scripts added in steps/
saikiranvalluri Mar 2, 2019
7b67fc2
RNNLM scripts updated to UTF8 encoding
saikiranvalluri Mar 2, 2019
4767c7c
Update pocolm_cust.sh
saikiranvalluri Mar 8, 2019
2cd5948
Update run.sh
saikiranvalluri Mar 8, 2019
6595b42
Added steps for generating POCOLM ARPA file
saikiranvalluri Mar 18, 2019
0902c9e
Update run.sh
saikiranvalluri Mar 24, 2019
d8a90ec
Merge branch 'master' into feature/Spanish_gigaword_LM
saikiranvalluri Mar 24, 2019
c10b0fe
Apply g2p part added to get extended lexicon
saikiranvalluri Mar 24, 2019
15a34e8
Merge branch 'feature/Spanish_gigaword_LM' of https://github.com/GoVi…
saikiranvalluri Mar 24, 2019
3df45ae
Small fix in run.sh rnnlm_wordlist
saikiranvalluri Mar 24, 2019
7e47695
Added sanity check for Sparrowhawk normalizer in cleanup script
saikiranvalluri Mar 25, 2019
91a4611
Data preparation fixes
saikiranvalluri Mar 25, 2019
5f45dd1
Cosmetic options for gigaword textclean
saikiranvalluri Mar 26, 2019
e711d30
Some fixes in rnnlm training
saikiranvalluri Apr 1, 2019
8d521c6
Moved s5_gigaword directory to s5
saikiranvalluri Apr 1, 2019
c57ed95
Merge branch 'master' into feature/Spanish_gigaword_LM
saikiranvalluri Apr 2, 2019
f610470
removed s5_gigaword folder
saikiranvalluri Apr 2, 2019
f810119
Small cleanup for scripts format
saikiranvalluri Apr 2, 2019
dc8a56e
Cosmetic fix
saikiranvalluri Apr 5, 2019
ec0edc5
Merge branch 'master' into feature/Spanish_gigaword_LM
saikiranvalluri Apr 12, 2019
8b8222e
Remove virtenv dependency
saikiranvalluri Apr 18, 2019
0e7afa8
Update path.sh
saikiranvalluri Apr 19, 2019
56d2db9
Update install_sparrowhawk.sh
saikiranvalluri Apr 19, 2019
fb6693e
Set lang to ESP
saikiranvalluri Apr 20, 2019
ce0f420
Set pocolm option - --limit-unk-history=true
saikiranvalluri Apr 23, 2019
9487ce1
Removed unused code
saikiranvalluri Apr 23, 2019
25609c5
Fix in checking for empty space lines in lexicon
saikiranvalluri Apr 23, 2019
510db0f
Fix in RNNLM rescoring decode stage
saikiranvalluri Apr 25, 2019
9894f4c
Update run.sh
saikiranvalluri Apr 26, 2019
3bdb541
Update clean_txt_dir.sh
saikiranvalluri May 20, 2019
6636557
Update run.sh
saikiranvalluri Jun 9, 2019
69b1bca
Merge branch 'master' into feature/Spanish_gigaword_LM
saikiranvalluri Jun 9, 2019
36499a7
Update run.sh
saikiranvalluri Jul 7, 2019
8da5c3e
Reverse the order of Abbreviation process after punct syms
saikiranvalluri Jul 13, 2019
510b415
Update run_norm.sh
saikiranvalluri Aug 21, 2019
Merge remote-tracking branch 'upstream/master'
GoVivace committed Jan 17, 2019
commit ea699b0a87a8ceeef1e7093f6134a02e2012ec10

This merge commit was added into this branch cleanly.

There are no new changes to show, but you can still view the diff.