Author: Wang Siying | 2018-01-04 09:44
According to Leiphone AI Technology Review, Facebook AI Research has recently released wav2letter, a simple and efficient end-to-end automatic speech recognition (ASR) toolkit that implements the architectures proposed in the papers "Wav2Letter: an End-to-End ConvNet-based Speech Recognition System" and "Letter-Based Speech Recognition with Gated ConvNets". For anyone who wants to start doing speech recognition with the toolkit right away, Facebook provides a model pre-trained on the Librispeech dataset.
Below are the system requirements and the installation guide for the tool, as compiled by Leiphone AI Technology Review:
Requirements:
OS: MacOS or Linux
Torch: installation instructions are given below
Training on CPU: Intel MKL
Training on GPU: NVIDIA CUDA Toolkit (cuDNN v5.1 for CUDA 8.0)
Reading audio files: Libsndfile
Standard speech features: FFTW
Installation:
MKL
If you plan to train on CPU, it is highly recommended to install Intel MKL.
Update your .bashrc file with the following:
# We assume Torch will be installed in $HOME/usr.
# Change according to your needs.
export PATH=$HOME/usr/bin:$PATH
# This is to detect MKL during compilation
# but also to make sure it is found at runtime.
INTEL_DIR=/opt/intel/lib/intel64
MKL_DIR=/opt/intel/mkl/lib/intel64
MKL_INC_DIR=/opt/intel/mkl/include
if [ ! -d "$INTEL_DIR" ]; then
    echo "$ warning: INTEL_DIR out of date"
fi
if [ ! -d "$MKL_DIR" ]; then
    echo "$ warning: MKL_DIR out of date"
fi
if [ ! -d "$MKL_INC_DIR" ]; then
    echo "$ warning: MKL_INC_DIR out of date"
fi
# Make sure MKL can be found by Torch.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$INTEL_DIR:$MKL_DIR
export CMAKE_LIBRARY_PATH=$LD_LIBRARY_PATH
export CMAKE_INCLUDE_PATH=$CMAKE_INCLUDE_PATH:$MKL_INC_DIR
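After saving these changes, reload them into your current shell so the variables take effect (a standard step, added here for completeness):
source ~/.bashrc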
LuaJIT and LuaRocks
Run the following to install LuaJIT and LuaRocks under $HOME/usr. If you prefer a system-wide installation, simply remove -DCMAKE_INSTALL_PREFIX=$HOME/usr from the command below.
git clone https://github.com/torch/luajit-rocks.git
cd luajit-rocks
mkdir build; cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/usr -DWITH_LUAJIT21=OFF
make -j 4
make install
cd ../..
From here on we assume luarocks and luajit are available in your $PATH; if you installed them under $HOME/usr, run them as ~/usr/bin/luarocks and ~/usr/bin/luajit instead.
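As an optional sanity check (our addition, assuming the $HOME/usr prefix used above), you can confirm both binaries respond:
~/usr/bin/luajit -v        # prints the LuaJIT version
~/usr/bin/luarocks --version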
If you want to use the wav2letter decoder, you also need to install KenLM.
KenLM requires Boost:
# make sure boost is installed (with system/thread/test modules)
# actual command might vary depending on your system
sudo apt-get install libboost-dev libboost-system-dev libboost-thread-dev libboost-test-dev
Once Boost is installed, you can install KenLM:
wget https://kheafield.com/code/kenlm.tar.gz
tar xfvz kenlm.tar.gz
cd kenlm
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/usr -DCMAKE_POSITION_INDEPENDENT_CODE=ON
make -j 4
make install
cp -a lib/* ~/usr/lib # libs are not installed by default :(
cd ../..
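Optionally, check that the KenLM command-line tools landed under your prefix (a hedged check, assuming KenLM installs its binaries into $HOME/usr/bin):
ls ~/usr/bin | grep -E 'lmplz|build_binary'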
If you plan to use multiple CPUs/GPUs (or multiple machines), you need to install OpenMPI and TorchMPI.
Disclaimer: we strongly encourage you to recompile OpenMPI yourself. The OpenMPI binaries shipped in standard distributions are built with an inconsistent set of compilation flags, and specific flags are crucial for compiling and running TorchMPI successfully.
First, install OpenMPI:
wget https://www.open-mpi.org/software/ompi/v2.1/downloads/openmpi-2.1.2.tar.bz2
tar xfj openmpi-2.1.2.tar.bz2
cd openmpi-2.1.2; mkdir build; cd build
../configure --prefix=$HOME/usr --enable-mpi-cxx --enable-shared --with-slurm --enable-mpi-thread-multiple --enable-mpi-ext=affinity,cuda --with-cuda=/public/apps/cuda/9.0
make -j 20 all
make install
Note: you can also use openmpi-3.0.0.tar.bz2, but you then need to remove --enable-mpi-thread-multiple.
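For illustration only (this variant is not spelled out in the original guide), the configure step for openmpi-3.0.0 would look like the command above with that flag dropped:
../configure --prefix=$HOME/usr --enable-mpi-cxx --enable-shared --with-slurm --enable-mpi-ext=affinity,cuda --with-cuda=/public/apps/cuda/9.0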
Next, install TorchMPI:
MPI_CXX_COMPILER=$HOME/usr/bin/mpicxx ~/usr/bin/luarocks install torchmpi
Torch and other Torch packages
luarocks install torch
luarocks install cudnn # for GPU support
luarocks install cunn # for GPU support
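As an optional sanity check (our addition, not part of the original guide), you can confirm the packages load from LuaJIT:
luajit -e "require 'torch'; print(torch.Tensor(2, 2):zero())"
luajit -e "require 'cunn'" # only if you installed the GPU packages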
The wav2letter packages
git clone https://github.com/facebookresearch/wav2letter.git
cd wav2letter
cd gtn && luarocks make rocks/gtn-scm-1.rockspec && cd ..
cd speech && luarocks make rocks/speech-scm-1.rockspec && cd ..
cd torchnet-optim && luarocks make rocks/torchnet-optim-scm-1.rockspec && cd ..
cd wav2letter && luarocks make rocks/wav2letter-scm-1.rockspec && cd ..
# Assuming here you got KenLM in $HOME/kenlm
# And only if you plan to use the decoder:
cd beamer && KENLM_INC=$HOME/kenlm luarocks make rocks/beamer-scm-1.rockspec && cd ..
Training a wav2letter model
Data preprocessing
The data folder contains scripts for preprocessing several datasets; for now, only the LibriSpeech and TIMIT preprocessing scripts are provided.
Below is an example of preprocessing the LibriSpeech ASR dataset:
wget http://www.openslr.org/resources/12/dev-clean.tar.gz
tar xfvz dev-clean.tar.gz
# repeat for train-clean-100, train-clean-360, train-other-500, dev-other, test-clean, test-other
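# Illustrative helper (our addition, not in the original guide): fetch and unpack
# the remaining LibriSpeech subsets listed above in one loop.
for part in train-clean-100 train-clean-360 train-other-500 dev-other test-clean test-other; do
  wget http://www.openslr.org/resources/12/$part.tar.gz
  tar xfvz $part.tar.gz
done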
luajit ~/wav2letter/data/librispeech/create.lua ~/LibriSpeech ~/librispeech-proc
luajit ~/wav2letter/data/utils/create-sz.lua librispeech-proc/train-clean-100 librispeech-proc/train-clean-360 librispeech-proc/train-other-500 librispeech-proc/dev-clean librispeech-proc/dev-other librispeech-proc/test-clean librispeech-proc/test-other
Training
mkdir experiments
luajit ~/wav2letter/train.lua --train -rundir ~/experiments -runname hello_librispeech -arch ~/wav2letter/arch/librispeech-glu-highdropout -lr 0.1 -lrcrit 0.0005 -gpu 1 -linseg 1 -linlr 0 -linlrcrit 0.005 -onorm target -nthread 6 -dictdir ~/librispeech-proc -datadir ~/librispeech-proc -train train-clean-100+train-clean-360+train-other-500 -valid dev-clean+dev-other -test test-clean+test-other -gpu 1 -sqnorm -mfsc -melfloor 1 -surround "|" -replabel 2 -progress -wnorm -normclamp 0.2 -momentum 0.9 -weightdecay 1e-05
Multi-GPU training
Using OpenMPI:
mpirun -n 2 --bind-to none ~/TorchMPI/scripts/wrap.sh luajit ~/wav2letter/train.lua --train -mpi -gpu 1 ...
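Here "..." stands for the rest of your usual training options. Purely as an illustration (our addition, assuming 2 GPUs on one machine, a hypothetical run name, and the same options as the single-GPU command above), a full invocation might look like:
mpirun -n 2 --bind-to none ~/TorchMPI/scripts/wrap.sh luajit ~/wav2letter/train.lua --train -mpi -gpu 1 -rundir ~/experiments -runname hello_librispeech_mpi -arch ~/wav2letter/arch/librispeech-glu-highdropout -datadir ~/librispeech-proc -dictdir ~/librispeech-proc -train train-clean-100+train-clean-360+train-other-500 -valid dev-clean+dev-other -test test-clean+test-other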
Running the decoder (inference)
Running the decoder requires a small amount of preprocessing.
First, create a letter dictionary that includes the special repetition letters used in wav2letter:
cat ~/librispeech-proc/letters.lst >> ~/librispeech-proc/letters-rep.lst && echo "1" >> ~/librispeech-proc/letters-rep.lst && echo "2" >> ~/librispeech-proc/letters-rep.lst
然后將得到一個(gè)語(yǔ)言模型,并對(duì)這個(gè)模型進(jìn)行預(yù)處理。這里,我們將使用預(yù)先訓(xùn)練過(guò)的 LibriSpeech 語(yǔ)言模型,大家也可以用 KenLM 訓(xùn)練自己的模型。然后,我們對(duì)模型進(jìn)行預(yù)處理,腳本可能會(huì)對(duì)錯(cuò)誤轉(zhuǎn)錄的單詞給予警告,這不是什么大問(wèn)題,因?yàn)檫@些詞很少見(jiàn)。
wget http://www.openslr.org/resources/11/3-gram.pruned.3e-7.arpa.gz
luajit ~/wav2letter/data/utils/convert-arpa.lua ~/3-gram.pruned.3e-7.arpa.gz ~/3-gram.pruned.3e-7.arpa ~/dict.lst -preprocess ~/wav2letter/data/librispeech/preprocess.lua -r 2 -letters letters-rep.lst
Optional: convert the model to binary format with KenLM, which will make it load faster later.
build_binary 3-gram.pruned.3e-7.arpa 3-gram.pruned.3e-7.bin
Now run test.lua to generate the emissions. The script below also reports the letter error rate (LER) and word error rate (WER).
luajit ~/wav2letter/test.lua ~/experiments/hello_librispeech/001_model_dev-clean.bin -progress -show -test dev-clean -save
Once the emissions are stored, you can run the decoder to compute the WER:
luajit ~/wav2letter/decode.lua ~/experiments/hello_librispeech dev-clean -show -letters ~/librispeech-proc/letters-rep.lst -words ~/dict.lst -lm ~/3-gram.pruned.3e-7.arpa -lmweight 3.1639 -beamsize 25000 -beamscore 40 -nthread 10 -smearing max -show
Pre-trained models:
We provide a fully trained LibriSpeech model:
wget https://s3.amazonaws.com/wav2letter/models/librispeech-glu-highdropout.bin
Note: this model was trained with Facebook's infrastructure, so test.lua needs to be run with slightly different parameters:
luajit ~/wav2letter/test.lua ~/librispeech-glu-highdropout.bin -progress -show -test dev-clean -save -datadir ~/librispeech-proc/ -dictdir ~/librispeech-proc/ -gfsai
You can join the wav2letter community:
Facebook: https://www.facebook.com/groups/717232008481207/
Google group: https://groups.google.com/forum/#!forum/wav2letter-users
via: GitHub
Compiled and edited by Leiphone AI Technology Review.