Showing posts with label alumae. Show all posts

Tuesday, July 26, 2016

Setting Up an Offline Transcriber Using Kaldi - Part 1: kaldi-offline-transcriber

I'm recording this as I go, and I'll keep editing it to reflect the best way to set things up.  My goal is to record a snippet of my voice and have it transcribed by a Python script I'll write.

First Attempt: Kaldi-offline-transcriber

My first shot at this project uses this GitHub repository: github.com/alumae/kaldi-offline-transcriber.  The only problem is that this transcriber, though excellent in itself, is built for the Estonian language.  After I successfully get it working in Estonian, I'll see what I can do about English.

I should note that the instructions in the github readme are excellent.  I've rewritten them here so I have easy access to them, and to make them a little better -- just made them cut-and-paste worthy, mostly.

Dependencies Installation

I'm not sure whether build-essential comes with Ubuntu 16.04 or whether I'd already installed it for something else, but make sure it is installed.
sudo apt-get install build-essential
Also install these:
sudo apt-get install ffmpeg sox libatlas-dev 
Install Kaldi.  You don't have to worry about the online extensions, but it won't hurt to have them installed (the only difference is an extra file compiled in a directory).

Make sure Python and pip are installed.
sudo apt-get install python-pip
Install the package pyfst.  One of its dependencies, OpenFst, was compiled and installed with Kaldi.  To exploit that installation, use these install flags when you install pyfst:
CPPFLAGS="-I/home/$USER/tools/kaldi-master/tools/openfst/include -L/home/$USER/tools/kaldi-master/tools/openfst/lib" pip install pyfst
Turns out you also need Java installed, which isn't mentioned in the readme file.  
sudo apt-get install default-jre

Installing the Main Package

Clone the repository.
cd ~/tools
git clone https://github.com/alumae/kaldi-offline-transcriber.git
This is Estonian, remember?  Download and unpack the Estonian language models.
cd ~/tools/kaldi-offline-transcriber
curl http://bark.phon.ioc.ee/tanel/kaldi-offline-transcriber-data-2015-12-29.tgz | tar xvz 
Create a file called Makefile.options in the root of the transcriber directory and set KALDI_ROOT in it to the root of the Kaldi directory.  Write it with echo so the shell expands $USER first; inside a makefile, a literal $USER is read as the variable $U followed by SER.
echo "KALDI_ROOT=/home/$USER/tools/kaldi-master" > ~/tools/kaldi-offline-transcriber/Makefile.options
Without this, the build fails because it can't locate the files it's trying to compile.  Next, compile.  This should take about 30 minutes, so use the option for multiple cores if possible.
cd ~/tools/kaldi-offline-transcriber/
make -j 4 .init
All compilations are stored under the kaldi-offline-transcriber/build/ directory.  If you want to retry the compilation, just delete that directory and try again.

Example Usage

Using the make command directly

Stick a speech file under src-audio, then execute the command to create the transcription file.  
cd src-audio
wget http://media.kuku.ee/intervjuu/intervjuu201306211256.mp3
cd ..
make build/output/intervjuu201306211256.txt
To remove the intermediate files that are generated with the build command, run:
make .intervjuu201306211256.clean
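If there are several recordings to process, the same make target can be driven from a loop. This is only a sketch: the function names target_for and transcribe_all are made up for illustration, and it assumes every file in src-audio is in a format the transcriber accepts.

```shell
#!/bin/sh
# Map an audio file path to the make target used above:
# src-audio/NAME.mp3 -> build/output/NAME.txt
target_for() {
    base=$(basename "$1")
    printf 'build/output/%s.txt\n' "${base%.*}"
}

# Transcribe every regular file in src-audio.
# Run this from the root of the kaldi-offline-transcriber directory.
transcribe_all() {
    for audio in src-audio/*; do
        [ -f "$audio" ] || continue
        # Keep going if one file fails, but say which one.
        make "$(target_for "$audio")" || echo "failed: $audio" >&2
    done
}
```

After sourcing the file, call transcribe_all from ~/tools/kaldi-offline-transcriber.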

Using the speech2text.sh script

There is also a wrapper script that makes it easier to transcribe audio files located in any directory.  Here is an example command:
/home/$USER/tools/kaldi-offline-transcriber/speech2text.sh --trs result/test.txt audio/test.ogg
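Since the end goal is to record a snippet and have it transcribed, here is a rough sketch of how the wrapper might be chained to a recording step. The recording tool arecord (from alsa-utils) is my assumption and is not part of the transcriber; the function names, the 16 kHz mono settings, and the audio/clip.wav path are placeholders too. The speech2text.sh call mirrors the example command above.

```shell
#!/bin/sh
S2T="$HOME/tools/kaldi-offline-transcriber/speech2text.sh"

# Derive the output path the same way the example above does:
# audio/test.ogg -> result/test.txt
result_for() {
    base=$(basename "$1")
    printf 'result/%s.txt\n' "${base%.*}"
}

# Record a few seconds from the default microphone, then transcribe.
# Usage: record_and_transcribe [clip path] [seconds]
record_and_transcribe() {
    clip=${1:-audio/clip.wav}
    seconds=${2:-5}
    mkdir -p "$(dirname "$clip")" result
    # arecord is an assumption: 16-bit, 16 kHz, mono, fixed duration.
    arecord -f S16_LE -r 16000 -c 1 -d "$seconds" "$clip"
    "$S2T" --trs "$(result_for "$clip")" "$clip"
}
```

Calling record_and_transcribe with no arguments would record five seconds to audio/clip.wav and write the result to result/clip.txt.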

Tweaks

You can speed up transcription by setting another parameter in Makefile.options.
nano ~/tools/kaldi-offline-transcriber/Makefile.options
nthreads = 4
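Editing the file by hand works fine. For a scripted setup, a small helper can set an option without leaving duplicate lines behind. This is a sketch only: set_option is a hypothetical name, and it assumes each option sits on its own line in the form key = value.

```shell
#!/bin/sh
# Set KEY to VALUE in an options file, replacing any earlier assignment.
# Usage: set_option FILE KEY VALUE
set_option() {
    file=$1 key=$2 value=$3
    touch "$file"
    # Copy every line except existing assignments of this key.
    # grep exits non-zero when nothing is left to copy, which is fine here.
    grep -v "^$key[[:space:]]*=" "$file" > "$file.tmp" || true
    printf '%s = %s\n' "$key" "$value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}
```

For example, set_option ~/tools/kaldi-offline-transcriber/Makefile.options nthreads "$(nproc)" would use one thread per core. Whether more than 4 threads actually helps is something I haven't measured.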



Installing Kaldi and Kaldi-Gstreamer-server on Ubuntu 16.04

Notes on the process of installing Kaldi and Kaldi-GStreamer-server on Ubuntu 16.04 LTS.  These were modified somewhat, since this is retroactively documented for my own benefit.

Kaldi is a state-of-the-art speech transcription engine, geared towards researchers and people who already know what they're doing.  I'm just trying to set it up.

Decide where to put Kaldi and make that your new working directory.
mkdir -p ~/tools/
cd ~/tools
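Clone Kaldi from GitHub.  The rest of these notes use the directory name kaldi-master, so clone into that name.
git clone https://github.com/kaldi-asr/kaldi.git kaldi-master
cd into the tools directory of the new clone.
cd ./kaldi-master/tools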
Check for any dependencies.  There were a few things I needed to add to my Ubuntu installation; don't remember what they were.  Do whatever this output instructs.
extras/check_dependencies.sh
Now comes the actual installation.
make
cd ../src
./configure --shared
make depend
make
Run this next to install the online extensions.
make ext
Note: if you have more than one core in your machine, you can run make -j 4 to do make in parallel.

Congratulations.  Kaldi is installed.

Installing Kaldi-GStreamer-server

Before actually installing the kaldi-gstreamer-server, there are a few more things to do with Kaldi itself.
Compile the Gstreamer plugin.  First, install dependencies. Note they are older versions of the packages.  Make sure you get the right version.  On Ubuntu/Debian, run:

sudo apt-get install libgstreamer1.0-dev gstreamer1.0-plugins-good gstreamer1.0-tools gstreamer1.0-pulseaudio
Kaldi-Gstreamer-server requires the gstreamer plugin to be compiled (makes sense).
cd ~/tools/kaldi-master/src/gst-plugin/
make depend
make
This folder (gst-plugin) should now contain the file libgstkaldi.so, which is the GStreamer plugin.

Now it's time to install the kaldi-gstreamer-server package.  First, more dependencies.
sudo apt-get install python-pip python-yaml python-gi
pip install tornado ws4py==0.3.2 pyyaml
Note: You might need to run pip as sudo.  e.g. sudo pip install tornado, above.
Note: I couldn't figure out which YAML package to install, so I used both.  At least, they're both installed, and I don't remember which I actually needed.  If I do this again, I'll try to remember to change this.

Clone kaldi-gstreamer-server from GitHub into your tools folder.
cd ~/tools/
git clone https://github.com/alumae/kaldi-gstreamer-server.git

This completes the installation.

cd into the main folder.
cd ./kaldi-gstreamer-server/
Open the README file, peruse until understood.
gedit ./README.md
Now you'll understand what I mean by server and worker.  You can start the server with:
python kaldigstserver/master_server.py --port=8888
Before starting a worker, make sure that the GST plugin path includes the gstreamer plugin you compiled.  If you put everything where I recommended, this is all you have to do:
export GST_PLUGIN_PATH=~/tools/kaldi-master/src/gst-plugin
Test to make sure it worked.  If it fails, take a look at the README file again.  This command should spit out a bunch of information.  If it just says something like, 'not found', you did something wrong.  I have no idea what.
gst-inspect-1.0 onlinegmmdecodefaster
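When the check fails, it helps to know whether the plugin file is missing or whether GStreamer just can't load it. The sketch below separates the two cases. The function names are made up, and it assumes gst-inspect-1.0 exits with a non-zero status when it cannot find the element.

```shell
#!/bin/sh
# Succeed if the compiled plugin file is present in the given directory.
plugin_built() {
    [ -f "$1/libgstkaldi.so" ]
}

# Report which of the two checks fails, if any.
# Usage: check_plugin [plugin directory]
check_plugin() {
    dir=${1:-$HOME/tools/kaldi-master/src/gst-plugin}
    if ! plugin_built "$dir"; then
        echo "missing: $dir/libgstkaldi.so (rerun make in gst-plugin)"
        return 1
    fi
    if ! GST_PLUGIN_PATH="$dir" gst-inspect-1.0 onlinegmmdecodefaster >/dev/null 2>&1; then
        echo "plugin file exists but GStreamer cannot load it"
        return 2
    fi
    echo "plugin OK"
}
```

A return value of 1 points at the compile step; 2 points at the GStreamer side, such as missing dependencies.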
Now you can start a worker.
python kaldigstserver/worker.py -u ws://localhost:8888/worker/ws/speech -c sample_worker.yaml
Example of how to use the server to transcribe an audio file:
python kaldigstserver/client.py -r 32000 ~/tools/kaldi-gstreamer-server/test/data/english_test.raw
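The three commands above normally run in separate terminals. For a quick smoke test they can be chained in one script, roughly as sketched here. The function names and the sleep durations are guesses of mine, and passing -u to client.py assumes it accepts a server URI the same way worker.py does.

```shell
#!/bin/sh
# Build the websocket URLs for a master server on the given port.
worker_url() {
    printf 'ws://localhost:%s/worker/ws/speech\n' "$1"
}
client_url() {
    printf 'ws://localhost:%s/client/ws/speech\n' "$1"
}

# Start the master and one worker in the background, send the test file,
# then shut both down. Run from the kaldi-gstreamer-server directory.
run_demo() {
    port=${1:-8888}
    export GST_PLUGIN_PATH="$HOME/tools/kaldi-master/src/gst-plugin"
    python kaldigstserver/master_server.py --port="$port" &
    master_pid=$!
    sleep 2   # crude wait for the server to start listening (a guess)
    python kaldigstserver/worker.py -u "$(worker_url "$port")" -c sample_worker.yaml &
    worker_pid=$!
    sleep 5   # crude wait for the worker to load its models (a guess)
    python kaldigstserver/client.py -u "$(client_url "$port")" -r 32000 test/data/english_test.raw
    kill "$worker_pid" "$master_pid"
}
```

If the client finishes without printing any text, the worker probably needed longer to start, so try raising the second sleep.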
You can also use a Deep Neural Network (DNN) to process the data, but at time of writing the readme walkthrough was giving me errors.

That's it!

Final post here

I'm switching over to GitHub Pages.  The continuation of this blog (with archives included) is at umhau.github.io.  By the way, the ...