*nixing Around: alumae

I'm recording this as I go, and I'll keep editing it to reflect the best way to set things up. My goal is to record a snippet of my voice and have it transcribed by a Python script I'll write.

First Attempt: Kaldi-offline-transcriber

My first shot at this project is the GitHub repository github.com/alumae/kaldi-offline-transcriber. The only problem is that this transcriber, though excellent in itself, is built for Estonian. Once I have it working in Estonian, I'll see what I can do about English.

The instructions in the GitHub README are excellent. I've rewritten them here so I have easy access to them and to make them a little better; mostly I just made them cut-and-paste friendly.

Dependencies Installation

I'm not sure whether this comes with Ubuntu 16.04 or whether I'd already installed it for something else, but make sure it's installed:

sudo apt-get install build-essential

Also install these:

sudo apt-get install ffmpeg sox libatlas-dev

Install Kaldi (my notes on installing it are further down). You don't have to worry about the online extensions, but it won't hurt to have them installed (the only difference is an extra compiled file in one directory).

Make sure Python and pip are installed.

sudo apt-get install python-pip

Install the pyfst package. One of its dependencies, OpenFst, was already compiled and installed along with Kaldi. To reuse that installation, set these flags when you install pyfst:

CPPFLAGS="-I/home/$USER/tools/kaldi-master/tools/openfst/include -L/home/$USER/tools/kaldi-master/tools/openfst/lib" pip install pyfst
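Since the end goal is a Python script anyway, here's a minimal sketch of the same pyfst install driven from Python 3. It's my own illustration, not part of the transcriber: the function names are hypothetical, and it assumes Kaldi lives at ~/tools/kaldi-master with its OpenFst headers under tools/openfst/include, exactly like the command above. It checks for OpenFst's fst/fst.h before calling pip, so a wrong path fails early with a clear message instead of a compiler error.

```python
import os
import subprocess
from pathlib import Path


def openfst_cppflags(kaldi_root):
    """Return the CPPFLAGS value pointing pyfst at Kaldi's bundled OpenFst.

    Raises FileNotFoundError if the OpenFst headers aren't where the
    Kaldi tools build is assumed to have put them.
    """
    openfst = Path(kaldi_root) / "tools" / "openfst"
    header = openfst / "include" / "fst" / "fst.h"
    if not header.is_file():
        raise FileNotFoundError(f"OpenFst headers not found: {header}")
    # Same -I (headers) and -L (libraries) pair as the shell command.
    return f"-I{openfst / 'include'} -L{openfst / 'lib'}"


def install_pyfst(kaldi_root):
    """Run 'pip install pyfst' with CPPFLAGS set in pip's environment."""
    env = dict(os.environ, CPPFLAGS=openfst_cppflags(kaldi_root))
    subprocess.run(["pip", "install", "pyfst"], env=env, check=True)
```

Usage would be something like install_pyfst(Path.home() / "tools" / "kaldi-master"). It calls whichever pip is first on your PATH, so make sure that's the one for the Python you'll run the transcriber with.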

It turns out you also need Java, which the README doesn't mention.

sudo apt-get install default-jre

Installing the Main Package

Clone the repository.

cd ~/tools
git clone https://github.com/alumae/kaldi-offline-transcriber.git

This is Estonian, remember? Download and unpack the Estonian language models.

cd ~/tools/kaldi-offline-transcriber
curl http://bark.phon.ioc.ee/tanel/kaldi-offline-transcriber-data-2015-12-29.tgz | tar xvz
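The same download-and-unpack step can be done from Python 3 with only the standard library. This is a hypothetical helper of my own (fetch_and_unpack isn't part of the transcriber). Like curl | tar xvz, it streams the archive straight into tarfile rather than saving it first, and it returns the names of the extracted files.

```python
import tarfile
import urllib.request
from pathlib import Path


def fetch_and_unpack(url, dest):
    """Stream a .tgz from url and extract it into dest; return member names."""
    dest = Path(dest)
    names = []
    with urllib.request.urlopen(url) as response:
        # Mode "r|gz" reads the gzip stream sequentially, so the archive
        # is never written to disk as a single file.
        with tarfile.open(fileobj=response, mode="r|gz") as archive:
            for member in archive:
                # In stream mode each member must be extracted before
                # moving on to the next one.
                archive.extract(member, path=dest)
                names.append(member.name)
    return names
```

For the models above that would be fetch_and_unpack("http://bark.phon.ioc.ee/tanel/kaldi-offline-transcriber-data-2015-12-29.tgz", Path.home() / "tools" / "kaldi-offline-transcriber").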

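Create a file called Makefile.options in the root of the transcriber directory, and in it set KALDI_ROOT to the root of your Kaldi directory. Use [enter] and [CTRL-D] to complete the command. Write $(HOME) rather than $USER in this file: Make reads a bare $USER as the one-letter variable $U followed by the text SER, while $(HOME) is filled in from your environment.

cat > ~/tools/kaldi-offline-transcriber/Makefile.options [enter]
KALDI_ROOT=$(HOME)/tools/kaldi-master [CTRL-D]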

Without this file, the build fails because it can't find the Kaldi files it needs. Next, run the initial build. It takes about 30 minutes, so use multiple cores if you can (-j 4 runs four jobs in parallel).

cd ~/tools/kaldi-offline-transcriber/
make -j 4 .init

Everything the build produces is stored under kaldi-offline-transcriber/build/. To retry the build from scratch, delete that directory and run make again.

Example Usage

Using the make command directly

Put a speech file in the src-audio directory, then ask make for the matching transcription file under build/output/:

cd src-audio
wget http://media.kuku.ee/intervjuu/intervjuu201306211256.mp3
cd ..
make build/output/intervjuu201306211256.txt

To remove the intermediate files generated by the build command, run:

make .intervjuu201306211256.clean
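Here's a minimal Python 3 sketch of the same two steps, transcribe and then clean up. Generalizing from the single example above, it assumes that any file src-audio/NAME.ext gets the targets build/output/NAME.txt and .NAME.clean, and that the transcript is UTF-8 text. make_targets and transcribe_with_make are hypothetical names of my own.

```python
import subprocess
from pathlib import Path


def make_targets(audio_file):
    """Return (transcript_target, clean_target) for a file in src-audio/.

    Assumes the naming seen in the example: NAME.mp3 is transcribed by
    'make build/output/NAME.txt' and cleaned by 'make .NAME.clean'.
    """
    stem = Path(audio_file).stem
    return f"build/output/{stem}.txt", f".{stem}.clean"


def transcribe_with_make(transcriber_dir, audio_file, clean=True):
    """Build the transcript, read it, then optionally remove intermediates."""
    transcript, clean_target = make_targets(audio_file)
    subprocess.run(["make", transcript], cwd=transcriber_dir, check=True)
    # Read before cleaning; assumes the transcript is UTF-8.
    text = (Path(transcriber_dir) / transcript).read_text(encoding="utf-8")
    if clean:
        subprocess.run(["make", clean_target], cwd=transcriber_dir, check=True)
    return text
```

With the interview above in src-audio, transcribe_with_make(Path.home() / "tools" / "kaldi-offline-transcriber", "intervjuu201306211256.mp3") would build the transcript, return its text, and remove the intermediate files.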

Using the speech2text.sh script

The repository also includes a wrapper script, speech2text.sh, that makes it easier to transcribe audio files located in any directory. The --trs option writes the result in Transcriber (.trs) format, so name the output file accordingly. For example:

/home/$USER/tools/kaldi-offline-transcriber/speech2text.sh --trs result/test.trs audio/test.ogg
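This wrapper is the obvious hook for the Python script I'm after. Here's a minimal Python 3 sketch, assuming the script is at ~/tools/kaldi-offline-transcriber/speech2text.sh and using only the --trs option shown above; the function names are mine, not the transcriber's.

```python
import subprocess
from pathlib import Path

# Assumed install location, matching the path used above.
SPEECH2TEXT = Path.home() / "tools" / "kaldi-offline-transcriber" / "speech2text.sh"


def speech2text_command(audio_file, output_file, script=SPEECH2TEXT):
    """Build the argument list for speech2text.sh with the --trs option."""
    return [str(script), "--trs", str(output_file), str(audio_file)]


def speech2text(audio_file, output_file, script=SPEECH2TEXT):
    """Run speech2text.sh and return the path of the output file."""
    output_file = Path(output_file)
    # Create the output directory in case the script doesn't.
    output_file.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(speech2text_command(audio_file, output_file, script), check=True)
    return output_file
```

For example, speech2text("audio/test.ogg", "result/test.trs") runs the same command as above. Building the argument list separately makes it easy to print or test the command without running Kaldi.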

Tweaks

You can speed up transcription by setting one more parameter, nthreads, in Makefile.options. Open the file in an editor and add the nthreads line:

nano ~/tools/kaldi-offline-transcriber/Makefile.options
nthreads = 4
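If you'd rather not hand-edit the file, here's a small Python 3 sketch (a hypothetical helper, my own naming) that writes Makefile.options with KALDI_ROOT and an optional nthreads line. It assumes Kaldi is at ~/tools/kaldi-master and writes the path as $(HOME)/..., because Make fills $(HOME) in from the environment but would read a bare $USER as the one-letter variable $U followed by SER. Note that it overwrites any existing Makefile.options.

```python
from pathlib import Path


def write_makefile_options(transcriber_dir,
                           kaldi_root="$(HOME)/tools/kaldi-master",
                           nthreads=None):
    """Write (overwriting) Makefile.options; return its path.

    The default KALDI_ROOT uses Make's $(HOME) reference, which Make
    expands from the environment when it reads the file.
    """
    lines = [f"KALDI_ROOT={kaldi_root}"]
    if nthreads is not None:
        lines.append(f"nthreads = {nthreads}")
    path = Path(transcriber_dir) / "Makefile.options"
    path.write_text("\n".join(lines) + "\n")
    return path
```

For example, write_makefile_options(Path.home() / "tools" / "kaldi-offline-transcriber", nthreads=4) produces both lines used in these notes.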

Installing Kaldi and Kaldi-Gstreamer-server on Ubuntu 16.04

These are my notes on installing Kaldi and Kaldi-GStreamer-server on Ubuntu 16.04 LTS. I wrote them up after the fact, for my own benefit, so some steps were modified somewhat from what I actually did.

Kaldi is a state-of-the-art speech recognition toolkit, geared toward researchers and people who already know what they're doing. I'm just trying to set it up.

Decide where to put Kaldi and make that your new working directory.

mkdir ~/tools/
cd ~/tools

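Clone Kaldi from GitHub into a directory named kaldi-master, which is the path the rest of these notes use (a plain git clone would name it kaldi instead).

git clone https://github.com/kaldi-asr/kaldi.git kaldi-master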

cd into its tools subdirectory.

cd ./kaldi-master/tools

Check for missing dependencies. I had to add a few packages to my Ubuntu installation, but I don't remember which ones. Do whatever this script's output tells you to.

extras/check_dependencies.sh

Now comes the actual installation.

make
cd ../src
./configure --shared
make depend
make

Next, run this to build the online extensions.

make ext

Note: if your machine has more than one core, you can run make -j 4 (or -j followed by your core count) to build in parallel.
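The 4 in -j 4 is just an example. Here's a minimal Python 3 sketch that picks the job count from the number of cores instead; make_parallel_command and run_make are my own hypothetical helpers.

```python
import os
import subprocess


def make_parallel_command(target=None):
    """Build a 'make -j N [target]' command with N = number of CPU cores."""
    jobs = os.cpu_count() or 1  # os.cpu_count() can return None
    cmd = ["make", "-j", str(jobs)]
    if target:
        cmd.append(target)
    return cmd


def run_make(directory, target=None):
    """Run the parallel make command in directory, raising on failure."""
    subprocess.run(make_parallel_command(target), cwd=directory, check=True)
```

For the Kaldi build above, with src = Path.home() / "tools" / "kaldi-master" / "src", that would be run_make(src, "depend"), then run_make(src), then run_make(src, "ext").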

Congratulations, Kaldi is installed.

Installing Kaldi-GStreamer-server

Before installing kaldi-gstreamer-server itself, there are a few more things to do with Kaldi.
Compile the GStreamer plugin. First, install its dependencies. Note that these are specific (older) versions of the packages, so make sure you get exactly the ones named below. On Ubuntu/Debian, run:

sudo apt-get install libgstreamer1.0-dev gstreamer1.0-plugins-good gstreamer1.0-tools gstreamer1.0-pulseaudio

Kaldi-GStreamer-server requires the GStreamer plugin to be compiled (makes sense).

cd ~/tools/kaldi-master/src/gst-plugin/
make depend
make

The gst-plugin folder should now contain libgstkaldi.so, which is the GStreamer plugin.

Now it's time to install the kaldi-gstreamer-server package. First, more dependencies.

sudo apt-get install python-pip python-yaml python-gi
pip install tornado ws4py==0.3.2 pyyaml

Note: you might need to run pip with sudo, e.g. sudo pip install tornado ws4py==0.3.2 pyyaml.
Note: I couldn't figure out which YAML package to install, so I installed both. Both python-yaml (from apt) and pyyaml (from pip) provide PyYAML, so one of them is probably enough. If I do this again, I'll try to remember to change this.
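To see which of these Python modules are actually importable, here's a quick Python 3 check (missing_modules is a hypothetical helper of mine). The package-to-module mapping in the comment is my assumption. Run it with the same interpreter you'll start the server with. Caveat: importlib.util.find_spec needs Python 3.4 or newer, while plain python on Ubuntu 16.04 is Python 2 and the apt packages above are the Python 2 versions, so treat this as a sketch for a Python 3 setup.

```python
import importlib.util

# Assumed module names for the packages above:
# tornado -> tornado, ws4py -> ws4py, pyyaml / python-yaml -> yaml,
# python-gi -> gi.
REQUIRED_MODULES = ("tornado", "ws4py", "yaml", "gi")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from missing_modules() means everything the server needs on the Python side was found.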

Clone kaldi-gstreamer-server from GitHub into your tools folder.

cd ~/tools/
git clone https://github.com/alumae/kaldi-gstreamer-server.git

This completes the installation.

cd into the main folder.

cd ./kaldi-gstreamer-server/

Open the README file, peruse until understood.

gedit ./README.md

Now you'll understand what I mean by server and worker. You can start the server with:

python kaldigstserver/master_server.py --port=8888

Before starting a worker, make sure that the GST plugin path includes the gstreamer plugin you compiled. If you put everything where I recommended, this is all you have to do:

export GST_PLUGIN_PATH=~/tools/kaldi-master/src/gst-plugin

Test that it worked. This command should print a bunch of information about the plugin. If it just says something like 'not found', something went wrong; take another look at the README, because I have no idea what it could be.

gst-inspect-1.0 onlinegmmdecodefaster
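gst-inspect-1.0 is the real test, since it actually loads the plugin. As a quicker pre-check, here's a Python 3 sketch (find_kaldi_plugin is my own hypothetical helper) that looks for libgstkaldi.so directly inside each directory listed in GST_PLUGIN_PATH, which is a colon-separated list. It doesn't search subdirectories and doesn't check that the library loads.

```python
import os
from pathlib import Path


def find_kaldi_plugin(env=None):
    """Return the path of libgstkaldi.so found via GST_PLUGIN_PATH, or None."""
    env = os.environ if env is None else env
    for entry in env.get("GST_PLUGIN_PATH", "").split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a trailing ':'
        candidate = Path(entry).expanduser() / "libgstkaldi.so"
        if candidate.is_file():
            return candidate
    return None
```

If find_kaldi_plugin() returns None, either the export above didn't take effect in the shell you're running from, or the plugin didn't build.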

Now you can start a worker.

python kaldigstserver/worker.py -u ws://localhost:8888/worker/ws/speech -c sample_worker.yaml
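For scripting, the server and worker can be started from Python 3 too. This is a rough sketch of my own, assuming the layout used in these notes (~/tools/kaldi-gstreamer-server, with the plugin in ~/tools/kaldi-master/src/gst-plugin) and the same commands as above. The only coordination is a crude two-second pause so the server is listening before the worker connects; that delay is my guess, not something from the README.

```python
import os
import subprocess
import time
from pathlib import Path

# Assumed locations, matching the paths used in these notes.
SERVER_DIR = Path.home() / "tools" / "kaldi-gstreamer-server"
PLUGIN_DIR = Path.home() / "tools" / "kaldi-master" / "src" / "gst-plugin"


def server_and_worker_commands(port=8888, config="sample_worker.yaml"):
    """Return the argument lists for the master server and one worker."""
    server = ["python", "kaldigstserver/master_server.py", f"--port={port}"]
    worker = ["python", "kaldigstserver/worker.py",
              "-u", f"ws://localhost:{port}/worker/ws/speech",
              "-c", config]
    return server, worker


def start_server_and_worker(port=8888, config="sample_worker.yaml"):
    """Start both processes from SERVER_DIR and return their Popen handles."""
    server_cmd, worker_cmd = server_and_worker_commands(port, config)
    server = subprocess.Popen(server_cmd, cwd=SERVER_DIR)
    time.sleep(2)  # crude guess at how long the server needs to start listening
    # The worker needs GST_PLUGIN_PATH so GStreamer can find libgstkaldi.so.
    env = dict(os.environ, GST_PLUGIN_PATH=str(PLUGIN_DIR))
    worker = subprocess.Popen(worker_cmd, cwd=SERVER_DIR, env=env)
    return server, worker
```

start_server_and_worker() returns the two Popen objects; call .terminate() on both when you're done.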

Here's an example of using the server to transcribe an audio file:

python kaldigstserver/client.py -r 32000 ~/tools/kaldi-gstreamer-server/test/data/english_test.raw
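As I understand client.py's help text, -r 32000 is the rate in bytes per second at which audio is sent, and for 16 kHz, 16-bit mono audio that's 16000 * 2 * 1 = 32000. If you want to feed your own recording in the same raw form as the test file, here's a Python 3 sketch (wav_to_raw is a hypothetical helper of mine) that strips the WAV header and returns the matching -r value. It assumes 16-bit mono WAV input and refuses anything else. It doesn't resample, so the recording should already be at the rate your model expects.

```python
import wave


def wav_to_raw(wav_path, raw_path):
    """Write the WAV file's bare PCM samples to raw_path.

    Returns the byte rate for client.py's -r option:
    sample rate * bytes per sample * channels.
    """
    with wave.open(str(wav_path), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        byte_rate = wav.getframerate() * wav.getsampwidth() * wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    with open(raw_path, "wb") as raw:
        raw.write(frames)
    return byte_rate
```

Then run client.py with -r set to the returned value and the .raw file as the input.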

You can also use a deep neural network (DNN) model to process the data, but at the time of writing, the README's DNN walkthrough was giving me errors.

That's it!


Tuesday, July 26, 2016

Setting Up an Offline Transcriber Using Kaldi - Part 1: kaldi-offline-transcriber


Installing Kaldi and Kaldi-Gstreamer-server on Ubuntu 16.04
