I'm recording this as I go, and I'll keep editing it to reflect the best way to set things up. My goal is to record a snippet of my voice and have it transcribed by a Python script I'll write.
I should note that the instructions in the github readme are excellent. I've rewritten them here so I have easy access to them, and to make them a little better -- just made them cut-and-paste worthy, mostly.
First Attempt: Kaldi-offline-transcriber
My first shot at this project is this GitHub repository: github.com/alumae/kaldi-offline-transcriber. The only problem is that this transcriber, though excellent in itself, is built for the Estonian language. After I successfully get it working in Estonian, I'll see what I can do about English.
Dependencies Installation
Not sure if this comes with Ubuntu 16.04 or if I'd already installed this for something else, but make sure this is installed.
sudo apt-get install build-essential
Also install these:
sudo apt-get install ffmpeg sox libatlas-dev
Install Kaldi. You don't need the online extensions, but it won't hurt to have them installed (the only difference is an extra compiled file in a directory).
Make sure Python and pip are installed.
sudo apt-get install python-pip
Install the package pyfst. One of its dependencies, OpenFst, was compiled and installed with Kaldi. To exploit that installation, use these install flags when you install pyfst:
CPPFLAGS="-I/home/$USER/tools/kaldi-master/tools/openfst/include -L/home/$USER/tools/kaldi-master/tools/openfst/lib" pip install pyfst
Turns out you also need Java installed, which isn't mentioned in the readme file.
sudo apt-get install default-jre
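Before moving on, it can help to confirm that the tools installed above are actually on the PATH. The following is only a sketch: missing_commands is a helper name I made up, and the command names in the usage line (gcc, make, ffmpeg, sox, python, pip, java) are the ones I'd expect the packages above to provide.

```shell
# Print every command from the argument list that is not on the PATH.
# Returns 0 when nothing is missing, 1 otherwise.
# (missing_commands is a hypothetical helper, not part of any package.)
missing_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   missing_commands gcc make ffmpeg sox python pip java
```

Anything it prints still needs to be installed.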
Installing the Main Package
Clone the repository.
cd ~/tools
git clone https://github.com/alumae/kaldi-offline-transcriber.git
This is Estonian, remember? Download and unpack the Estonian language models.
cd ~/tools/kaldi-offline-transcriber
curl http://bark.phon.ioc.ee/tanel/kaldi-offline-transcriber-data-2015-12-29.tgz | tar xvz
Create a file called Makefile.options in the root of the transcriber directory and set KALDI_ROOT in it to the root of the Kaldi directory. Writing the line with echo lets the shell expand $USER first; typed into the file as-is, make would read $USER as the variable $(U) followed by the letters SER.
echo "KALDI_ROOT=/home/$USER/tools/kaldi-master" > ~/tools/kaldi-offline-transcriber/Makefile.options
Without this, the build throws an error because it can't locate the files it's trying to compile. Next, compile. This should take about 30 minutes, so use the option for multiple cores if possible.
cd ~/tools/kaldi-offline-transcriber/
make -j 4 .init
All compilations are stored under the kaldi-offline-transcriber/build/ directory. If you want to retry the compilation, just delete that directory and try again.
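Since a wrong KALDI_ROOT only shows up once the build fails, a quick check of Makefile.options before starting the half-hour compile might save some time. This is a rough sketch; both function names are my own, and it assumes KALDI_ROOT is written as a literal path on a single line (no make variables in the value).

```shell
# Print the value of KALDI_ROOT from a Makefile.options-style file.
# Assumes a literal path on a single KALDI_ROOT=... line.
# (Both functions are hypothetical helpers, not part of the transcriber.)
kaldi_root_from_options() {
    sed -n 's/^[[:space:]]*KALDI_ROOT[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Return 0 if the file sets KALDI_ROOT to an existing directory.
check_kaldi_root() {
    local root
    root="$(kaldi_root_from_options "$1")"
    if [ -n "$root" ] && [ -d "$root" ]; then
        return 0
    fi
    echo "KALDI_ROOT is missing or not a directory: '$root'" >&2
    return 1
}

# Usage:
#   check_kaldi_root ~/tools/kaldi-offline-transcriber/Makefile.options && echo ok
```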
Example Usage
Using the make command directly
Stick a speech file under src-audio, then execute the command to create the transcription file.
cd src-audio
wget http://media.kuku.ee/intervjuu/intervjuu201306211256.mp3
cd ..
make build/output/intervjuu201306211256.txt
To remove the intermediate files that are generated with the build command, run:
make .intervjuu201306211256.clean
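Both make targets are derived from the audio file's name, so they can be computed rather than typed out. A small sketch, with transcript_target and clean_target being names I picked; the naming pattern is inferred from the example above and I haven't checked it against other file types.

```shell
# Given an audio file name, print the make target for its transcript,
# following the pattern src-audio/NAME.ext -> build/output/NAME.txt.
# (transcript_target and clean_target are hypothetical helper names.)
transcript_target() {
    local name
    name="$(basename "$1")"
    echo "build/output/${name%.*}.txt"
}

# Print the matching cleanup target, .NAME.clean.
clean_target() {
    local name
    name="$(basename "$1")"
    echo ".${name%.*}.clean"
}

# Usage, from the transcriber root:
#   make "$(transcript_target src-audio/intervjuu201306211256.mp3)"
#   make "$(clean_target src-audio/intervjuu201306211256.mp3)"
```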
Using the speech2text.sh script
The repository also includes a wrapper script that makes it easier to transcribe audio files located in any directory. For example:
/home/$USER/tools/kaldi-offline-transcriber/speech2text.sh --trs result/test.txt audio/test.ogg
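To run the wrapper over a whole directory of recordings, a loop along these lines might do. It is a sketch only: transcribe_dir and the SPEECH2TEXT override variable are my own inventions, and the call simply mirrors the flags and output naming of the example command above.

```shell
# Transcribe every file in audio_dir, writing NAME.txt into out_dir.
# SPEECH2TEXT (a hypothetical override) may point at a different copy of
# the wrapper; otherwise the install location from these notes is used.
transcribe_dir() {
    local audio_dir="$1" out_dir="$2" audio name
    local script="${SPEECH2TEXT:-/home/$USER/tools/kaldi-offline-transcriber/speech2text.sh}"
    mkdir -p "$out_dir"
    for audio in "$audio_dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$audio" ] || continue
        name="$(basename "$audio")"
        # Same invocation as the example command: --trs OUTPUT INPUT
        "$script" --trs "$out_dir/${name%.*}.txt" "$audio" || return 1
    done
}

# Usage:
#   transcribe_dir audio result
```

It stops at the first file that fails, so one bad recording doesn't go unnoticed in a long run.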
Tweaks
You can speed up transcription by setting another parameter in Makefile.options. Open the file:
nano ~/tools/kaldi-offline-transcriber/Makefile.options
and add this line:
nthreads = 4
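The same setting can be made without opening an editor. A sketch, with set_nthreads being a name I made up: it replaces an existing nthreads line or appends one, and leaves the rest of the file alone. Passing "$(nproc)" to use every core is my assumption about a sensible value, not something the readme recommends.

```shell
# Set "nthreads = N" in a Makefile.options-style file, replacing an
# existing nthreads line or appending one if there is none.
# (set_nthreads is a hypothetical helper name.)
set_nthreads() {
    local options_file="$1" threads="$2"
    if grep -q '^nthreads' "$options_file" 2>/dev/null; then
        # Rewrite through a temporary file so the original stays intact on failure.
        sed "s/^nthreads.*/nthreads = $threads/" "$options_file" > "$options_file.tmp" &&
            mv "$options_file.tmp" "$options_file"
    else
        echo "nthreads = $threads" >> "$options_file"
    fi
}

# Usage:
#   set_nthreads ~/tools/kaldi-offline-transcriber/Makefile.options 4
#   set_nthreads ~/tools/kaldi-offline-transcriber/Makefile.options "$(nproc)"
```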