Sunday, July 31, 2016

Speech Recognition Final Verdict

Final strategy: I'm going to use CMU Sphinx with a small vocabulary trained to my voice for most commands.  I'll use the kaldi-gstreamer-server, or maybe even an online service, for larger, arbitrary pieces of sound - stuff that I can't predict.

Which means that I'll have two separate, behemoth systems installed on the computer.  Ouch.  At least I can stream Kaldi from a different computer.  Sphinx should be small enough to not be a problem.

Here's what I need to be able to train the command and control language model.

Mutt on OpenBSD

Mutt is a command-line-based mail reader.  I've heard that even top execs at Google have used it as their primary mail-reading tool.  These are my notes for setting it up on OpenBSD 5.9 for use with Gmail.  I've heard it can be used in conjunction with davmail(??) to deal with Microsoft Exchange.
sources here:
https://dev.mutt.org/trac/wiki/UseCases/Gmail 
https://www.linux.com/blog/setup-mutt-gmail-centos-and-ubuntu 
in order to use the trash function with Gmail, you have to apply this patch
http://cedricduval.free.fr/mutt/patches/#trash 
how to do it:
http://cedricduval.free.fr/mutt/patches/#patch
install mutt and nano (for ease of command-line text editing).
doas pkg_add mutt nano
create the mutt config file and open it.
mkdir -p ~/.mutt
touch ~/.mutt/muttrc
nano ~/.mutt/muttrc
The contents of the config file can vary, based on what you actually want.  Mutt doesn't by default understand spaces in mailbox/folder names, so you have to change that setting in the muttrc file.
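As a starting point, here's a minimal Gmail-flavored sketch.  Everything in it is a placeholder, not my actual config: substitute your own address, and note the quoting around the folder names, which is how you cope with the spaces Gmail puts in them.

```shell
# Write a minimal, illustrative Gmail muttrc.  The address is a
# placeholder; the quoted folder names handle Gmail's spaces.
mkdir -p ~/.mutt
cat > ~/.mutt/muttrc <<'EOF'
set imap_user = "you@gmail.com"
set folder    = "imaps://imap.gmail.com:993"
set spoolfile = "+INBOX"
set postponed = "+[Gmail]/Drafts"
set record    = "+[Gmail]/Sent Mail"
set smtp_url  = "smtp://you@gmail.com@smtp.gmail.com:587/"
EOF
```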

Here is a link to my own muttrc file on Github.  

Wednesday, July 27, 2016

Setting Up an Offline Transcriber Using Kaldi - Part 2: EESEN

This is part 2, where I realize that converting an offline transcriber to a different language on my own is a semi-herculean task.  In the issue tracker for alumae's github project, there's a conversation revolving around an English conversion (https://github.com/alumae/kaldi-offline-transcriber/issues/6).  I used that to find someone else's project that converts the code to work in English.

There are three similar repos that I'm going to try this time: github.com/srvk/eesen-transcriber, github.com/srvk/eesen, and github.com/srvk/srvk-eesen-offline-transcriber.  I think the second is the base package, so I'm going to give that a shot first.  The third is a high-level abstraction that makes it easier to transcribe something, and the first appears to be a virtual machine that you can just download and run (more or less).  The only issue with the VM is that you need to dedicate 8 GB of RAM...and I don't have that much to give away.  So I'm going to try the others first.  The use of Vagrant is unfamiliar, but I looked at the source website and the concept is pretty cool.  It solves a lot of portability issues that I was planning on kicking down the road.

Actually, here's an explanation of several of the repos by the author:
We have changed to use other models by brute force; taking out
much of the Estonian and replacing with parts of Kaldi recipes that
do decoding (for example the tedlium recipe). It mostly requires
performing surgery on the Makefile. :)

In particular, for English we do only one pass of decoding, with only
one LM and decoding graph, and skip compounding.
I recently updated a system to use even more different decoding: neural net decoding based on Yajie Miao's EESEN (github.com/yajiemiao/eesen).  You could find the resulting code on the SRVK repo here: github.com/srvk/eesen-transcriber.
I think they want people to use the VMs rather than run it straight on their computers.  It's certainly more consistent, but also more resource-intensive.  I can't do that right now.

Attempt No. 1: Installing eesen

I did run into one hiccup.  The make command includes running a script to check for dependencies, which looks for the program libtool.  It uses the command 
which libtool
to do this.  Only problem is, libtool doesn't quite work like that.  You actually need to install libtool-bin if you want that dependency check to work.  See here for details.  Upshot is, install libtool-bin.
sudo apt-get install libtool-bin
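To see why the check trips up, here's the pattern the dependency script relies on, reduced to a sketch (the `check` helper is mine, not from the script):

```shell
# `which` exits non-zero when a program isn't on PATH; eesen's
# dependency check uses exactly this to look for libtool.  On Ubuntu,
# the plain libtool package no longer ships a `libtool` executable;
# libtool-bin does, which is why the check fails without it.
check() {
  if which "$1" >/dev/null 2>&1; then
    echo "$1 found"
  else
    echo "$1 missing"
  fi
}

check sh                      # always present on a POSIX system
check qwxyz-nonexistent-tool  # demonstrates the failure branch
```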
Start by downloading eesen into your ~/tools directory.  Rename it to eesen-master for clarity's sake.  When you compile, don't forget to run make -j 4 if you can.
cd ~/tools/
git clone https://github.com/srvk/eesen.git
mv ./eesen ./eesen-master
cd ./eesen-master/tools
make -j 4
./install_atlas.sh
./install_srilm.sh
Great!  Now EESEN is installed.  I don't know of any checks to perform, aside from whether the make command completed successfully.

Installing srvk-eesen-offline-transcriber

This is the thing that should make using eesen easy(-er).  Clone it and build it.  Since it's a customized version of alumae's kaldi-offline-transcriber, it should install the same way.

Dependencies

Make sure you have this stuff (I assume, since it's required for kaldi-offline-transcriber).
sudo apt-get install build-essential ffmpeg sox libatlas-dev python-pip
You need the OpenFST library, which Kaldi installs when you compile it.  However, since we aren't (necessarily) installing Kaldi, I don't know how to make sure you have OpenFST.  Try this, see if it works; if it doesn't, go here for as much information as I am aware of.
pip install pyfst
Next thing to do is cd into the directory where you're going to put the ESSEN easy transcriber package, and clone the repository.  
cd ~/tools
git clone https://github.com/srvk/srvk-eesen-offline-transcriber.git
cd into the repository you just cloned.
cd ~/tools/srvk-eesen-offline-transcriber
The documentation for the srvk-eesen-offline-transcriber is atrocious.  You can tell the author.  The next step should be to download acoustic and language models before adding configuration options to the Makefile and building the transcriber (this is supposed to be based on alumae's Estonian version).  Oh, well.  Leeroy Jenkins!
make .init
Well, that did something.
cat > ./makefile.options [enter]
KALDI_ROOT=/home/$USER/tools/kaldi-master [CTRL-D]
Did nothing whatsoever.  I think I'm just missing the language models, and I don't see anywhere to download them.

Ok, this looks like a dead end.

Attempt No. 2: Using the EESEN Virtual Machine 

I'll try the repo I listed first, that has the Vagrant VM set up.  Here goes.
sudo apt-get install virtualbox vagrant
Now clone the repository.
cd ~/tools
git clone http://github.com/srvk/eesen-transcriber
and cd into it.
cd ./eesen-transcriber
This is why the method is so easy - just run
vagrant up
from inside that folder, and everything is downloaded and installed automagically.  Of course, it's downloading a whole preinstalled Ubuntu OS (Ubuntu 14.04 x86, by the look of the terminal output).  Reminds me of some very hackish python solutions I came up with when I was first learning the language.  I'm not a fan, but at least something is working.  If I can track down the setup scripts it's running, I'll try and replicate the VM on my computer's installation.

Expect a lot of output.  So far, vagrant has claimed 2 of my CPUs and has nearly filled my 8 GB of RAM.  This is the only time I've ever seen my computer use swap space.  Clever: watching my system resources, I can see VirtualBox switching off which CPUs are being used.  Probably a temperature thing.

Once that's done, you can run the example transcription with the following command.
vagrant ssh -c "vids2web.sh /vagrant/test2.mp3"
or you can ssh into the VM with this command
vagrant ssh
and then change directories to /home/vagrant/tools/eesen-offline-transcriber where there are readme instructions.
cd /home/vagrant/tools/eesen-offline-transcriber
You can run transcription on an arbitrary audio file (this build is designed to be friendly to a whole bunch of audio formats) with the following command.  Note that speech2text.sh is located in the directory you just changed into above (eesen-offline-transcriber).
./speech2text.sh --txt ./build/output/test2.txt /vagrant/test2.mp3
Read speech2text.sh to see how it works; in this example, the output .txt file is located in ./build/output/ and the audio file is in the user directory.  Here's the output, so you can get an idea of the quality.  This is an excerpt from King Solomon's Mines.
You're warriors much grow where we have resting on their spears introduce.
By law there was one war just after we destroyed the people that came down upon us but it was a civil war dog a dog.
How was that my lord became my half brother had a brother born at the same birth and have the same woman it is not our custom on hard to suffer twins to live the weak are always must died.
But the mother of looking hit away the people child which was born in the last for her heart and over it and that child is to all the king.
In contrast, here's the original:
"Your warriors must grow weary of resting on their spears, Infadoos."
"My lord, there was one war, just after we destroyed the people that came down upon us, but it was a civil war; dog ate dog."
"How was that?"
"My lord the king, my half-brother, had a brother born at the same birth, and of the same woman. It is not our custom, my lord, to suffer twins to live; the weaker must always die. But the mother of the king hid away the feebler child, which was born the last, for her heart yearned over it, and that child is Twala the king.
Unfortunately, the word error rate (WER) is too high to be particularly useful: 19.4%, or roughly one word in five.  Try reading anything where one fifth of the words are wrong.  It's not even that guessable.  The other systems I was trying to make work reached word error rates of 9-13%, but that was in Estonian.

There's one more thing to try 'easily', which is to add my own language model - whatever that means.

Since the issue with the kaldi-offline-transcriber of part 1 was the lack of an English language model, maybe the next step could be creating or fitting an English model from existing material to work in that context.  I have no idea how large that project would be.  Another option is to look at what speechkitchen.org is doing about improving accuracy.  I do know they took some shortcuts to get eesen up and running.

That's all for now.

Tuesday, July 26, 2016

Setting Up an Offline Transcriber Using Kaldi - Part 1: kaldi-offline-transcriber

This is being recorded as I go.  I'll be editing it and changing it to reflect the best way to set it up.  My goal is to be able to record a snippet of my voice and have it transcribed by a python script I'll write.

First Attempt: Kaldi-offline-transcriber

The first shot at completing this project is this GitHub: github.com/alumae/kaldi-offline-transcriber. The only problem is that this transcriber, though excellent in itself, is built for the Estonian language.  After I successfully get it working in Estonian, I'll see what I can do about English.

I should note that the instructions in the github readme are excellent.  I've rewritten them here so I have easy access to them, and to make them a little better -- just made them cut-and-paste worthy, mostly.

Dependencies Installation

Not sure if this comes with Ubuntu 16.04 or if I'd already installed this for something else, but make sure this is installed.  
sudo apt-get install build-essential
Also install these:
sudo apt-get install ffmpeg sox libatlas-dev 
Install Kaldi.  Don't have to worry about the online extensions, but it won't hurt to have them installed (an extra file compiled in a directory is the only difference).

Make sure Python and pip are installed.
sudo apt-get install python-pip
Install the package pyfst.  One of its dependencies, OpenFst, was compiled and installed with Kaldi.  To exploit that installation, use these install flags when you install pyfst:
CPPFLAGS="-I/home/$USER/tools/kaldi-master/tools/openfst/include -L/home/$USER/tools/kaldi-master/tools/openfst/lib" pip install pyfst
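If that install fails, a quick way to confirm the OpenFst paths those flags point at actually exist (the kaldi-master location matches where I put Kaldi in the install post):

```shell
# Sanity-check that Kaldi's bundled OpenFst headers and libraries are
# where the CPPFLAGS above expect.  Adjust OPENFST if your Kaldi
# checkout lives somewhere else.
OPENFST="$HOME/tools/kaldi-master/tools/openfst"
for d in include lib; do
  if [ -d "$OPENFST/$d" ]; then
    echo "ok: $OPENFST/$d"
  else
    echo "missing: $OPENFST/$d"
  fi
done
```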
Turns out you also need Java installed, which isn't mentioned in the readme file.  
sudo apt-get install default-jre

Installing the Main Package

Clone the repository.
cd ~/tools
git clone https://github.com/alumae/kaldi-offline-transcriber.git
This is Estonian, remember?  Download and unpack the Estonian language models.
cd ~/tools/kaldi-offline-transcriber
curl http://bark.phon.ioc.ee/tanel/kaldi-offline-transcriber-data-2015-12-29.tgz | tar xvz 
Create a file in the root of the transcriber directory called Makefile.options.  Inside, set the KALDI_ROOT option to the root of the Kaldi directory.  Use [enter] and [CTRL-D] to complete the command.
cat > ~/tools/kaldi-offline-transcriber/Makefile.options [enter]
KALDI_ROOT=/home/$USER/tools/kaldi-master [CTRL-D]
Without this the compiler will throw an error wondering where the files it's trying to compile are located.  Next, compile.  This should take about 30 minutes, so use the option for multiple cores if possible.
cd ~/tools/kaldi-offline-transcriber/
make -j 4 .init
All compilations are stored under the kaldi-offline-transcriber/build/ directory.  If you want to retry the compilation, just delete that directory and try again.

Example Usage

Using the make command directly

Stick a speech file under src-audio, then execute the command to create the transcription file.  
cd src-audio
wget http://media.kuku.ee/intervjuu/intervjuu201306211256.mp3
cd ..
make build/output/intervjuu201306211256.txt
To remove the intermediate files that are generated with the build command, run:
make .intervjuu201306211256.clean

Using the speech2text.sh script

A wrapper script was created to make transcribing audio files in any directory easier.  It's invoked like this:
/home/$USER/tools/kaldi-offline-transcriber/speech2text.sh --trs result/test.txt audio/test.ogg

Tweaks

You can speed up transcription by setting another parameter in Makefile.options.
nano ~/tools/kaldi-offline-transcriber/Makefile.options
nthreads = 4
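Since this is the second option going into Makefile.options, it's less error-prone to rewrite the whole file in one shot.  Paths here are the ones assumed throughout this post; adjust KALDI_ROOT to wherever your Kaldi checkout actually lives.

```shell
# Recreate Makefile.options with both settings at once.
# The mkdir -p is only so this is safe to run standalone.
mkdir -p ~/tools/kaldi-offline-transcriber
cat > ~/tools/kaldi-offline-transcriber/Makefile.options <<EOF
KALDI_ROOT=$HOME/tools/kaldi-master
nthreads = 4
EOF
```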



My External Display Won't Play Ball

Notes for next time I have to set up my external monitor.  This is for Ubuntu 16.04 LTS.

I have an Acer VGA monitor, resolution 1600 x 900.  The last time I reinstalled Mint, the resolution wasn't detected.  Mint 17 couldn't do it, Ubuntu 16.04 couldn't do it; Windows 8.1 worked almost flawlessly (almost, because some of my attempts to fix the problem messed up the display on the Windows side).

This was fixed a while ago, but I'm making a record of the solution.  I created a small file called set-screen.sh, put it in my home directory, and added it to my startup programs.  It would automatically reset the resolution of my external monitor.  Hackish, but it worked.  
#!/bin/sh
sleep 7

xrandr --newmode "1600x900_60.00"  118.25  1600 1696 1856 2112  900 903 908 934 -hsync +vsync
xrandr --addmode VGA1 1600x900_60.00
xrandr --output VGA1 --mode 1600x900_60.00
The sleep 7 command ensured that my computer had a chance to turn on completely and actually detect the external monitor before running the command to modify the display.  I have a pretty fast computer; the recommended sleep time was actually 15 seconds.  
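For the record, a modeline like the one in the script doesn't have to be guessed: cvt (shipped with Xorg) prints one for a given width, height, and refresh rate, and its output can be pasted straight after xrandr --newmode.

```shell
# cvt prints an xrandr-ready modeline for the requested resolution and
# refresh rate; the guard just avoids an error on machines without Xorg.
command -v cvt >/dev/null 2>&1 && cvt 1600 900 60 || echo "cvt not installed"
```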

Installing Kaldi and Kaldi-Gstreamer-server on Ubuntu 16.04

Notes on the process of installing Kaldi and Kaldi-GStreamer-server on Ubuntu 16.04 LTS.  These were modified somewhat, since this is retroactively documented for my own benefit.

Kaldi is a state-of-the-art speech transcription engine, geared towards researchers and people who already know what they're doing.  I'm just trying to set it up.

Decide where to put Kaldi and make that your new working directory.
mkdir ~/tools/
cd ~/tools/
Clone Kaldi from github.
git clone https://github.com/kaldi-asr/kaldi.git
Rename the clone to kaldi-master (the rest of these notes refer to it that way), and cd into its tools directory.
mv ./kaldi ./kaldi-master
cd ./kaldi-master/tools
Check for any dependencies.  There were a few things I needed to add to my Ubuntu installation; don't remember what they were.  Do whatever this output instructs.
extras/check_dependencies.sh
Now comes the actual installation.
make
cd ../src
./configure --shared
make depend
make
Run this next to install the online extensions.
make ext
Note: if you have more than one core in your machine, you can run make -j 4 to do make in parallel.

Congratulations.  Kaldi is installed.  Installing Kaldi-GStreamer-server:

Before actually installing the kaldi-gstreamer-server, there's a few more things to do with kaldi itself.
Compile the Gstreamer plugin.  First, install dependencies. Note they are older versions of the packages.  Make sure you get the right version.  On Ubuntu/Debian, run:

sudo apt-get install libgstreamer1.0-dev gstreamer1.0-plugins-good gstreamer1.0-tools gstreamer1.0-pulseaudio
Kaldi-Gstreamer-server requires the gstreamer plugin to be compiled (makes sense).
cd ~/tools/kaldi-master/src/gst-plugin/
make depend
make
This folder (gst-plugin) should now contain the file libgstkaldi.so which contains the Gstreamer plugin.
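A one-line sanity check before moving on (the path matches where these notes put Kaldi; adjust it if your checkout lives elsewhere):

```shell
# Confirm the compiled Gstreamer plugin exists where the worker will
# look for it.
PLUGIN="$HOME/tools/kaldi-master/src/gst-plugin/libgstkaldi.so"
[ -f "$PLUGIN" ] && echo "plugin built: $PLUGIN" || echo "plugin missing: $PLUGIN"
```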

Now it's time to install the kaldi-gstreamer-server package.  First, more dependencies.
sudo apt-get install python-pip python-yaml python-gi
pip install tornado ws4py==0.3.2 pyyaml
Note: You might need to run pip as sudo.  e.g. sudo pip install tornado, above.
Note: I couldn't figure out which YAML package to install, so I used both.  At least, they're both installed, and I don't remember which I actually needed.  If I do this again, I'll try to remember to change this.

Clone kaldi-gstreamer-server from GitHub into your tools folder.
cd ~/tools/
git clone https://github.com/alumae/kaldi-gstreamer-server.git

This completes the installation.

cd into the main folder.
cd ./kaldi-gstreamer-server/
Open the README file, peruse until understood.
gedit ./readme.md
Now you'll understand what I mean by server and worker.  You can start the server with:
python kaldigstserver/master_server.py --port=8888
Before starting a worker, make sure that the GST plugin path includes the gstreamer plugin you compiled.  If you put everything where I recommended, this is all you have to do:
export GST_PLUGIN_PATH=~/tools/kaldi-master/src/gst-plugin
Test to make sure it worked.  If it fails, take a look at the README file again.  This command should spit out a bunch of information.  If it just says something like, 'not found', you did something wrong.  I have no idea what.
gst-inspect-1.0 onlinegmmdecodefaster
Now you can start a worker.
python kaldigstserver/worker.py -u ws://localhost:8888/worker/ws/speech -c sample_worker.yaml
Example of how to use the server to transcribe text:
python kaldigstserver/client.py -r 32000 ~/tools/kaldi-gstreamer-server/test/data/english_test.raw
You can also use a Deep Neural Network (DNN) to process the data, but at time of writing the readme walkthrough was giving me errors.

That's it!

Final post here

I'm switching over to GitHub Pages.  The continuation of this blog (with archives included) is at umhau.github.io.  By the way, the ...