CMU Sphinx is advanced enough to use its understanding of grammar to help it estimate the likelihood that a particular word was spoken. To do this, it needs a predefined notion of which words tend to follow each other -- it needs to understand the format of what is spoken to it. A 'command and control' AI involves a very specific type of grammar, where the input is predominantly commands and statements.
If CMU Sphinx has been trained to recognize that, it will be able to filter out words that don't make sense in that context and give more weight to words that do make sense as control words: it will know that 'play music' is more likely than 'pink music', and that 'shutdown' is more likely to be a command than 'showdown'.
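For a vocabulary this restricted, CMU Sphinx can also go beyond statistical weighting and accept an explicit JSGF grammar that enumerates the legal phrases outright. A minimal sketch of what that might look like (the command and target words here are made up for illustration, not taken from any real setup):

```
#JSGF V1.0;

grammar commands;

// Top-level rule: either an action applied to a target, or a bare command.
public <command> = <action> <target> | shutdown | reboot;

<action> = play | stop | pause;
<target> = music | the podcast | the radio;
```

With a grammar like this loaded, the recognizer only ever considers the listed phrases, which sidesteps the 'pink music' problem entirely at the cost of rejecting anything outside the list.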
For now, here are the primary sources:
http://cmusphinx.sourceforge.net/wiki/tutoriallm
http://www.speech.cs.cmu.edu/tools/lmtool-new.html
Using these tools, it should be possible to create the grammar language model from a big list of sentences; the only problem is, I don't have a sentence list. Once that LM has been created, the voice data I've created should be retrained -- even that step is done based on grammar statistics.
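Since lmtool just wants a flat text file with one sentence per line, the missing sentence list could be generated rather than written by hand: enumerate the command templates and expand them into every legal phrase. A quick sketch of that idea (the action and target words are placeholders, not my actual command set):

```python
import itertools

# Hypothetical command vocabulary -- substitute your real actions and targets.
ACTIONS = ["play", "stop", "pause"]
TARGETS = ["music", "the podcast", "the radio"]
STANDALONE = ["shutdown", "reboot"]

def build_corpus():
    """Expand command templates into the flat sentence list lmtool expects."""
    sentences = [f"{a} {t}" for a, t in itertools.product(ACTIONS, TARGETS)]
    sentences.extend(STANDALONE)
    return sentences

if __name__ == "__main__":
    # lmtool convention: one sentence per line, uppercased.
    with open("corpus.txt", "w") as f:
        for sentence in build_corpus():
            f.write(sentence.upper() + "\n")
```

The resulting corpus.txt can be uploaded to the lmtool page linked above, which returns the .lm and .dic files Sphinx needs.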