There's only one GUI tool for working with the Logical Volume Management paradigm, and in Fedora it's already been deprecated without a functional replacement.
(Apparently LVM is the partitioning scheme of the future)
Anyway, this is a personal note to myself that system-config-lvm is the best way to go, and it's available from the Fedora archives here. To go straight to the download, go here.
It was last available in Fedora 19, so who knows how long it'll work for.
Friday, December 23, 2016
Thursday, December 22, 2016
Installing Django
This is an incredible pain to figure out. Dependencies: Python 3.5, and don't forget to set up virtualenvs...whatever those are. Don't get me wrong: I've done plenty of Python, but never really wanted to deal with something that complicated.
The goal here is to get Django installed. Stretch goals, dealt with later, are a) to get a Django-based website running locally and b) to do lots of cool pythonic things in the background.
Based on further exploration of how these things work, I am amazed at how complicated this is. And the documentation is basically written for someone with a complete conceptual understanding of the software, who just forgot the commands they need. Not that beginner-friendly.
Note: if you're using Ubuntu, the python3 -m venv command in the virtualenv step might not work. If it doesn't, you'll have to make sure the python3 versions of everything are installed and then use the virtualenv form of the command instead.
Sources
Dependencies
Make sure Python is installed, along with pip and virtualenv.
sudo apt-get install python3.5
sudo apt-get install python-pip
sudo pip install virtualenv
Create your virtualenv (src)
Apparently, it's good to store all your virtualenvs in the same place. The recommended location is ~/.virtualenvs, so that's what we'll use.
mkdir ~/.virtualenvs
python3 -m venv ~/.virtualenvs/myfirstdjangowebsite
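If that doesn't work (e.g., on Ubuntu), install the Python 3 versions of pip and virtualenv, and create the environment with virtualenv instead:
sudo apt-get install python3-pip
sudo pip3 install virtualenv
virtualenv --python=`which python3` ~/.virtualenvs/myfirstdjangowebsite
Now activate the virtualenv. If the first command doesn't work, use the second. Remember: you have to activate the virtualenv in every new terminal window.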
source ~/.virtualenvs/myfirstdjangowebsite/bin/activate
. ~/.virtualenvs/myfirstdjangowebsite/bin/activate
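You have to activate the virtualenv in every new terminal window, so a quick check before running pip can save some confusion. This is a minimal sketch, and the function names in_virtualenv and require_virtualenv are made up for illustration. It relies on the fact that the activate scripts created by both venv and virtualenv export a VIRTUAL_ENV variable, and that deactivate removes it.

```shell
# Hypothetical helper: succeeds only when a virtualenv is active.
# The venv/virtualenv "activate" scripts export VIRTUAL_ENV; "deactivate" unsets it.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

# Hypothetical helper: report the active virtualenv, or fail with a reminder.
require_virtualenv() {
    if in_virtualenv; then
        echo "using virtualenv: $VIRTUAL_ENV"
    else
        echo "no virtualenv active; run: source ~/.virtualenvs/myfirstdjangowebsite/bin/activate" >&2
        return 1
    fi
}
```

Usage: require_virtualenv && pip install Django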
Install Django
Once that virtualenv is set up, install django.
pip install Django
Django is now available within the virtualenv you set up.
Wednesday, November 30, 2016
Raspberry Pi Zero USB Audio on Raspbian Jessie (method 2)
Well, this is the second way to skin the cat. Note that arecord ignores what went into the recording section of the config file specified below. See the Execution section for actually making a recording or playing something.
Followed the OP's instructions in this post, more or less (as detailed below). Note the extra information provided by the top post at this link.
For some reason, the space on the RPi's SD card filled up completely after following those instructions and messing with the results. Not even enough space to run ls on a big directory. Something I'd expect if I'd run arecord instead of aplay, and left it going indefinitely.
Log into the pi.
ssh pi@raspberrypi
Install a dependency (don't worry, it's small).
sudo apt-get install libasound2-plugins
Edit the ALSA config file, but first back up the original.
cp ~/.asoundrc ~/.asoundrc.bak
sudo nano ~/.asoundrc
Add the following to the file.
pcm.!default {
    type asym
    playback.pcm "defaultplayback"
    capture.pcm "defaultrec"
    hint {
        show on
        description "default play and rec koko"
    }
}
pcm.defaultrec {
    type plug
    slave {
        pcm "hw:1,0"
        rate 48000
        channels 2
        format S16_LE
    }
    hint {
        show on
        description "default rec koko"
    }
}
pcm.defaultplayback {
    type rate
    slave.pcm mix1
    slave.rate 48000
    #Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz:
    #converter "samplerate_best" # perfect: 16%cpu, maybe overkill
    #converter "samplerate_medium" # almost perfect: 6%cpu
    #converter "samplerate" # good: 4%cpu, definitely usable
    #converter "samplerate_linear" # bad: 2%cpu, way better than default wine resampler
    #converter "samplerate_order" # very bad: 2%cpu, like the default wine resampler
    converter "samplerate"
    hint {
        show on
        description "default play koko"
    }
}
pcm.mix1 {
    type dmix
    ipc_key 1024
    slave {
        pcm "hw:1,0"
        rate 48000
        periods 128
        period_time 0
        period_size 1024 # must be power of 2
        buffer_size 65536
    }
}
Execution
To play a file (of the proper format):
aplay file.wav
To record a file:
arecord -f cd file.wav
Read the aplay or arecord manual to see what's up with the -f cd thing. Essentially, it specifies the format the audio should be in: cd is shorthand for 16-bit little-endian samples at 44100 Hz in stereo.
man arecord
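The -f cd format also shows how the SD card could fill up. At 44100 samples per second, 2 channels and 2 bytes per sample, a recording grows by a fixed number of bytes every second. Here's a small sketch of that arithmetic. The helper names are hypothetical, and the result ignores the small WAV header.

```shell
# Bytes per second for "-f cd": 16-bit (2-byte) samples, 44100 Hz, 2 channels.
cd_bytes_per_second() {
    local rate=44100 channels=2 bytes_per_sample=2
    echo $(( rate * channels * bytes_per_sample ))
}

# Approximate size in megabytes (10^6 bytes) of a recording lasting $1 seconds.
cd_recording_mb() {
    local seconds=$1
    echo $(( $(cd_bytes_per_second) * seconds / 1000000 ))
}
```

So an hour of -f cd audio comes to roughly 635 MB, which is plenty to fill a small card if arecord is left running. arecord's -d option stops recording after a given number of seconds, e.g. arecord -d 10 -f cd file.wav.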
Tuesday, November 22, 2016
Raspberry Pi Zero USB Audio on Raspbian Jessie (method 1)
I'm surprised how much of a pain it is to get a USB audio controller set up without a GUI. I'm trying to use the RPi Zero, which doesn't have any audio hardware - not even a pin-based PWM situation. Anything has to be done with extra hardware - either USB or custom analog with an ADC (analog-digital converter).
Note that extra static on the line seems to be related to the bitrate and frequency of the recording. If one has a lot of static, try another. The static on playback and when nothing is playing seems to be from the line - it's on a USB hub that also runs the powerful TP-Link wifi adapter. Since USB is already directly connected to the RPi's GND, there isn't really anything else that can be done about the static. Hence, custom hardware as the alternative.
There is more than one way to skin a cat; this is the first method, which I think I prefer. The other method is here.
Overall, setting up audio on Linux and Raspbian is an overly complex process that is definitely not user-friendly. I'm a user, and this was not a positive experience to sort out.
ID your sound card devices
Figure out what the card, device and subdevice numbers are. This command makes it pretty straightforward.
cat /proc/asound/modules
Change default sound card to USB audio
The ALSA configuration file has moved (source); edit it to reflect a new default device setting. You want the default sound card to be the same number as your USB card (as found above). Open the following file:
sudo nano /usr/share/alsa/alsa.conf
Look for the following two lines, and change the trailing 0's to match the number of your USB sound card.
defaults.ctl.card 1 # was 0
defaults.pcm.card 1 # was 0
Note: This is the same file previously located at /etc/modprobe.d/alsa-base.conf. For some reason it was moved, and it's unfortunate that many of the tutorials dealing with USB audio are old enough to still refer to the old file location.
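If you'd rather not hunt for those two lines in nano, sed can make the same edit. This is a sketch that assumes GNU sed (standard on Raspbian) and the defaults.ctl.card / defaults.pcm.card lines shown above. set_default_card is a made-up helper name. Try it on a copy of the file before pointing it at the real one.

```shell
# Hypothetical helper: set the defaults.ctl.card and defaults.pcm.card values
# in an alsa.conf-style file ($1) to the given card number ($2). Assumes GNU sed.
# Example on a copy:
#   cp /usr/share/alsa/alsa.conf /tmp/alsa.conf && set_default_card /tmp/alsa.conf 1
set_default_card() {
    local file=$1 card=$2
    # Matches "defaults.ctl.card N" / "defaults.pcm.card N" (optionally indented)
    # and replaces only the number; anything after it on the line is kept.
    sed -i -E "s/^([[:space:]]*defaults\.(ctl|pcm)\.card)[[:space:]]+[0-9]+/\1 ${card}/" "$file"
}
```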
Allow USB audio to be set as default
Close that file. There is another file which overrides the /etc/modprobe.d/ files and gives all USB cards a negative (never default) index. (source - not sure about the overwriting thing, though, as the /etc/-based file doesn't exist.) Open it and comment out the relevant line.
sudo nano /lib/modprobe.d/aliases.conf
Comment out (put a # in front of) this line:
options snd-usb-audio index=-2
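The same comment-out can be done without opening an editor. This minimal sketch assumes GNU sed and the exact line shown above. The & in the replacement stands for the matched text, so the line stays in place with "# " in front of it. The helper name is hypothetical.

```shell
# Hypothetical helper: comment out "options snd-usb-audio index=-2" in the
# given modprobe config file. Assumes GNU sed (for -i without a suffix).
unblock_usb_default() {
    local file=$1
    # '&' re-inserts the matched text, so the line is kept but prefixed with "# ".
    sed -i 's/^options snd-usb-audio index=-2/# &/' "$file"
}
```

On the Pi itself, the one-off version would be sudo sed -i 's/^options snd-usb-audio index=-2/# &/' /lib/modprobe.d/aliases.conf. Running it twice is harmless, because a commented-out line no longer starts with options.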
The file does say it doesn't need to be modified, but that's to prevent "unusual" cards from being set as default - which is exactly what we want.
Set USB as the default audio device
Open the user-specific ALSA config file, back it up, and replace the contents. (source and source)
cp ~/.asoundrc ~/.asoundrc.bak
sudo nano ~/.asoundrc
Replace everything in the file with the following. (alternative: use the existing format, and change the 0's to 1's)
pcm.!default plughw:Device
ctl.!default plughw:Device
pcm.!default {
type hw
card 1
}
ctl.!default {
type hw
card 1
}
The backup is located at ~/.asoundrc.bak. If you're curious about that weird plughw string, run
aplay -L
to see a few examples of it. (this is a more complete version of the aplay -l command used above)
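The backup-and-replace step can also be scripted. This sketch writes the hw-type default (the card 1 blocks above) for a given card number. write_asoundrc is a hypothetical helper. It uses printf with single-quoted lines so that an interactive bash doesn't try history expansion on the ! in pcm.!default.

```shell
# Hypothetical helper: back up an asoundrc file ($1) if it exists, then replace it
# with hw-type pcm/ctl defaults pointing at card number $2.
write_asoundrc() {
    local file=$1 card=$2
    # Keep a copy of the existing file, as in the manual steps.
    if [ -e "$file" ]; then
        cp "$file" "$file.bak"
    fi
    printf '%s\n' \
        'pcm.!default {' \
        '    type hw' \
        "    card ${card}" \
        '}' \
        'ctl.!default {' \
        '    type hw' \
        "    card ${card}" \
        '}' > "$file"
}
```

Usage: write_asoundrc ~/.asoundrc 1 (no sudo needed, since the file is in your own home directory).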
Test the results
A command to play one of the built-in test sound files (the same folder also has Front_Left.wav and Front_Right.wav for testing the left and right channels individually). Disclaimer: it doesn't work for me. I had to create a very custom file with specific formats in order to get a result. (TODO later)
aplay /usr/share/sounds/alsa/Front_Center.wav
What didn't work
Tracking failures, for later.
Changing hardware defaults in aliases.conf
(source) Open and edit:
sudo nano /lib/modprobe.d/aliases.conf
Change and edit these lines. One exists already, the second should be added directly below it.
options snd-usb-audio index=0
options snd_bcm2835 index=1
That way, all the problems are short-circuited from the beginning.
What happened: made the change, rebooted and ran aplay -l to see what was up: the USB device wasn't even listed. Apparently, that lower-level adjustment was not pleasing to the ALSA gods. A better magic is required.
Sources
I'm going to leave these here once I'm done as a permanent record in case anything goes wrong in the future. They're good sources.
- http://raspberrypi.stackexchange.com/questions/39928/unable-to-set-default-input-and-output-audio-device-on-raspberry-jessie
- http://raspberrypi.stackexchange.com/questions/19705/usb-card-as-my-default-audio-device
- http://raspberrypi.stackexchange.com/questions/36097/how-to-force-rpi-to-use-usb-soundcard
- https://www.raspberrypi.org/forums/viewtopic.php?t=20866
Extra Stuff
Other ways to check what audio devices are available
You can also use this to see what USB devices are connected:
lsusb
After running the following command, look for the entry that refers to your USB audio device.
aplay -l
In my case, the entry showed card 1, device 0, and only one subdevice, numbered 0, as the USB audio device I wanted. See below for my output. (This is a RPi Zero with a USB audio device connected)
**** List of PLAYBACK Hardware Devices ****
card 0: ALSA [bcm2835 ALSA], device 0: bcm2835 ALSA [bcm2835 ALSA]
Subdevices: 8/8
Subdevice #0: subdevice #0
Subdevice #1: subdevice #1
Subdevice #2: subdevice #2
Subdevice #3: subdevice #3
Subdevice #4: subdevice #4
Subdevice #5: subdevice #5
Subdevice #6: subdevice #6
Subdevice #7: subdevice #7
card 0: ALSA [bcm2835 ALSA], device 1: bcm2835 ALSA [bcm2835 IEC958/HDMI]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 1: EarMicrophone [USB Ear-Microphone], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
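Picking the card number out of that listing can be automated. This sketch assumes the aplay -l line format shown above, and that the USB device's description contains the string USB. That's true for the device here, but not guaranteed for every card. find_usb_card is a hypothetical name.

```shell
# Hypothetical helper: read `aplay -l` output on stdin and print the card number
# of the first line whose description mentions "USB" (prints nothing if none).
find_usb_card() {
    sed -n -E 's/^card ([0-9]+): .*USB.*/\1/p' | head -n 1
}
```

Usage: aplay -l | find_usb_card, which should print 1 for the output above.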
And this one gives a more complete version than the lowercase option.
aplay -L
Speed up the Raspberry Pi
Normally I'd just bookmark this, but the link has already died once and I had to dig it up from alternate development trees. Current source: https://github.com/autostatic/scripts/blob/rpi/jackstart. And this is the rather fascinating page that led me to all this.
(note: I'm not sure this is compatible with the latest Linux kernel, and it's not been tested. YMMV.)
#!/bin/bash
## Stop the ntp service
sudo service ntp stop
## Stop the triggerhappy service
sudo service triggerhappy stop
## Stop the dbus service. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi
sudo service dbus stop
## Stop the console-kit-daemon service. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi
sudo killall console-kit-daemon
## Stop the polkitd service. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi
sudo killall polkitd
## Only needed when Jack2 is compiled with D-Bus support (Jack2 in the AutoStatic RPi audio repo is compiled without D-Bus support)
#export DBUS_SESSION_BUS_ADDRESS=unix:path=/run/dbus/system_bus_socket
## Remount /dev/shm to prevent memory allocation errors
sudo mount -o remount,size=128M /dev/shm
## Kill the userspace GNOME virtual filesystem daemon. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi
killall gvfsd
## Kill the userspace D-Bus daemon. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi
killall dbus-daemon
## Kill the userspace dbus-launch daemon. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi
killall dbus-launch
## Uncomment if you'd like to disable the network adapter completely
#echo -n "1-1.1:1.0" | sudo tee /sys/bus/usb/drivers/smsc95xx/unbind
## In case the above line doesn't work try the following
#echo -n "1-1.1" | sudo tee /sys/bus/usb/drivers/usb/unbind
## Set the CPU scaling governor to performance
echo -n performance | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
## And finally start JACK
jackd -P70 -p16 -t2000 -d alsa -dhw:UA25 -p 128 -n 3 -r 44100 -s &
exit
If you don't mind killing the networking, you can add these to the script as well:
sudo service ifplugd stop
sudo killall ifplugd
sudo service networking stop
Running GUI programs over ssh command line
I've never run into this before, and it's really cool.
ssh -X pi@raspberrypi
This allows the remote computer to use the local computer's X server to run the GUI for the remote program. Sooo, I can use Mathematica on my Mint 18 laptop through an ssh connection to my RPi 2. Very cool.
Monday, November 14, 2016
Setting up Raspberry Pi Zero in Headless Mode
Download the raspbian image, dd it to the sd card, unmount and remount it on the computer.
Figure out the name of your micro sd card (I'm going to assume that it's sdb) - look for the device with a total size similar to what yours is labeled as.
lsblk
Download the image here. Normally I'd give you a wget command, but it's a lot faster doing it in the browser. Grab the "Raspbian Jessie with Pixel" version...I don't trust the Lite to have everything needed for future projects.
Extract the file (right-click and select either 'extract' or 'open with archive manager') and put it in the Downloads folder. If you rename it to raspbian.img after extracting, then you can cut and paste the following command. Note that you should change 'foo' to the device name you identified above, e.g., of=/dev/sdb. I'm keeping it as of=/dev/foo to prevent accidents.
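dd will overwrite whatever device it's given, so a small guard before the command doesn't hurt. This is a minimal sketch, and check_dd_target is a hypothetical helper. It refuses the /dev/foo placeholder, anything that isn't a block device, and any device with a mounted partition. The mount check reads /proc/mounts, which assumes a Linux host like the Mint machine used here.

```shell
# Hypothetical safety check to run before dd: refuse the /dev/foo placeholder,
# non-block devices, and devices with mounted partitions (Linux /proc/mounts).
check_dd_target() {
    local dev=$1
    if [ "$dev" = /dev/foo ]; then
        echo "replace /dev/foo with your SD card's device (see lsblk)" >&2
        return 1
    fi
    if [ ! -b "$dev" ]; then
        echo "$dev is not a block device" >&2
        return 1
    fi
    # Prefix match also catches partitions such as /dev/sdb1.
    if grep -q "^${dev}" /proc/mounts; then
        echo "$dev (or one of its partitions) is mounted; unmount it first" >&2
        return 1
    fi
    echo "$dev looks like a valid target"
}
```

Usage: check_dd_target /dev/sdb && sudo dd bs=4M if=/home/$USER/Downloads/raspbian.img of=/dev/sdb status=progress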
sudo dd bs=4M if=/home/$USER/Downloads/raspbian.img of=/dev/foo status=progress
Remove the sd card from the computer and plug it back in. That seems to allow the low-level processes to reassess the contents of the card after everything on it was changed.
Resize the partition in gparted if you're planning on doing anything that involves a significant amount of storage. The Raspbian image doesn't leave a lot of free space, and your card probably has extra room on it. Install gparted below, then run and resize (that's a GUI operation, so no walkthrough - sorry. Also, I've been doing it since nearly the beginning of my Linux career and don't need any reminders on how to do it).
sudo apt-get install gparted
Set up the networking - you'll need to tell the pi what the wifi password is (Source). cd into the primary partition of the sd card - you'll find it as the folder named with a long string of meaningless text if you do:
ls /media/$USER
cd into that folder (represented by me here with x's and dashes) and then go several folders deeper to edit the wifi config file. The whole thing is done below.
sudo nano /media/$USER/xxxx-xxxx-xxxxx/etc/wpa_supplicant/wpa_supplicant.conf
Add this to the end of the file, where "foo" is the name of your wifi network (literally, what shows up when you're choosing a network to connect to - it is case-sensitive) and "bar" is the wifi password. Both should be in quotation marks.
network={
    ssid="foo"
    psk="bar"
    proto=RSN
    key_mgmt=WPA-PSK
    pairwise=CCMP
    auth_alg=OPEN
}
Save and exit by pressing CTRL-O, ENTER and CTRL-X.
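If you're setting up more than one card, you can append the network block with a heredoc instead of nano. This sketch uses the same settings as above. add_wifi_network is a hypothetical helper. It doesn't escape anything, so an SSID or password containing a double quote would break it.

```shell
# Hypothetical helper: append a WPA2-PSK network block to a wpa_supplicant.conf.
# Arguments: config file, SSID, passphrase (no escaping of embedded quotes).
add_wifi_network() {
    local conf=$1 ssid=$2 psk=$3
    cat >> "$conf" <<EOF
network={
    ssid="${ssid}"
    psk="${psk}"
    proto=RSN
    key_mgmt=WPA-PSK
    pairwise=CCMP
    auth_alg=OPEN
}
EOF
}
```

Usage: add_wifi_network path/to/wpa_supplicant.conf foo bar. The file on the card is owned by root, so run it from a root shell. The wpa_passphrase tool that ships with wpa_supplicant prints a similar block with the password pre-hashed.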
Unmount, stick the SD card in the pi, connect a wifi adapter and plug in the power. Most of the time you don't even need to know the IP of the pi, as you can use the computer name to ssh in:
ssh pi@raspberrypi
Note the password is raspberry.
A few extra commands to be aware of:
Note that to ssh into a Mint 18 distro from the pi, you need to install openssh-server on Mint first.
sudo apt-get install openssh-server
If you know the user and computer names that you want to ssh into, you can use them instead of hard-to-keep-track-of IP addresses, as long as everything is on the local network (LAN).
Copy files with scp:
scp user@computer:desired.file ~/path/to/containing/folder
And on an *ahem* completely unrelated note, found a few more dependencies that sphinxtrain needed in the install script. I'll update vmc when I get a chance.
Wednesday, November 9, 2016
Basic filesharing server on OpenBSD (updated for 6.0)
For the record. I assume that the other OpenBSD pages on this site have already been implemented.
Install samba:
pkg_add samba
Create a folder to use as a shared location. According to the official docs, that's what /srv is for. Since I use this a lot, I'm not going to bother with any further file structure.
mkdir -p /srv/
chmod 777 /srv/
Modify the samba configuration file for a basic all-permissive shared folder. My threat model assumes that if someone is connected to the network, they're friendly. Since there are several Win7 computers on the network, I had to get around the authentication requests.
Cut & paste stuff doesn't really work on the command line, so here's what needs to be added at the bottom of the /etc/samba/smb.conf. src.
nano /etc/samba/smb.conf
[SRV]
path = /srv/
public = yes
only guest = yes
writable = yes
printable = no
guest ok = yes
read only = no
map to guest = bad user
Note that map to guest is a global parameter, so Samba only honors it in the [global] section; if guests still get authentication prompts, move that line there.
After finishing the smb.conf edits, use the rc.conf.local file to start the samba share with the machine. You can use the echo command to make the adjustments. Note: rc.conf.local doesn't actually exist in the default OpenBSD...but it is acknowledged. src.
echo '
smbd_flags="-D"
nmbd_flags="-D"
' >> /etc/rc.conf.local
rcctl enable samba
rcctl restart samba
That should do it.
Open a USB drive on OpenBSD
For the record.
Find out the device name of the usb with (e.g., sd1):
sysctl hw.disknames
Then find out the name of the partition you want to mount with (e.g., i):
disklabel sd1
Then create a mount point (a folder that is linked to the partition you're mounting):
mkdir /mnt/foo
Then mount the partition at that location (e.g., sd1i):
mount /dev/sd1i /mnt/foo
To remove your flash drive, run:
umount /mnt/foo
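The value of hw.disknames is a single comma-separated string of name:duid pairs, which gets hard to read with several disks attached. This sketch splits it into one disk name per line. It assumes that name:duid format. list_disks is a hypothetical helper, and it sticks to plain POSIX sh so it works in OpenBSD's default ksh.

```shell
# Hypothetical helper: split a hw.disknames value such as
# "sd0:<duid>,cd0:,sd1:<duid>" into one disk name per line.
list_disks() {
    printf '%s\n' "$1" | tr ',' '\n' | cut -d: -f1
}
```

Usage: list_disks "$(sysctl -n hw.disknames)"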
Setting up OpenBSD (updated for 6.0)
A few notes on how I set up my OpenBSD installation. This will be an ongoing compilation.
Update: some of these notes originally hardcoded the release version, which destroys flexibility when I upgrade to 6.0 (whenever that comes out). Changed for compatibility with 6.0.
Installation
Had to do this with a USB connected CD drive. Followed the instructions for a flash drive, but the installation itself didn't want to play ball. I forget the exact scenario; it was confusing.
Wireless
I'm on an Acer Aspire One from a long time ago - I believe it's a D250 model. It uses the athn0 wifi driver. Set it up by putting this into your /etc/hostname.ath0 file (copy the whole thing into the command prompt and run it). Make sure the file name matches the interface name that ifconfig reports, e.g. /etc/hostname.athn0 for an athn0 interface.
echo "nwid 'foo' wpakey 'bar'
dhcp" > /etc/hostname.ath0
Replace the text as required with your own information...specifically, the stuff that says foo and bar. :) After that's added, run:
sh /etc/netstart
...because it won't start automatically. Don't know why. I used this link to figure out how to get it running.
Ethernet
If you have ethernet access, internet is somewhat simpler. Find out what the ethernet device name is:
ifconfig
Mine is fxp0. Using DHCP makes things easy. All this command does is put dhcp in the device's config file.
echo dhcp > /etc/hostname.fxp0
Reboot, and you should be online. Any problems, visit http://www.openbsd.org/faq/faq6.html#Setup
and http://www.openbsd.org/faq/faq6.html#DHCP.
Package Installation
See http://www.openbsd.org/faq/faq15.html#Intro for an excellent explanation of how all this works.
Setting up the Package Mirror
Being able to install packages is always nice. On OpenBSD, you have to manually specify the mirror you want to search and download from. You can set this variable after startup every time, or put it in your .profile. I used the MIT mirror; it's not going anywhere anytime soon. [edit: ok, it did go. They didn't keep the 5.9 mirror once 6.0 came out; here's the link to the 6.0 packages.]
vi ~/.profile
Now add (I stuck it in the middle of the file):
export PKG_PATH=http://mirrors.mit.edu/pub/OpenBSD/$(uname -r)/packages/$(uname -m)/
Except that in my case this didn't work. OpenBSD read that as
mirrors.mit.edu/pub/OpenBSD/OpenBSD/packages/i386/
which doesn't make sense. Instead, I had to do
export PKG_PATH=http://mirrors.mit.edu/pub/OpenBSD/6.0/packages/$(uname -m)/
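To keep some of the flexibility, the working line above hardcodes only the release and lets uname -m fill in the architecture. As a sketch, a small function can build the path so that upgrading only means changing one argument. pkg_path_for is hypothetical, and it assumes the mirror keeps the same directory layout.

```shell
# Hypothetical helper: build the MIT mirror package path for a release ($1)
# and architecture ($2), following the directory layout used above.
pkg_path_for() {
    printf 'http://mirrors.mit.edu/pub/OpenBSD/%s/packages/%s/\n' "$1" "$2"
}
```

Usage: export PKG_PATH=$(pkg_path_for 6.0 "$(uname -m)")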
Installing a Package
Now I can do
pkg_add python-2.7.11
to install python 2.7 - but to get that full package name, I have to do CTRL-F in the mirror webpage and figure out what's available. I'm pretty sure there's a way to search that on the command line, but I haven't figured it out yet. If only the package name, and not the exact version number, is known, then just use that. The following successfully installs nano.
pkg_add nano
Turning the Computer Off
Restarting
Restarting is simple:
reboot
Shutting Down
You'd think this would be simple, eh? Linux works with a straightforward
shutdown now
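but on my computer's OpenBSD installation that eventually brings you right back to the shell: without -h, -p or -r, OpenBSD's shutdown takes the system down to single-user mode instead of turning it off. I have to use
halt
Though, and I haven't tried this yet, something like
shutdown -h now
should also work (shutdown -p now additionally powers the machine off).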
Tuesday, October 11, 2016
Building a Statistical Language Model
Update: I finished my script for creating custom language models. See here: https://github.com/umhau/vmc.
So, it turns out that there are some weird problems with the MITLM installation. Something changed, or something isn't being installed properly; the compilation fails with the ./configure errors listed in the MITLM section.
g++ wasn't installed, but even after that was added it still wouldn't work.
After installing the extra dependency, the installation works! So this is a viable avenue thus far to get the LM working. I've already made it past where I need the MITLM, though, so I'm going to let it be for now. Might have to come back for it.
...
Ok, solved the problem. Thank goodness for auto highlighting in Gedit. The authors used some kind of weird system for comments that I'm guessing was retired since this script was written. It seems to have been throwing the compiler for a loop:
Here's what to do:
Notes: I think the word list option in the command refers to the possibility of a limited vocabulary...not sure how that relates to words outside that list used in the sentence list. The discount in the command, however, is fixed at 0.5. Apparently Greg and Ben did some experiments to discover that's definitely the optimal setting.
Second Note: based on readings from the CMU website, this LM isn't good for much more than command-and-control - it can successfully detect short phrases accurately, but not long, drawn-out sentences. So it'll be good for most of what I want, but anything complex will need to be done with the CMULMTK package.
Hold on - the [-w <word_file>] option for a dictionary might be a request for output - not an extra input. And given that I do need an explicit dictionary for transcription, that's probably what it does. That would be wonderful. I can even use that sentence list for voice training - which would be a fabulous way to ensure accuracy.
Unfortunately, that's not the case. Oh, well.
Time to dive into the html documentation, and figure out usage. The goal is to create the LM and DIC files - and a nice perk would be the other stuff produced by the online LM generator.
Turns out, there doesn't seem to be any kind of pronunciation dictionary produced by this tool. So it's no good.
CMU, get your act together! The website is nice, but I've got no recourse if it goes down. I want an independent system!
Here goes. Goal: LM and DIC files. Starting point: list of sentences.
Download the package. Even this isn't user-friendly - the folder structure is in html. I used wget recursively to download the webpages. See here for source on the command.
There's a summary at the end with what I figured out. Most of this is me thinking on paper.
The statistical language model is used for helping CMU Sphinx know what words exist, and what the order the words exist in (the grammar and syntax structure). The intro website to all this is here.
I'm trying to decide between the SRILM and the MITLM packages [subsequent edit: also the logios package and the quicklm Perl script - these are referenced in hard-to-find places on the CMU website; see here and here, respectively] [another subsequent edit: looks like I found a link to the official CMU Statistical Language Model toolkit - it was buried in the QuickLM script]. S- is easier to use, apparently, and the CMU site provides example commands. M-, however, seems more likely to stick around and be accessible on github for the long term. Plus, I forked it.
[sorry, blogger's formatting broke and I had to convert everything to plaintext and start over...lost the links.]
Only downside is, the main contributor to MITLM stopped work on it about 6 mos ago, and started dealing with Kaldi instead. Guess he figured the newer tech was more worth his time. Still, dinosaurs have their place; just watch Space Cowboys to get the picture.
MITLM
Just to be sure that the software doesn't go anywhere, the code is downloaded from my repository.
Update: Thanks to Qi Wang's comment below, there's an extra dependency to install:
sudo apt-get install autoconf-archive
Installation of MITLM:
cd ~/tools
git clone https://github.com/umhau/mitlm.git
cd ./mitlm
./autogen.sh
./configure
make
make install
./configure: line 19641: AX_CXX_HEADER_TR1_UNORDERED_MAP: command not found
./configure: line 19642: syntax error near unexpected token `noext,'
./configure: line 19642: `AX_CXX_COMPILE_STDCXX_11(noext, optional)'
Update: Unfortunately, I've lost track of other dependencies involved - at some point, I'll make a list of all the stuff I've installed while working on this project. Had to install libtool (or similar?) to get here. Mental note:
libtoolize: error: Failed to create 'build-aux'
But, that's because I'm trying to do this on a different Mint installation from my usual - on my default workstation, that dependency is installed (no idea what it is, except that it's probably listed somewhere on this blog).
SRILM
Ok, let's see what SRILM has to offer us. It's more inconvenient to install; ya have to go through a license agreement to download it, so I can't just stick a bash command here... unless I put the code on my github. In which case, it's easy to get a copy of. Too bad there are too many files to put up an extracted version, and too bad the compressed version is more than 25 MB. Time to split up the tar.gz file again; for my own records, here's how I split it. All I need for getting and using it is the reconstruction bit.
The splitting part, given the archive file:
split -b 24m -d srilm-1.7.1.tar.gz srilm-1.7.1.tar.gz.part-
Alright. Once the file is on GitHub, it's just more copy-pasting:
cd ~/tools
git clone https://github.com/umhau/srilm.git
cd ./srilm
cat srilm-1.7.1.tar.gz.part-* | tar -xz
By the way, WOW. The installation process for this software is not straightforward. See the INSTALL file for the installation instructions - read it for background, then copy-paste below as usual.
gedit ./INSTALL
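Step 2 - swap out the SRILM variable for one pointing at the root directory of the package. Source. I've used $(HOME) rather than ~ here, since make expands $(HOME) itself but won't reliably expand a tilde inside the compiler flags built from this variable.
sed -i '7s#.*#SRILM = $(HOME)/tools/srilm#' ./Makefile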
For now, I'm assuming that the variables are all good. I don't know if I want maximum entropy models, though it sounds useful... I'll see what happens if I don't prep them. Next up is John Ousterhout's Tcl toolkit: the required version is 7.3 and we're up to 8.6, so hopefully this still works. I'm compiling from source rather than using the available binaries, because they come with some kind of non-commercial/education license, which I don't like being tied down by.
cd ~/tools
git clone https://github.com/umhau/tcl-tk.git
cd ./tcl-tk
gunzip < tcl8.6.6-src.tar.gz | tar xvf -
gunzip < tk8.6.6-src.tar.gz | tar xvf -
Install Tcl:
cd tcl8.6.6/unix
# chmod +x configure
./configure --enable-threads
make -j 3
make test
sudo make -j 3 install
Let's try running the rest without the Tk stuff... even though John says it's needed. Heh. Leeeroooy Jenkins!
cd ../../../srilm
make World
...aaaaaaaand, Fail. This is going nowhere fast. We're in dependency hell. Let's try the perl script CMU uses (it's the backend to the online service they officially reference).
The Perl Script
Thankfully, Mint comes with Perl installed. So the question is how to use the script.
cd ~/tools
mkdir ./CMU_LMtool && cd ./CMU_LMtool
wget http://www.speech.cs.cmu.edu/tools/download/quick_lm.pl
The only thing left here is to figure out how to use the script... having never used Perl, this could be interesting. I dug this nugget out of the script:
usage: quick_lm -s <sentence_file> [-w <word_file>] [-d discount]
So, the idea with the LM tool is to process sentences that the decoder should recognize. It doesn't need to be an exhaustive list, however, because the decoder will allow fragments to recombine in the detection phase. Here's an example corpus from the CMU website:
THIS IS AN EXAMPLE SENTENCE
EACH LINE IS SOMETHING THAT YOU'D WANT YOUR SYSTEM TO RECOGNIZE
ACRONYMS PRONOUNCED AS LETTERS ARE BEST ENTERED AS A T_L_A
NUMBERS AND ABBREVIATIONS OUGHT TO BE SPELLED OUT FOR EXAMPLE
TWO HUNDRED SIXTY THREE ET CETERA
YOU CAN UPLOAD A FEW THOUSAND SENTENCES
BUT THERE IS A LIMIT
We'll use this sentence collection to test the Perl script:
cd ~/tools/CMU_LMtool
wget https://raw.githubusercontent.com/umhau/misc-LMtools/master/ex-corpus.txt
perl quick_lm.pl -s ex-corpus.txt
Well, it did exactly nothing. No terminal output, no new files created in the directory, and no errors. Time to search the script for other possible output locations. How weird can it be?...
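Ok, solved the problem. Thank goodness for auto-highlighting in gedit. The script's introductory comments are wrapped in an odd mix of POD markers and C-style /* */ comment markers, and that seems to have been throwing the Perl interpreter for a loop. (Perl skips everything from a line starting with = up to a line starting with =cut; since this block closes with =END rather than =cut, my guess is that Perl was silently skipping the whole script, which would explain why nothing happened.)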
=POD
/*
[some text wrapped by those comment markers]
*/
[more text, only wrapped by the '=' things]
=END
So, I re-commented all the introductory stuff and put the fixed version in the GitHub repo.
Summary of the Perl script
So, here's how it works: download the fixed script, give it a sentence list, and run the command. Simple. And, looking at the output, the function it performs is pretty simple too: it makes a list of all the 1-, 2- and 3-word groupings in the sentence list.
Here's what to do:
mkdir -p ~/tools/CMU_LMtool && cd ~/tools/CMU_LMtool
wget https://raw.githubusercontent.com/umhau/misc-LMtools/master/ex-corpus.txt
wget https://raw.githubusercontent.com/umhau/misc-LMtools/master/quick_lm.pl
perl quick_lm.pl -s ex-corpus.txt
Still not sure what that does for me, but I have my LM!
Notes: I think the word list option in the command refers to the possibility of a limited vocabulary... not sure how that relates to words outside that list that appear in the sentence list. The discount in the command, however, is fixed at 0.5. Apparently Greg and Ben did some experiments and found that's the optimal setting.
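Since the output boils down to counts of 1-, 2- and 3-word groupings, here's a minimal Python sketch of just the counting step, to make that concrete. It assumes each sentence gets padded with <s> and </s> sentence markers, and it leaves out the 0.5 discounting entirely; it is not a port of quick_lm.pl, and count_ngrams is my own hypothetical helper.

```python
from collections import Counter


def count_ngrams(sentences, max_n=3):
    """Count every 1-, 2- and 3-word grouping in a list of sentences.

    Each sentence is padded with <s> and </s>; that boundary handling is
    an assumption for this sketch, and quick_lm.pl may differ.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in sentences:
        if not line.strip():
            continue  # skip blank lines
        words = ["<s>"] + line.split() + ["</s>"]
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[n][tuple(words[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    corpus = ["THIS IS AN EXAMPLE SENTENCE", "BUT THERE IS A LIMIT"]
    counts = count_ngrams(corpus)
    print(counts[1][("IS",)])        # 2
    print(counts[2][("IS", "AN")])   # 1
    print(len(counts[3]))            # 10 distinct trigrams
```

To run it on the example corpus, pass it the lines of the file, e.g. count_ngrams(open("ex-corpus.txt")).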
Second Note: based on readings from the CMU website, this LM isn't good for much more than command-and-control - it can detect short phrases accurately, but not long, drawn-out sentences. So it'll be good for most of what I want, but anything complex will need to be done with the CMUCLMTK package.
Hold on - the [-w <word_file>] option for a dictionary might be a request for output - not an extra input. And given that I do need an explicit dictionary for transcription, that's probably what it does. That would be wonderful. I can even use that sentence list for voice training - which would be a fabulous way to ensure accuracy.
Unfortunately, that's not the case. Oh, well.
The official CMU Statistical Language Model toolkit
Ok, maybe this'll do it for me. Here's the link to the source. The Perl script doesn't make all the different files I need - especially the pronunciation dictionary.
mkdir -p ~/tools/CMUSLM
cd ~/tools/CMUSLM
wget http://www.speech.cs.cmu.edu/SLM/CMU-Cam_Toolkit_v2.tar.gz
gunzip < CMU-Cam_Toolkit_v2.tar.gz | tar xvf -
cd ./CMU-Cam_Toolkit_v2
Wow, this is old. You have to uncomment something in the Makefile if your computer isn't running HP-UX, IRIX, SunOS, or Solaris. I'm pretty sure anything built in this decade needs the uncommenting, but if you're unsure, the README mentions a script you can run to check for yourself:
bash endian.sh
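If you'd rather not run endian.sh, Python can report the machine's byte order too. HP-UX, IRIX, SunOS and Solaris machines were typically big-endian while x86 PCs are little-endian, so my guess is that the commented-out line is a byte-order switch; treat that as an assumption, not something the README spells out. byte_order is a hypothetical helper that just double-checks sys.byteorder.

```python
import struct
import sys


def byte_order():
    """Pack the integer 1 in native byte order and look at the first byte."""
    first_byte = struct.pack("=I", 1)[0]
    return "little" if first_byte == 1 else "big"


if __name__ == "__main__":
    # Both values should agree; "little" means a typical x86 PC.
    print(byte_order(), sys.byteorder)
```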
Ok, uncomment:
sed -i '37s/#//' ./src/Makefile
cd src
make install
Hard to tell if this was successful. Watching this compile, I get the impression it was written in the 80s and updated for compatibility with something advertising a max capacity of 512 MB of RAM. Time to dive into the HTML documentation and figure out usage. The goal is to create the LM and DIC files - and a nice perk would be the other stuff produced by the online LM generator.
Turns out, there doesn't seem to be any kind of pronunciation dictionary produced by this tool. So it's no good.
The Logios Package
This seems to be the tool CMU claims was actually used for their website - and, indeed, some of the tools within the package are designed for use in a web form. So I might be on the right track. The only problem is, the input is not a list of sentences: it's a grammar file built by the Phoenix tool. No idea what that is or how it works.
CMU, get your act together! The website is nice, but I've got no recourse if it goes down. I want an independent system!
Here goes. Goal: LM and DIC files. Starting point: list of sentences.
Download the package. Even this isn't user-friendly - the package is only available as a browsable HTML directory listing, so I used wget recursively to download the pages. See here for the source of the command.
CMUDict
Actually, it seems like I could just use the dictionary directly. The whole problem is how to get the entries I need from this file into a subset file that holds just what I want - so I'll write a small script to do that. What a pain.
wget http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/sphinxdict/cmudict_SPHINX_40
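Here's a minimal sketch of the kind of filter I mean, in Python. It assumes cmudict_SPHINX_40 has one entry per line with the headword first and alternate pronunciations marked WORD(2), WORD(3) and so on; the function names are my own, and this isn't the script I'll be posting.

```python
import re


def corpus_vocabulary(sentences):
    """Collect the distinct words (upper-cased) used in a sentence list."""
    return {word.upper() for line in sentences for word in line.split()}


def filter_dictionary(dict_lines, vocabulary):
    """Keep dictionary lines whose headword is in the vocabulary.

    Returns (kept_lines, missing_words). Alternate pronunciations such as
    READ(2) are matched by stripping the "(n)" suffix first.
    """
    kept, found = [], set()
    for line in dict_lines:
        fields = line.split()
        if not fields:
            continue
        headword = re.sub(r"\(\d+\)$", "", fields[0]).upper()
        if headword in vocabulary:
            kept.append(line.rstrip("\n"))
            found.add(headword)
    missing = sorted(vocabulary - found)
    return kept, missing


if __name__ == "__main__":
    sentences = ["PLAY MUSIC", "READ MY MAIL PLEASE"]
    cmudict = [
        "MAIL\tM EY L",
        "MUSIC\tM Y UW Z IH K",
        "PLAY\tP L EY",
        "READ\tR IY D",
        "READ(2)\tR EH D",
        "STOP\tS T AA P",
    ]
    kept, missing = filter_dictionary(cmudict, corpus_vocabulary(sentences))
    print("\n".join(kept))
    print("no entry for:", missing)  # ['MY', 'PLEASE']
```

On the real files you'd pass in the lines of ex-corpus.txt and cmudict_SPHINX_40; any words reported as missing would need pronunciations from somewhere else.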
I'll post the script soon - it's being added to a larger package that should make the process of getting a personal language model pretty painless. That'd be nice.
Saturday, October 8, 2016
Compressing and splitting folder archives
This isn't exactly about tar.gz files in particular - more like what to do when they're too big to upload into github...i.e., this is how to split them into pieces.
Compress a file into a gzip archive and split it in one go (gzip only handles a single file; for a whole folder, use tar -czf - folder in place of gzip -c):
gzip -c my_large_file | split -b 1024MiB - myfile_split.gz_
If the file has already been compressed, here's the command for splitting it:
split -b 1024m "file.tar.gz" "file.tar.gz.part-"
Apparently, recombining is a good use for the cat command. Pipe its output to gunzip and you won't have to create an intermediate archive file before decompression:
cat myfile_split.gz_* | gunzip -c > my_large_file
Basically, by using pipes you avoid ever letting the unsplit archive sit on your drive.
I found the answer here.
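For the record, the same split-and-rejoin can be done in Python if the command-line tools aren't around. This is a minimal sketch: the part names mimic split -d (two-digit numeric suffixes, so it's fine for up to 100 parts), the 24 MiB default matches what I used for SRILM, and split_file/join_files are hypothetical helpers of my own.

```python
import glob
import os
import shutil
import tempfile

CHUNK_SIZE = 24 * 1024 * 1024  # 24 MiB, the same as `split -b 24m`


def split_file(path, chunk_size=CHUNK_SIZE):
    """Write path.part-00, path.part-01, ... and return their names."""
    parts = []
    with open(path, "rb") as src:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            part = "%s.part-%02d" % (path, len(parts))
            with open(part, "wb") as dst:
                dst.write(chunk)
            parts.append(part)
    return parts


def join_files(prefix, output):
    """Concatenate prefix* in sorted order, like `cat prefix* > output`."""
    with open(output, "wb") as dst:
        for part in sorted(glob.glob(glob.escape(prefix) + "*")):
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)


if __name__ == "__main__":
    # Round-trip a small random file with a tiny chunk size.
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "archive.tar.gz")
        with open(original, "wb") as f:
            f.write(os.urandom(2500))
        parts = split_file(original, chunk_size=1000)
        rebuilt = os.path.join(tmp, "rebuilt.tar.gz")
        join_files(original + ".part-", rebuilt)
        with open(original, "rb") as a, open(rebuilt, "rb") as b:
            print(len(parts), a.read() == b.read())  # 3 True
```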
Wednesday, September 7, 2016
[shameless copy] Offline Language Model Creation for PocketSphinx
Normally, I'd be writing these myself. But this time, the explanation was so unusually good that I don't feel the need to simplify it. It's fantastic for my purposes as-is. Source.
The purpose here is to create the statistical language model that pocketsphinx uses to convert phonetics into words. The model is based entirely on what type of sentences it expects to encounter, as defined by the input reference text.
I need this running as a self-contained script in order to make language model generation a seamless part of my project. All the user should have to do is provide a ready-made reference text, and the script should generate the rest.
ARPA model training with CMUCLMTK
You need to download and install cmuclmtk. See CMU Sphinx Downloads for details.
The process for creating a language model is as follows:
1) Prepare a reference text that will be used to generate the language model. The language model toolkit expects its input to be in the form of normalized text files, with utterances delimited by <s> and </s> tags. A number of input filters are available for specific corpora such as Switchboard, ISL and NIST meetings, and HUB5 transcripts. The result should be the set of sentences that are bounded by the start and end sentence markers: <s> and </s>. Here's an example:
<s> generally cloudy today with scattered outbreaks of rain and drizzle persistent and heavy at times </s>
<s> some dry intervals also with hazy sunshine especially in eastern parts in the morning </s>
<s> highest temperatures nine to thirteen Celsius in a light or moderate mainly east south east breeze </s>
<s> cloudy damp and misty today with spells of rain and drizzle in most places much of this rain will be light and patchy but heavier rain may develop in the west later </s>
More data will generate better language models. The weather.txt file from sphinx4 (used to generate the weather language model) contains nearly 100,000 sentences.
2) Generate the vocabulary file. This is a list of all the words in the file:
text2wfreq < weather.txt | wfreq2vocab > weather.tmp.vocab
3) You may want to edit the vocabulary file to remove words (numbers, misspellings, names). If you find misspellings, it is a good idea to fix them in the input transcript.
4) If you want a closed vocabulary language model (a language model that has no provisions for unknown words), then you should remove sentences from your input transcript that contain words that are not in your vocabulary file.
5) Generate the arpa format language model with the commands:
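text2idngram -vocab weather.vocab -idngram weather.idngram < weather.closed.txt
idngram2lm -vocab_type 0 -idngram weather.idngram -vocab \
    weather.vocab -arpa weather.lm
(Here weather.vocab is the vocabulary from step 2 after your edits in step 3, and weather.closed.txt is the transcript filtered in step 4.)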
6) Generate the CMU binary form (BIN)
sphinx_lm_convert -i weather.lm -o weather.lm.bin
The CMUCLMTK tools and commands are documented at The CMU-Cambridge Language Modeling Toolkit page.
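To make steps 1, 2 and 4 concrete, here's a rough Python sketch of what they do to the text. It's an illustration only: it doesn't reproduce the exact behaviour or file formats of text2wfreq and wfreq2vocab, the top_n cutoff is an arbitrary assumption, and the helper names are made up. The real tools are still what you'd use to build the .lm file.

```python
from collections import Counter


def wrap_sentences(lines):
    """Step 1: collapse whitespace and wrap each non-empty line in <s> ... </s>."""
    return ["<s> %s </s>" % " ".join(line.split()) for line in lines if line.strip()]


def build_vocab(sentences, top_n=20000):
    """Step 2, roughly: the top_n most frequent tokens, sorted alphabetically.

    top_n is an arbitrary cutoff chosen for this sketch.
    """
    freqs = Counter(word for s in sentences for word in s.split())
    return sorted(word for word, _ in freqs.most_common(top_n))


def closed_vocabulary(sentences, vocab):
    """Step 4: drop every sentence that uses a word outside the vocabulary."""
    allowed = set(vocab)
    return [s for s in sentences if all(word in allowed for word in s.split())]


if __name__ == "__main__":
    raw = [
        "generally cloudy today",
        "cloudy   with rain today",
        "",
        "rain in teh west",
    ]
    sentences = wrap_sentences(raw)
    # Step 3: hand-edit the vocabulary, here removing a misspelling.
    vocab = [w for w in build_vocab(sentences) if w != "teh"]
    # Prints the first two sentences; the one containing "teh" is dropped.
    for s in closed_vocabulary(sentences, vocab):
        print(s)
```

In practice, as step 3 says, it's better to fix a misspelling in the transcript than to let the closed-vocabulary filter throw the sentence away.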
Tuesday, August 9, 2016
Using PocketSphinx within Python Code
Here's the source for what I've been working on.
Looks like my installation records will have to be updated to account for a different installation source, and maybe a different version of the source code.
Ok, here's the process so far. Install sphinxbase and pocketsphinx from GitHub - this means using the bleeding-edge versions, rather than the tried-and true alpha5 versions that I talked about in previous posts. This just seems to work better. Once this is all figured out, I'll go back and clean those up.
cd ~/tools
git clone https://github.com/cmusphinx/sphinxbase.git
cd ./sphinxbase
./autogen.sh
./configure
make
make check
sudo make install
cd ~/tools
git clone https://github.com/cmusphinx/pocketsphinx.git
cd ./pocketsphinx
./autogen.sh
./configure
make clean all
make check
sudo make install
Now look inside the pocketsphinx directory:
cd ~/tools/pocketsphinx/swig/python/test
There's a whole bunch of test scripts that walk you through the implementation of pocketsphinx in Python. It's basically done for you. Check the one called kws_test.py - that's the one that will wait to hear a keyword, run a command when it does, then resume listening. Perfect!
I'm going to assume that you've already created your own voice model based on the other posts in this blog, and that you've got a directory dedicated to command and control experiments.
If that's not true, then just edit the script in place (make a backup first). The only effective difference is that detection will be less accurate. For the purposes of this tutorial, you can skip ahead to my copy of the Python script below: the only thing you need to change is to read from the microphone rather than an audio file, so change the relevant lines in your script to match these. Then you're done. The rest of this tutorial is for those who have already created their own voice model; see my other posts for how to do that.
# Open file to read the data
# stream = open(os.path.join(datadir, "test-file.wav"), "rb")
# Alternatively you can read from microphone
import pyaudio
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
stream.start_stream()
Ok. For the rest of us, let's get back to messing with this script. While still in the test directory:
mkdir ~/tools/cc_ex
cp ./kws_test.py ~/tools/cc_ex/kws_test.py
cd ~/tools/cc_ex/
gedit kws_test.py
There are a few changes to make in the Python script. Make sure the model directory has been adjusted. Also, by default the script checks a .raw audio file for the keyword: comment and uncomment the relevant lines so the script uses pyaudio to record from the microphone instead. The full text of my version of the script is below. Note that the keyphrase it's looking for is the word 'and'. Pretty simple, and very likely to have been covered a lot in the voice training.
Note also that there's a weird quirk in the detection - you have to speak quickly. I tried for a long time making long, sonorous 'aaaannnnddd' noises at my microphone, and it didn't pick up. Finally gave a short, staccato 'and' - it detected me right away. Did it five more times, and it picked me up each time. I don't see a way to get around that - I think it's built into the buffer, so it won't even hear the whole thing otherwise. Or maybe I just said 'and' in the training really fast each time, though I don't think that's likely.
#!/usr/bin/python
import sys, os
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *
# os.path.join() doesn't expand "~", so expand the home directory explicitly
modeldir = os.path.expanduser("~/tools/train-voice-data-pocketsphinx")
# Create a decoder with certain model
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(modeldir, 'neo-en/en-us'))
config.set_string('-dict', os.path.join(modeldir, 'neo-en/cmudict-en-us.dict'))
config.set_string('-keyphrase', 'and')
config.set_float('-kws_threshold', 1e+1)
#config.set_string('-logfn', '/dev/null')
# Open file to read the data
# stream = open(os.path.join(datadir, "test-file.wav"), "rb")
# Alternatively you can read from microphone
import pyaudio
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
stream.start_stream()
# Process audio chunk by chunk. On keyphrase detected perform action and restart search
decoder = Decoder(config)
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
    if decoder.hyp() is not None:
        print([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
        print("Detected keyphrase, restarting search")
        decoder.end_utt()
        decoder.start_utt()
Anyway, that's all. If it doesn't work, don't blame me. That's as dead simple as I know how to make it.
Training a CMU Sphinx Language Model for Command and Control
CMU Sphinx is advanced enough to use its understanding of grammar to help it figure out the likelihood that a particular word was spoken. To do this, it needs to have a predefined concept of which words tend to follow each other -- it needs to understand the format of what is spoken to it. A 'command and control' AI deals with a very specific type of grammar, where the format is predominantly commands and statements.
If CMU Sphinx has been made to recognize that, it will be able to filter words that don't make sense in that context and weight more heavily words that do make sense as control words: it will know that 'play music' is more likely than 'pink music', and 'shutdown' is more likely to be a command than 'showdown'.
For now, here are the primary sources:
http://cmusphinx.sourceforge.net/wiki/tutoriallm
http://www.speech.cs.cmu.edu/tools/lmtool-new.html
Using this, it should be possible to create the grammar language model based on a big list of sentences; only problem is, I don't have a sentence list. Once that LM has been created, the voice data I've created should be retrained - even that is done based on grammar statistics.
Wednesday, August 3, 2016
Improving the Accuracy of CMU Sphinx for a Limited Vocabulary
Update: I finished my tool for creating a customized voice model. It encapsulates the best of what I described below. See here: https://github.com/umhau/vmc.
The idea with a limited vocabulary is that the processor can deal with far less information in order to detect the words needed. You don't have to train it on a complete set of words in the English language, and you don't need a supercomputer. All you have to do is teach it a few words, and how to spell them. The tutorial is here. I've created a script to automate the voice recording here, and stashed the needed files there with it.
Preparation
Alright, down to business. You'll find it handy to keep a folder for these sorts of programs.
mkdir ~/tools
cd ~/tools
Install git, if you don't have it already.
sudo apt-get install git
Download the script I made into your new tools folder. (No sudo here: a root-owned copy would stop the recording script from writing into it later.)
git clone https://github.com/umhau/train-voice-data-pocketsphinx.git
Install SphinxTrain. I included it among the files you just downloaded. Move it up to ~/tools, extract and install it. It's also here, if you don't want to use the one I provided.
mv ~/tools/train-voice-data-pocketsphinx/extra_files/sphinxtrain-5prealpha.tar.gz ~/tools
tar -xvzf ~/tools/sphinxtrain-5prealpha.tar.gz -C ~/tools
cd sphinxtrain-5prealpha
./configure
make -j 4
sudo make -j 4 install
Record Your Voice
Enter this directory and run the script. It has a basic walkthrough built in, and it will help you record the data you need. For experimental purposes, 20 recordings is enough for about a 10% relative improvement in accuracy. Use the name neo-en for your training data, assuming you're working in English.
cd ./train-voice-data-pocketsphinx
python train_voice_model.py
You'll find your recordings in a subfolder with the same name as what you specified. Go there.
cd ./neo-en
By the way, if you ever change your mind about what you want your model to be named, there's a fantastic program called pyrenamer that can make it easy to rename all the files you created. Install it with:
sudo apt-get install pyrenamer
Process Your Voice Recordings
Great! Done with that part. Now we're going to copy some other directories into the current working directory to work on them.
cp -a /usr/local/share/pocketsphinx/model/en-us/en-us .
cp -a /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict .
cp -a /usr/local/share/pocketsphinx/model/en-us/en-us.lm.bin .
Based on this source, it looks like we shouldn't be working with .dmp files. This is a point of deviation from the (outdated) CMU walkthrough: copy the .bin file instead. The difference is explained below, quoted from the tutorial.
Language model can be stored and loaded in three different formats - text ARPA format, binary format BIN and binary DMP format. ARPA format takes more space but it is possible to edit it. ARPA files have .lm extension. Binary format takes significantly less space and is faster to load. Binary files have .lm.bin extension. It is also possible to convert between formats. DMP format is obsolete and not recommended.
Now, while still in this directory, generate some acoustic feature files.
sphinx_fe -argfile en-us/feat.params -samprate 16000 -c neo-en.fileids -di . -do . -ei wav -eo mfc -mswav yes
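sphinx_fe (and bw, later on) read neo-en.fileids and neo-en.transcription, which train_voice_model.py writes for you. In case you ever need to rebuild them by hand, here's a minimal Python sketch, assuming the layout used in the CMU adaptation tutorial: one recording name per line (no .wav extension) in the fileids file, and one "<s> ... </s> (name)" line per recording in the transcription file. The neo-en_0001-style names and both helpers are hypothetical, and lower-casing the text is my assumption to match the lower-case words in cmudict-en-us.dict.

```python
def training_lists(recordings):
    """Build the text of the .fileids and .transcription files.

    recordings: list of (utterance_id, transcript) pairs; each id is the
    recording's file name without the .wav extension.
    """
    fileids = "".join("%s\n" % uid for uid, _ in recordings)
    transcription = "".join(
        "<s> %s </s> (%s)\n" % (text.lower(), uid) for uid, text in recordings
    )
    return fileids, transcription


def write_training_lists(recordings, name="neo-en"):
    """Write <name>.fileids and <name>.transcription in the current directory."""
    fileids, transcription = training_lists(recordings)
    with open(name + ".fileids", "w") as f:
        f.write(fileids)
    with open(name + ".transcription", "w") as f:
        f.write(transcription)


if __name__ == "__main__":
    fileids, transcription = training_lists(
        [("neo-en_0001", "play music"), ("neo-en_0002", "stop the music")]
    )
    print(fileids + transcription)
```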
Get the Full-Sized Language Model
Nice. You have a bunch more files with weird extensions on them. Now it's time to put them to use. You need the full version of the acoustic model, which was not shipped with your original installation for size reasons. I included it in the GitHub repository, or you can download it from here (you want the file named cmusphinx-en-us-ptm-5.2.tar.gz). Put the extracted files in your neo-en directory.
Assuming you use the one from the GitHub repo and you're still in the neo-en subdirectory:
tar -xvzf ../extra_files/cmusphinx-en-us-ptm-5.2.tar.gz -C .
There's a folder labeled en-us within the neo-en folder: it's the one you copied in before making the acoustic feature files. Rename it and keep it in case of horrible mistakes.
mv ./en-us ./en-us-original
Now rename the newly extracted directory in your neo-en folder to en-us.
mv ./cmusphinx-en-us-ptm-5.2 ./en-us
This converts the binary mdef file into a text file:
pocketsphinx_mdef_convert -text ./en-us/mdef ./en-us/mdef.txt
Grab Some Tools
Now you need some more tools to work with the data. These are from SphinxTrain, which you installed earlier. You should still be in your working directory, neo-en. Use ls to see what tools are available, then copy the ones you need.
ls /usr/local/libexec/sphinxtrain
cp /usr/local/libexec/sphinxtrain/bw .
cp /usr/local/libexec/sphinxtrain/map_adapt .
cp /usr/local/libexec/sphinxtrain/mk_s2sendump .
cp /usr/local/libexec/sphinxtrain/mllr_solve .
Run 'bw' Command to Collect Statistics on Your Voice
Now you're going to run a very long command that collects statistics about your voice. The backslash at the end of each line escapes the newline character that follows it; that's how this command stretches over multiple lines.
./bw \
    -hmmdir en-us \
    -moddeffn en-us/mdef.txt \
    -ts2cbfn .ptm. \
    -feat 1s_c_d_dd \
    -svspec 0-12/13-25/26-38 \
    -cmn current \
    -agc none \
    -dictfn cmudict-en-us.dict \
    -ctlfn neo-en.fileids \
    -lsnfn neo-en.transcription \
    -accumdir .
Future note, for using the continuous model instead of the PTM model (from the tutorial):
Make sure the arguments in the bw command match the parameters in the feat.params file inside the acoustic model folder. Please note that not all the parameters from feat.params are supported by bw, only a few of them. bw, for example, doesn't support upperf or other feature extraction params. You only need to use parameters which are accepted; other parameters from feat.params should be skipped.
For example, for continuous model you don't need to include the svspec option. Instead, you need to use just -ts2cbfn .cont. For semi-continuous models use -ts2cbfn .semi. If model has `feature_transform` file like en-us continuous model, you need to add -lda feature_transform argument to bw, otherwise it will not work properly.
More Commands
Now it's time to adapt the model. It looks like the continuous model will be better to use in the long run, but first we're just going to get this working. The tutorial suggests that using the MLLR and MAP adaptation methods together is best, but so far we're just running them one after the other. Here goes:
./mllr_solve \
    -meanfn en-us/means \
    -varfn en-us/variances \
    -outmllrfn mllr_matrix \
    -accumdir .
It appears this adapted model is now completed! Nice work. To use it, add -mllr mllr_matrix to your PocketSphinx command line. I'll put complete commands at the bottom of this note.
cp -a en-us en-us-adapt
To run the MAP adaptation:
./map_adapt \
    -moddeffn en-us/mdef.txt \
    -ts2cbfn .ptm. \
    -meanfn en-us/means \
    -varfn en-us/variances \
    -mixwfn en-us/mixture_weights \
    -tmatfn en-us/transition_matrices \
    -accumdir . \
    -mapmeanfn en-us-adapt/means \
    -mapvarfn en-us-adapt/variances \
    -mapmixwfn en-us-adapt/mixture_weights \
    -maptmatfn en-us-adapt/transition_matrices
[Optional; saves some space]
...I think. Apparently it's now important to recreate a sendump file from the newly updated mixture_weights file.
./mk_s2sendump \
    -pocketsphinx yes \
    -moddeffn en-us-adapt/mdef.txt \
    -mixwfn en-us-adapt/mixture_weights \
    -sendumpfn en-us-adapt/sendump
Testing the Model
It's also important to test the adaptation quality. This gives you a benchmark: a word error rate (WER). See here.
Create Test Data
Use another script I made to record test data. It's almost the same, but the fileids and transcription file formats are different. The folder with the test data should end up in the neo-en directory. Use the directory name I provide, test-data.
python ../create_test_records.py
Run the decoder on the test files. Go back into the neo-en folder first.
pocketsphinx_batch \
    -adcin yes \
    -cepdir ./test-data \
    -cepext .wav \
    -ctl ./test-data/test-data.fileids \
    -lm en-us.lm.bin \
    -dict cmudict-en-us.dict \
    -hmm en-us-adapt \
    -hyp ./test-data/test-data.hyp
Use this tool to actually test the accuracy of the model. It's the word_align.pl script from the pocketsphinx source folder, so you'll need that folder around; it compares the hypotheses the decoder just wrote against your reference transcription word by word. Look at the end of the output; it'll give you some percentages indicating accuracy.
../../pocketsphinx-5prealpha/test/word_align.pl \
    ./test-data/test-data.transcription \
    ./test-data/test-data.hyp
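word_align.pl does the scoring for you, but the number it reports is easy to reason about: the word error rate is the minimum number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. Here's a minimal Python sketch of that calculation. It assumes each line ends with an utterance id group in parentheses (as in the .transcription and .hyp files), that both files list utterances in the same order, and it lower-cases everything as my own choice; it's not a port of word_align.pl, which also prints the alignments.

```python
import re

UTT_ID = re.compile(r"\s*\([^()]*\)\s*$")  # trailing "(uttid ...)" group


def words(line):
    """Strip the trailing (uttid) group and sentence markers, then split."""
    line = UTT_ID.sub("", line)
    return [w.lower() for w in line.split() if w not in ("<s>", "</s>")]


def edit_distance(ref, hyp):
    """Minimum substitutions + deletions + insertions turning hyp into ref."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1]


def word_error_rate(ref_lines, hyp_lines):
    """Total errors over total reference words, for line lists in the same order."""
    errors = total = 0
    for ref_line, hyp_line in zip(ref_lines, hyp_lines):
        ref, hyp = words(ref_line), words(hyp_line)
        errors += edit_distance(ref, hyp)
        total += len(ref)
    return errors / max(total, 1)


if __name__ == "__main__":
    refs = ["play some music (test1)", "stop the music (test2)"]
    hyps = ["play music (test1 -1234)", "stop the music (test2 -987)"]
    print(word_error_rate(refs, hyps))  # 1 error / 6 reference words
```

Accuracy in the everyday sense is then roughly 1 minus the WER, although insertions can push the WER above 1.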
Live Testing
If you just want to try out your newly adapted model, record a file and try to transcribe it with these commands (assuming you're still in the neo-en working directory):
python ../record_test_voice.py
pocketsphinx_continuous -hmm ./en-us-adapt -infile ../test-file.wav
Or, if you'd rather use a microphone and record live, use this command:
pocketsphinx_continuous -hmm ./en-us-adapt -inmic yes
With 110 voice records, using 20 of them for testing, I achieved 60% accuracy. At 400 records and a marginal mic, I achieved 77% accuracy. There are about 1,000 records available.
Achieving Optimal Accuracy
You'll want to create your own language model if you're going to be using a specialized language. That's a pain, and you have to know what you're going to use it for ahead of time. If I do that, I'll collect the words from the tools where I specified them and automagically rebuild the language model. For now, I think I can get away with using the default LM.
For actual use of the model, everything you need is in en-us-adapt. That's the folder to pass to -hmm whenever a command needs your adapted acoustic model.
Use the following command to transcribe a file, if you've created your own lm and language dictionary:
pocketsphinx_continuous -hmm <your_new_model_folder> \
    -lm <your_lm> \
    -dict <your_dict> \
    -infile test.wav
Upon testing, it appears that training in a less controlled environment might be useful: the transcription was almost perfect when I was able to recreate the atmosphere of the original training recordings, and pretty bad otherwise.
Conclusions
I've made a bunch of scripts in the GitHub repo that automate some of this stuff, assuming standard installs. Look for check_accuracy.sh and create_model.sh. Everything should be run inside the neo-en folder, except the original train_voice_model.py script.
TODO next -
- set up the more accurate continuous model
- create a script that generates words in faux-sentences based on my use case scenario.
- find a phonetic dictionary that covers my needs
- figure out what my use case actually is
Tuesday, August 2, 2016
Setting Up an Offline Transcriber Using Kaldi - Part 3: Sphinx, not Kaldi
How to install PocketSphinx 5Prealpha on Mint 17.3.
We're going to install and work with these packages in a folder located at ~/tools. Make sure it exists.
mkdir ~/tools
Download pocketsphinx and sphinxbase from the downloads page:
- Look for the package called sphinxbase-5prealpha.tar.gz. https://sourceforge.net/projects/cmusphinx/files/sphinxbase/5prealpha/
- Look for the package called pocketsphinx-5prealpha.tar.gz. https://sourceforge.net/projects/cmusphinx/files/pocketsphinx/5prealpha/
Move the files from your downloads to your project folder and extract them.
tar -xzf ~/Downloads/sphinxbase-5prealpha.tar.gz -C ~/tools/
tar -xzf ~/Downloads/pocketsphinx-5prealpha.tar.gz -C ~/tools/
Make sure dependencies are installed. You're installing libpulse-dev so that sphinxbase will configure itself to work with PulseAudio, the recommended audio framework on Ubuntu (and, by extension, on Mint).
sudo apt-get install python-dev pulseaudio libpulse-dev gcc automake autoconf libtool bison swig
Note: make sure that swig is at least version 2.0. You can check with this command:
dpkg -s swig | grep Version
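If you'd rather have a script do the version check, comparing version numbers as tuples of integers avoids the trap of comparing them as strings (where "10.0" sorts before "2.0"). This sketch asks dpkg -s for the installed package's status; the regular expression only grabs the first dotted number, so Debian revision suffixes like -1ubuntu2 are ignored.

```python
import re
import subprocess


def parse_version(text: str) -> tuple[int, ...]:
    """Return the first dotted version number in text, e.g. 'Version: 2.0.11-1' -> (2, 0, 11)."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def swig_is_new_enough(minimum: tuple[int, ...] = (2, 0)) -> bool:
    """Check the installed swig package against a minimum version (needs dpkg)."""
    # check=True raises CalledProcessError if dpkg reports swig as not installed.
    status = subprocess.run(
        ["dpkg", "-s", "swig"], capture_output=True, text=True, check=True
    ).stdout
    version_line = next(
        (line for line in status.splitlines() if line.startswith("Version:")), ""
    )
    # Tuples compare element by element, so (2, 0, 11) >= (2, 0) is True.
    return parse_version(version_line) >= minimum
```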
Move into the sphinxbase folder.
cd ~/tools/sphinxbase-5prealpha
Since you downloaded the release version, the configure file has already been generated. It's time to configure, make and make install!
./configure
make
sudo make install
Sphinxbase is installed in /usr/local/lib. Mint 17 may not look there for shared libraries, so you have to tell it to use that location manually. Here are the commands:
export LD_LIBRARY_PATH=/usr/local/lib
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
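Those exports only last for the current terminal session. When launching the Sphinx tools from a Python script instead, one option is to hand the subprocess an environment with the same two paths. This sketch prepends them to any existing values rather than overwriting them the way the plain export does, and uses the /usr/local prefix from the install location above. PKG_CONFIG_PATH only matters if the subprocess is a build step such as pocketsphinx's ./configure.

```python
import os


def sphinx_env(prefix: str = "/usr/local") -> dict[str, str]:
    """Copy of os.environ with the prefix's library paths prepended."""
    env = dict(os.environ)
    additions = {
        "LD_LIBRARY_PATH": os.path.join(prefix, "lib"),               # shared libraries
        "PKG_CONFIG_PATH": os.path.join(prefix, "lib", "pkgconfig"),  # .pc files for ./configure
    }
    for name, path in additions.items():
        existing = env.get(name)
        # Prepend instead of overwrite, so any paths already set still work.
        env[name] = f"{path}{os.pathsep}{existing}" if existing else path
    return env


if __name__ == "__main__":
    print(sphinx_env()["LD_LIBRARY_PATH"])
```

Usage: subprocess.run(["pocketsphinx_continuous", "-inmic", "yes"], env=sphinx_env()).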
Now move into the pocketsphinx folder and do the same installation:
cd ~/tools/pocketsphinx-5prealpha
./configure
make
sudo make install
You can test the installation by running the following; it should recognize what you speak into the microphone.
pocketsphinx_continuous -inmic yes
If you want to transcribe a file, use this command:
pocketsphinx_continuous -infile file.wav
If you run into trouble, this should help.
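One thing worth ruling out when -infile gives garbage is an audio format mismatch: the default en-us acoustic model expects 16 kHz, 16-bit, mono audio. This small sketch uses Python's standard wave module to list anything about a file that doesn't match those assumptions before it gets handed to pocketsphinx_continuous. The rate parameter is there in case a model trained at a different rate is used.

```python
import wave


def check_wav(path: str, rate: int = 16000) -> list[str]:
    """List mismatches against 16-bit mono audio at `rate` Hz; empty means it looks fine."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()  # bytes per sample
        framerate = wav.getframerate()
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    if width != 2:
        problems.append(f"{8 * width}-bit samples, expected 16-bit")
    if framerate != rate:
        problems.append(f"{framerate} Hz, expected {rate} Hz")
    return problems


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        issues = check_wav(name)
        print(name, "OK" if not issues else "; ".join(issues))
```

If it complains and sox is installed, something like sox input.wav -r 16000 -c 1 -b 16 output.wav should convert the file.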
Sunday, July 31, 2016
Speech Recognition Final Verdict
Final strategy: I'm going to use CMU Sphinx with a small vocabulary trained to my voice for most commands. I'll use the kaldi-gstreamer-server, or maybe even an online service, for larger, arbitrary pieces of sound - stuff that I can't predict.
Which means that I'll have two separate, behemoth systems installed on the computer. Ouch. At least I can stream Kaldi from a different computer. Sphinx should be small enough to not be a problem.
Here's what I need to be able to train the command and control language model.
Final post here
I'm switching over to github pages . The continuation of this blog (with archives included) is at umhau.github.io . By the way, the ...