*nixing Around: package installation


Friday, May 12, 2017

Installing GNU APL

Two parts: the keyboard layout and the program itself. This is on Ubuntu GNOME 17.04.

Install GNU APL.

cd ~/Downloads
wget ftp://ftp.gnu.org/gnu/apl/apl-1.7.tar.gz
tar xzf apl-1.7.tar.gz
cd apl-1.7
./configure
make 
sudo make install
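If you want to repeat this for a later release, the version number is the only thing that changes. Here's a minimal sketch that wraps the steps above in shell functions. The function names are my own, and it assumes GNU keeps naming release tarballs apl-<version>.tar.gz under ftp.gnu.org/gnu/apl/.

```shell
# Hypothetical helpers wrapping the GNU APL build steps above.
# Assumes releases keep the apl-<version>.tar.gz naming on ftp.gnu.org.

apl_tarball_url() {
    printf 'ftp://ftp.gnu.org/gnu/apl/apl-%s.tar.gz\n' "$1"
}

# The body runs in a subshell ( ... ) so the cd calls don't move your shell.
install_apl() (
    ver=$1
    cd ~/Downloads || exit 1
    wget "$(apl_tarball_url "$ver")" || exit 1
    tar xzf "apl-$ver.tar.gz" || exit 1
    cd "apl-$ver" || exit 1
    ./configure && make && sudo make install
)
```

Usage: install_apl 1.7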

Set up the keyboard (for all the weird symbols). On Ubuntu GNOME 17.04, go to:

settings -> region and language -> [+] input source -> English (region) -> APL (dyalog)

Then set a good fixed-width font so the symbols show up correctly. Go here and download the recommended font, open and install it. Then open the GNOME Tweak Tool:

fonts -> monospace -> APL385 Unicode Regular -> Select

And you're done. Use

win + space

to switch between input sources (your normal layout and the APL layout).

Wednesday, November 9, 2016

Setting up OpenBSD (updated for 6.0)

A few notes on how I set up my OpenBSD installation. This will be an ongoing compilation.

Installation

I had to install from a USB-connected CD drive. I first followed the instructions for a flash drive, but the installer itself didn't want to play ball. I forget the exact scenario; it was confusing.

Wireless

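I'm on an old Acer Aspire One; I believe it's a D250 model. Its wireless card uses the athn(4) driver. Set it up by putting the following into your /etc/hostname.ath0 file (paste the whole thing into the shell and run it). The file name has to match the interface name that ifconfig reports, so use /etc/hostname.athn0 instead if that's what your card shows up as: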

echo "
nwid 'foo'
wpakey 'bar'
dhcp 
" > /etc/hostname.ath0

Replace foo with your network name and bar with your WPA key. :) After that's added, run:

sh /etc/netstart

...because it won't start automatically. Don't know why. I used this link to figure out how to get it running.
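A variant of that echo command that also keeps the file private (it contains your WPA key), as a sketch. The function name is mine, and it assumes the network name and key contain no single quotes.

```shell
# Hypothetical helper: write an OpenBSD hostname.if file for a WPA network.
# Usage: write_hostname_if FILE SSID KEY
# Assumes SSID and KEY contain no single quotes.
write_hostname_if() (
    umask 077                      # a newly created file starts out owner-only
    printf "nwid '%s'\nwpakey '%s'\ndhcp\n" "$2" "$3" > "$1" || exit 1
    chmod 600 "$1"                 # also tighten a file that already existed
)
```

Usage, as root: write_hostname_if /etc/hostname.ath0 foo bar && sh /etc/netstart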

Ethernet

If you have ethernet access, getting online is simpler. Find out what the ethernet device name is:

ifconfig

Mine is fxp0. Using DHCP makes things easy. All this command does is put dhcp in the device's config file.

echo dhcp > /etc/hostname.fxp0

Reboot, and you should be online. If you run into problems, see http://www.openbsd.org/faq/faq6.html#Setup and http://www.openbsd.org/faq/faq6.html#DHCP.

Package Installation

See http://www.openbsd.org/faq/faq15.html#Intro for an excellent explanation of how all this works.

Setting up the Package Mirror

Being able to install packages is always nice. On OpenBSD, you have to manually specify the mirror to search and download packages from. You can set this variable by hand every time you log in, or put it in your .profile. I used the MIT mirror; it's not going anywhere anytime soon. [edit: ok, it did go. They didn't keep the 5.9 mirror once 6.0 came out; here's the link to the 6.0 packages.]

vi ~/.profile

Now add (I stuck it in the middle of the file):

export PKG_PATH=http://mirrors.mit.edu/pub/OpenBSD/$(uname -r)/packages/$(uname -m)/

Except that in my case this didn't work. OpenBSD read that as

mirrors.mit.edu/pub/OpenBSD/OpenBSD/packages/i386/

which doesn't make sense. Instead, I had to do

export PKG_PATH=http://mirrors.mit.edu/pub/OpenBSD/6.0/packages/$(uname -m)/

~~which destroys flexibility when I upgrade to 6.0 (whenever that comes out).~~ Changed for compatibility with 6.0.
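One likely explanation for the OpenBSD/OpenBSD path: plain uname prints the operating system name, while uname -r prints the release (6.0 here), so the release slot probably got the output of the wrong command. Here's a sketch that builds the URL and refuses anything that doesn't look like a release number. The function name is mine; the mirror layout is taken from the URL above.

```shell
# Hypothetical helper: build a PKG_PATH URL for the MIT mirror.
# Usage: pkg_path_for RELEASE ARCH    e.g. pkg_path_for 6.0 i386
pkg_path_for() {
    case $1 in
        [0-9]*.[0-9]*) ;;   # looks like a release number such as 6.0
        *) echo "pkg_path_for: '$1' is not a release number" >&2; return 1 ;;
    esac
    printf 'http://mirrors.mit.edu/pub/OpenBSD/%s/packages/%s/\n' "$1" "$2"
}
```

In ~/.profile, something like PKG_PATH=$(pkg_path_for "$(uname -r)" "$(uname -m)") && export PKG_PATH would only export the variable when the check passes, so a bad release string shows up as an error instead of a broken URL.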

Installing a Package

Now I can do

pkg_add python-2.7.11

to install Python 2.7. To get that full package name, though, I have to CTRL-F through the mirror's webpage and see what's available. I'm pretty sure there's a way to search from the command line, but I haven't figured it out yet. If you only know the package name and not the exact version number, just use the name. The following successfully installs nano:

pkg_add nano
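On the searching problem: OpenBSD's pkg_info has a -Q option that queries the repository for packages whose names contain a string (check pkg_info(1) on your release). If you'd rather script the CTRL-F approach, here's a sketch that pulls matching package names out of a mirror's HTML directory listing. The function name is mine, and it assumes the listing links to files named <name>-<version>.tgz and that your grep supports -o.

```shell
# Hypothetical helper: list packages in a mirror listing whose names start with STEM.
# Reads the mirror's HTML directory listing on stdin.
# STEM goes into a regular expression, so avoid characters like + or .
find_pkgs() {
    grep -o "\"$1-[^\"]*\.tgz\"" | tr -d '"' | sed 's/\.tgz$//' | sort -u
}
```

Usage on OpenBSD: ftp -o - "$PKG_PATH" | find_pkgs python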

Turning the Computer Off

Restarting

Restarting is simple:

reboot

Shutting Down

You'd think this would be simple, eh? Linux works with a straightforward

shutdown now

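but on my computer's OpenBSD installation that eventually brings me right back to a shell: without -h, -p, or -r, OpenBSD's shutdown drops the system into single-user mode instead of turning it off. I have to use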

halt

Though, and I haven't tried this yet, something like

shutdown -h now

is also supposed to work. This thread has more details.

Tuesday, October 11, 2016

Building a Statistical Language Model

Update: I finished my script for creating custom language models. See here: https://github.com/umhau/vmc.

There's a summary at the end with what I figured out. Most of this is me thinking on paper.

The statistical language model tells CMU Sphinx which words exist and in what order they tend to appear (the grammar and syntax structure). The intro website to all this is here.

I'm trying to decide between the SRILM and the MITLM packages [subsequent edit: also the logios package and the quick_lm Perl script; these are referenced in hard-to-find places on the CMU website, see here and here, respectively] [another subsequent edit: looks like I found a link to the official CMU Statistical Language Model toolkit; it was buried in the QuickLM script]. SRILM is apparently easier to use, and the CMU site provides example commands for it. MITLM, however, seems more likely to stick around and stay accessible on GitHub for the long term. Plus, I forked it.

[sorry, blogger's formatting broke and I had to convert everything to plaintext and start over...lost the links.]

The only downside is that the main contributor to MITLM stopped working on it about six months ago and moved on to Kaldi. Guess he figured the newer tech was more worth his time. Still, dinosaurs have their place; just watch Space Cowboys to get the picture.

MITLM

To make sure the software doesn't disappear, I download the code from my own fork.

Update: Thanks to Qi Wang's comment below there's an extra dependency to install:

sudo apt-get install autoconf-archive

Installation of MITLM:

cd ~/tools
git clone https://github.com/umhau/mitlm.git
cd ./mitlm
./autogen.sh
./configure
make
sudo make install

~~So, turns out that there's some weird problems with the installation. Something changed, or something isn't being installed properly. The compilation seems to fail with these errors:~~

./configure: line 19641: AX_CXX_HEADER_TR1_UNORDERED_MAP: command not found
./configure: line 19642: syntax error near unexpected token `noext,'
./configure: line 19642: `AX_CXX_COMPILE_STDCXX_11(noext, optional)'

~~g++ wasn't installed, but even after that was added it still wouldn't work.~~

Update: Unfortunately, I've lost track of other dependencies involved - at some point, I'll make a list of all the stuff I've installed while working on this project. Had to install libtool (or similar?) to get here. Mental note:

libtoolize:   error: Failed to create 'build-aux'

But, that's because I'm trying to do this on a different Mint installation from my usual - on my default workstation, that dependency is installed (no idea what it is, except that it's probably listed somewhere on this blog).

After installing the extra dependency, the installation works! So this is a viable avenue thus far to get the LM working. I've already made it past where I need the MITLM, though, so I'm going to let it be for now. Might have to come back for it.
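Since I keep losing track of build dependencies, here's a small sketch that reports which tools are missing from PATH before running autogen.sh. The function name and the tool list in the usage line are my own guesses at what an autotools build like this needs; adjust as required.

```shell
# Hypothetical helper: print each named command that isn't on PATH.
missing_tools() {
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || printf '%s\n' "$t"
    done
}
```

Usage: missing_tools git autoconf automake libtoolize make g++ prints nothing when everything is present. autoconf-archive only provides macros, not a command, so check it separately with dpkg -s autoconf-archive.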

SRILM

Ok, let's see what SRILM has to offer us. It's more inconvenient to install; ya have to go through a license agreement to download it, so I can't just stick a bash command here.

...unless I put the code on my GitHub, in which case it's easy to get a copy. Too bad there are too many files to put up an extracted version, and too bad the compressed version is more than 25 MB. Time to split up the tar.gz file again; for my own records, here's how I split it. All I need for getting and using it is the reconstruction bit.

The splitting part, given the archive file:

split -b 24m -d srilm-1.7.1.tar.gz srilm-1.7.1.tar.gz.part-

Alright. Once the file is on github, it's just more copy-pasting.

cd ~/tools
git clone https://github.com/umhau/srilm.git
cd ./srilm
cat srilm-1.7.1.tar.gz.part-* | tar -xz
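Before pushing the parts, it's worth checking that they really do reassemble into the original archive. Here's a sketch using cmp. The function name is mine, and it assumes GNU split, whose -d option gives numeric suffixes so the part-* glob expands in the right order.

```shell
# Hypothetical helper: split FILE into SIZE-byte parts, rebuild it, and compare.
# Usage: split_and_verify FILE SIZE    e.g. split_and_verify srilm-1.7.1.tar.gz 24m
split_and_verify() {
    file=$1 size=$2
    split -b "$size" -d "$file" "$file.part-" || return 1
    cat "$file".part-* > "$file.rebuilt"
    if cmp -s "$file" "$file.rebuilt"; then
        rm -f "$file.rebuilt"
        echo ok
    else
        echo "mismatch: $file.rebuilt differs from $file" >&2
        return 1
    fi
}
```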

By the way, WOW. The installation process for this software is not straightforward. See the install file for the instructions on installation - read for background, then copy-paste below as usual.

gedit ./INSTALL

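Step 2: point the SRILM variable in the top-level Makefile at the root directory of the package. Source. (The double quotes let the shell expand $HOME, so the Makefile gets an absolute path instead of a literal ~.)

sed -i "7s#.*#SRILM = $HOME/tools/srilm#" ./Makefile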

For now, assuming that the variables are all good. I don't know if I want maximum entropy models, though it sounds useful...I'll see what happens if I don't prep them.

Installing John Ousterhout's TCL toolkit - we're past the required v7.3, and up to 8.6: hope this still works. I'm compiling from source rather than using the available binaries 'cause they come with some kind of non-commercial/education license, which I don't like being tied down by.

cd ~/tools
git clone https://github.com/umhau/tcl-tk.git
cd ./tcl-tk
gunzip < tcl8.6.6-src.tar.gz | tar xvf -
gunzip < tk8.6.6-src.tar.gz | tar xvf -

Install TCL:

cd tcl8.6.6/unix
# chmod +x configure
./configure --enable-threads
make -j 3
make test
sudo make -j 3 install

Let's try running the rest without the TK stuff...even though John says it's needed. Heh. Leeeroooy Jenkins!

cd ../../../srilm
make World

...aaaaaaaand, Fail.

This is going nowhere fast. We're in dependency hell. Let's try the perl script CMU uses (it's the backend to the online service they officially reference).

The Perl Script

Thankfully, Mint comes with perl installed. So, the question is how to use the script.

cd ~/tools
mkdir ./CMU_LMtool && cd ./CMU_LMtool
wget http://www.speech.cs.cmu.edu/tools/download/quick_lm.pl

The only thing left here is to figure out how to use the script...having never used perl, this could be interesting. Dug this nugget out of the script:

usage: quick_lm -s <sentence_file> [-w <word_file>] [-d discount]

So, the idea with the LM tool is to process sentences that the decoder should recognize. It doesn't need to be an exhaustive list, because the decoder can recombine fragments of those sentences during recognition. Here's the example corpus from the CMU website:

THIS IS AN EXAMPLE SENTENCE
EACH LINE IS SOMETHING THAT YOU'D WANT YOUR SYSTEM TO RECOGNIZE
ACRONYMS PRONOUNCED AS LETTERS ARE BEST ENTERED AS A T_L_A
NUMBERS AND ABBREVIATIONS OUGHT TO BE SPELLED OUT FOR EXAMPLE
TWO HUNDRED SIXTY THREE ET CETERA
YOU CAN UPLOAD A FEW THOUSAND SENTENCES
BUT THERE IS A LIMIT

We'll use this sentence collection to test the perl script:

cd ~/tools/CMU_LMtool
wget https://raw.githubusercontent.com/umhau/misc-LMtools/master/ex-corpus.txt
perl quick_lm.pl -s ex-corpus.txt

Well, it did exactly nothing. No terminal output, no new files created in the directory, and no errors. Time to search the script for other possible output locations. How weird can it be?

...

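Ok, solved the problem. Thank goodness for auto highlighting in Gedit. The header is wrapped in POD (Perl's embedded documentation format), but the block is closed with =END instead of =cut. Perl skips everything from a line that starts with = and a word up to the next =cut, so with no =cut the interpreter was most likely skipping the entire script, which explains the silence. The layout looked like this: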

=POD
/*
[some text wrapped by those comment markers]
*/
[more text, only wrapped by the '=' things]
=END

So, I re-commented all the introductory stuff, and put the fixed version in the github repo.
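To catch the same problem in other old scripts, here's a rough awk sketch that flags a POD block that is opened (a line starting with = and a word) but never closed with =cut. It's an approximation: Perl only treats such a line as POD where it expects a new statement. The function name is mine.

```shell
# Hypothetical helper: report a POD block that is never closed with =cut.
# Usage: unclosed_pod SCRIPT.pl    (prints nothing if every block is closed)
unclosed_pod() {
    awk '
        /^=cut/ { inpod = 0; next }
        /^=[A-Za-z]/ { if (!inpod) { inpod = 1; start = NR } }
        END { if (inpod) print "POD block opened at line " start " is never closed with =cut" }
    ' "$1"
}
```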

Summary of the Perl script

So, here's how it works: download the fixed script, give it a sentence list, and run the command. Simple. Looking at the output, what it does is pretty simple too: it makes a list of all the one-, two-, and three-word groupings (unigrams, bigrams, and trigrams) in the sentence list.
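The counting step can be sketched in a few lines of awk. This is not quick_lm.pl itself: as far as I can tell the script goes on to turn the counts into discounted probabilities in an ARPA-format file, and this only counts. The function name is mine, and it assumes one sentence per line with <s> and </s> marking the sentence boundaries.

```shell
# Hypothetical helper: count the n-grams in a sentence list.
# Usage: count_ngrams N < sentences.txt    (prints "count ngram" lines)
count_ngrams() {
    awk -v n="$1" '
        {
            # Pad the sentence with boundary markers.
            m = 0
            w[++m] = "<s>"
            for (i = 1; i <= NF; i++) w[++m] = $i
            w[++m] = "</s>"
            # Slide an n-word window across the padded sentence.
            for (i = 1; i + n - 1 <= m; i++) {
                g = w[i]
                for (j = 1; j < n; j++) g = g " " w[i + j]
                count[g]++
            }
        }
        END { for (g in count) print count[g], g }
    ' | LC_ALL=C sort -k 2
}
```

Usage: count_ngrams 3 < ex-corpus.txt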

Here's what to do:

mkdir -p ~/tools/CMU_LMtool && cd ~/tools/CMU_LMtool
wget https://raw.githubusercontent.com/umhau/misc-LMtools/master/ex-corpus.txt
wget https://raw.githubusercontent.com/umhau/misc-LMtools/master/quick_lm.pl
perl quick_lm.pl -s ex-corpus.txt

Still not sure what that does for me, but I have my LM!

Notes: I think the word list option in the command refers to the possibility of a limited vocabulary...not sure how that relates to words outside that list used in the sentence list. The discount, set with -d, defaults to 0.5. Apparently Greg and Ben did some experiments and found that's definitely the optimal setting.

Second Note: based on readings from the CMU website, this LM isn't good for much more than command-and-control - it can successfully detect short phrases accurately, but not long, drawn-out sentences. So it'll be good for most of what I want, but anything complex will need to be done with the CMULMTK package.

Hold on - the [-w <word_file>] option for a dictionary might be a request for output - not an extra input. And given that I do need an explicit dictionary for transcription, that's probably what it does. That would be wonderful. I can even use that sentence list for voice training - which would be a fabulous way to ensure accuracy.

Unfortunately, that's not the case. Oh, well.

The official CMU Statistical Language Model toolkit

Ok, maybe this'll do it for me. Here's the link to the source. The Perl script doesn't make all the different files I need - especially the pronunciation dictionary.

mkdir -p ~/tools/CMUSLM
cd ~/tools/CMUSLM
wget http://www.speech.cs.cmu.edu/SLM/CMU-Cam_Toolkit_v2.tar.gz
gunzip < CMU-Cam_Toolkit_v2.tar.gz | tar xvf -
cd ./CMU-Cam_Toolkit_v2

Wow, this is old. You have to uncomment a line in the Makefile if your computer isn't running HP-UX, IRIX, SunOS, or Solaris. Those are big-endian platforms, while x86 machines are little-endian, so I'm pretty sure anything built in this decade needs the line uncommented. If you're unsure, the README mentions a script you can run to check for yourself:

bash endian.sh
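The byte order can also be checked straight from the shell. A sketch using od: it writes the two bytes 01 00 and reads them back as one 16-bit number in the machine's native order, which gives 1 on a little-endian machine and 256 on a big-endian one. The function name is mine.

```shell
# Hypothetical helper: print "little" or "big" for this machine's byte order.
byte_order() {
    # od -t u2 reads the two bytes as one unsigned 16-bit integer in native order
    v=$(printf '\001\000' | od -An -tu2 | tr -d ' \n')
    case $v in
        1) echo little ;;
        256) echo big ;;
        *) echo unknown ;;
    esac
}
```

On a typical x86 laptop this should print little, meaning the Makefile line needs uncommenting.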

Ok, uncomment:

sed -i '37s/#//' ./src/Makefile
cd src
make install

Hard to tell if this was successful. I get the impression watching this compile that it was written in the 80s, and updated for compatibility with something advertising a max capacity of 512 MB of random access memory.

Time to dive into the html documentation, and figure out usage. The goal is to create the LM and DIC files - and a nice perk would be the other stuff produced by the online LM generator.

Turns out, there doesn't seem to be any kind of pronunciation dictionary produced by this tool. So it's no good.

The Logios Package

This seems to be the tool CMU claims was actually used in their website - and, indeed, some of their tools within the package are designed for use in a webform. So I might be on the right track. The only problem is, the input is not a list of sentences: it's a grammar file built by the Phoenix tool. No idea what that is or how it works.

CMU, get your act together! The website is nice, but I've got no recourse if it goes down. I want an independent system!

Here goes. Goal: LM and DIC files. Starting point: list of sentences.

Download the package. Even this isn't user-friendly: it's only published as a web directory listing, so I used wget recursively to download the pages. See here for source on the command.

CMUDict

Actually, it seems like I could just use the dictionary directly. The whole problem is how to get the entries I want from this file into a smaller subset file, so I'll write a small script to do that. What a pain.

wget http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/sphinxdict/cmudict_SPHINX_40
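The subset idea itself is small enough to sketch: collect every word in the sentence list, print the matching dictionary lines (including alternate pronunciations written as WORD(2)), and report words the dictionary doesn't have. The function name is mine, and it assumes one entry per line with an uppercase word in the first column, as in cmudict_SPHINX_40.

```shell
# Hypothetical helper: print the dictionary entries needed for a sentence list.
# Usage: dict_subset SENTENCES DICT > my.dic    (missing words go to stderr)
# Assumes one entry per line, word first, alternates written as WORD(2), WORD(3), ...
dict_subset() {
    awk '
        # First file: remember every word in the sentence list, uppercased.
        NR == FNR { for (i = 1; i <= NF; i++) want[toupper($i)] = 1; next }
        # Second file: keep entries whose base word, minus any (n) suffix, is wanted.
        {
            base = $1
            sub(/\([0-9]+\)$/, "", base)
            if (base in want) { print; found[base] = 1 }
        }
        END {
            for (w in want)
                if (!(w in found)) print "missing: " w | "cat 1>&2"
        }
    ' "$1" "$2"
}
```

Usage: dict_subset ex-corpus.txt cmudict_SPHINX_40 > my.dic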

I'll post the script soon - it's being added to a larger package that should make the process of getting a personal language model pretty painless. That'd be nice.

Tuesday, August 2, 2016

Setting Up an Offline Transcriber Using Kaldi - Part 3: Sphinx, not Kaldi

How to install PocketSphinx 5prealpha on Mint 17.3.

We're going to work with these packages in a folder at ~/tools. Make sure it exists:

mkdir -p ~/tools

Download pocketsphinx and sphinxbase from the downloads page:

Look for the package called sphinxbase-5prealpha.tar.gz. https://sourceforge.net/projects/cmusphinx/files/sphinxbase/5prealpha/
Look for the package called pocketsphinx-5prealpha.tar.gz. https://sourceforge.net/projects/cmusphinx/files/pocketsphinx/5prealpha/

Move the files from your downloads to your project folder and extract them.

tar -xzf ~/Downloads/sphinxbase-5prealpha.tar.gz -C ~/tools/
tar -xzf ~/Downloads/pocketsphinx-5prealpha.tar.gz -C ~/tools/

Make sure dependencies are installed. You're installing libpulse-dev so that sphinxbase will configure itself to work with PulseAudio, the recommended audio framework on Ubuntu (and, by extension, on Mint).

sudo apt-get install python-dev pulseaudio libpulse-dev gcc automake autoconf libtool bison swig

Note: make sure that swig is at least version 2.0. You can check the installed version with this command:

swig -version
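To turn that into a yes/no check, compare version strings with sort -V rather than as plain text (where 10.0 would sort before 2.0). A sketch; the function name is mine, and sort -V is a GNU coreutils option, which Mint has.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
version_at_least() {
    # sort -V orders version numbers field by field; if the required version
    # sorts first (or ties), the installed one is new enough.
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

Usage, assuming swig -version prints a line like "SWIG Version 3.0.8": version_at_least "$(swig -version | awk '/^SWIG Version/ {print $3}')" 2.0 && echo "swig is new enough"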

Move into the sphinxbase folder.

cd ~/tools/sphinxbase-5prealpha

Since you downloaded the release version, the configure file has already been generated. It's time to configure, make and make install!

./configure
make
sudo make install

Sphinxbase is installed in /usr/local/lib; in case Mint 17 doesn't look there for program libraries, you have to tell it to use that location manually. Here are the commands:

export LD_LIBRARY_PATH=/usr/local/lib
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
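Those exports only last for the current terminal session, and they replace any existing value. If you put them in ~/.bashrc instead, a helper that prepends a directory only when it isn't already listed keeps the variables from growing every time the file is sourced. A sketch; the function name is mine.

```shell
# Hypothetical helper: prepend DIR to the colon-separated variable VAR,
# unless DIR is already in it.  Usage: prepend_path VAR DIR
prepend_path() {
    eval "cur=\${$1:-}"
    case ":$cur:" in
        *":$2:"*) ;;                                # already present: leave it
        *) cur=${cur:+$2:$cur}; cur=${cur:-$2} ;;   # prepend, or start fresh
    esac
    eval "export $1=\"\$cur\""
}
```

Usage in ~/.bashrc: prepend_path LD_LIBRARY_PATH /usr/local/lib and prepend_path PKG_CONFIG_PATH /usr/local/lib/pkgconfig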

Now move into the pocketsphinx folder and do the same installation:

cd ~/tools/pocketsphinx-5prealpha
./configure
make
sudo make install

You can test the installation by running the following; it should recognize what you speak into the microphone.

pocketsphinx_continuous -inmic yes

If you want to transcribe a file, use this command:

pocketsphinx_continuous -infile file.wav
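pocketsphinx_continuous is picky about input files: as far as I know, the default US English model expects 16 kHz, 16-bit, mono audio. Here's a rough sketch that reads those fields from a WAV header. The function name is mine, and it assumes a canonical 44-byte header (the fmt chunk right after RIFF/WAVE) and a little-endian machine, since od reads the fields in native byte order.

```shell
# Hypothetical helper: print "channels sample_rate bits_per_sample" for a WAV file.
# Assumes a canonical header: channels at byte 22, sample rate at 24, bit depth at 34.
wav_format() {
    ch=$(od -An -tu2 -j22 -N2 "$1" | tr -d ' \n')
    rate=$(od -An -tu4 -j24 -N4 "$1" | tr -d ' \n')
    bits=$(od -An -tu2 -j34 -N2 "$1" | tr -d ' \n')
    echo "$ch $rate $bits"
}
```

Usage: if wav_format file.wav prints anything other than "1 16000 16", convert the file before transcribing it.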

If you run into trouble, this should help.
