*nixing Around: Linux

Monday, June 5, 2017

Fast User Switching

This is for Ubuntu GNOME 17.04. Maybe a lot of other versions of Linux as well.

To get the main login menu: CTRL + ALT + F1
To switch to the first logged in user: CTRL + ALT + F2
To switch to the second logged in user: CTRL + ALT + F3
Etc.

Sound Device Chooser

Personal memo, because I need this so much. On Ubuntu GNOME 17.04, the OS fails to switch between sound devices when I plug in USB headphones. I always have to go into the sound menu in the settings and manually switch. At least this way, it's fewer clicks.

https://extensions.gnome.org/extension/906/sound-output-device-chooser/

Friday, May 12, 2017

Installing GNU APL

Two parts: the keyboard layout and the program itself. This is on Ubuntu GNOME 17.04.

Install GNU APL.

cd ~/Downloads
wget ftp://ftp.gnu.org/gnu/apl/apl-1.7.tar.gz
tar xzf apl-1.7.tar.gz
cd apl-1.7
./configure
make 
sudo make install

Set up the keyboard (for all the weird symbols). On Ubuntu GNOME 17.04, go to:

settings -> region and language -> [+] input source -> English (region) -> APL (dyalog)

Then set a good fixed-width font so the symbols show up correctly. Go here and download the recommended font, open and install it. Then open the GNOME Tweak Tool:

fonts -> monospace -> APL385 Unicode Regular -> Select

And you're done. Use

win + space

to switch between the keyboard layouts (input sources).

Saturday, May 6, 2017

Detaching modal dialog boxes from windows

This had been bugging me for a long time. Gnome Shell Ubuntu decided that it would be nice if users couldn't get to applications while dialog boxes (like a save menu) were open. Here's a fix to undo that decision. src. This is functional on 17.04.

Detach dialog

dconf write /org/gnome/shell/overrides/attach-modal-dialogs false

Attach dialog

dconf write /org/gnome/shell/overrides/attach-modal-dialogs true

Wednesday, April 12, 2017

Set External Monitor as Default in Debian Console

I have a copy of debian running on a busted ThinkPad without an internal monitor. It would be nice if the command line didn't revert to a 640x480 resolution on the external. Solution: completely disable the internal monitor, so linux auto-sets the monitor resolution according to the specs of the external monitor. src.

Find the names of your monitors. My internal card is an Intel, so I can look in /sys for the EDID files; the directory that holds each one is named after the display, which is what we want. src.

find /sys -name edid

Based on the output of that command, the name of my internal display is

LVDS-1

With that information, I'm going into GRUB and disabling the display. Note that the internal display will not work at all after this, unless you change the setting back.

sudo nano /etc/default/grub

edit the line from

GRUB_CMDLINE_LINUX_DEFAULT="quiet"

(or whatever it was to begin with) to

GRUB_CMDLINE_LINUX_DEFAULT="quiet video=LVDS-1:d"

Keep whatever settings were already there. Update GRUB, and reboot the computer.

sudo update-grub
reboot
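For my own future reference, here's a rough sketch of doing the same edit from a script. The two function names are made up for illustration, the sed expression assumes the GRUB line is wrapped in double quotes as shown above, and it needs GNU sed for the -i flag. Try it on a copy of the file first.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of GRUB or any package.

# Turn a path like ".../card0-LVDS-1/edid" into the display name "LVDS-1".
connector_from_edid_path() {
    local dir
    dir=$(basename "$(dirname "$1")")   # e.g. card0-LVDS-1
    echo "${dir#card*-}"                # strip the leading "cardN-"
}

# Append PARAM to GRUB_CMDLINE_LINUX_DEFAULT in FILE, unless it is already there.
add_kernel_param() {
    local file=$1 param=$2
    if grep "^GRUB_CMDLINE_LINUX_DEFAULT=" "$file" | grep -qF -- "$param"; then
        return 0                        # nothing to do
    fi
    # Keep whatever was inside the quotes and add the new parameter at the end.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file"
}

# Example (run against a copy, then compare with the original):
#   cp /etc/default/grub /tmp/grub.test
#   add_kernel_param /tmp/grub.test "video=LVDS-1:d"
```

After checking the result, the real file still has to be edited with sudo and followed by update-grub, same as above.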

Saturday, February 18, 2017

tmux cheat sheet

A few commands that are useful to know. src.

managing sessions

tmux new -s foobar          | creates a new tmux session with given name foobar
tmux attach -t foobar       | attaches to an existing tmux session named foobar
tmux list-sessions          | list all available tmux sessions
tmux switch -t foobar       | switches to a session named foobar
tmux detach (ctrl + b, + d) | detach from the current session

managing windows

tmux new-window (ctrl + b, + c)     | create a new tmux window
tmux select-window -t :0-9 (ctrl + b, + 0-9) | choose an existing tmux window
tmux rename-window (ctrl + b, + ,)  | rename an existing tmux window

Wednesday, February 8, 2017

download changes from git

So this should work when I've got multiple copies of the repo on different computers. Run these commands (the first two are optional) to refresh the local repo to the latest version.

git reset --hard HEAD
git clean -f

Those two will remove any local changes. The last one actually gets the new version.

git pull
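A small wrapper, sketched under the assumption that the copy was made with git clone (so the branch already tracks a remote). The function name refresh_repo is mine. It stops at the first command that fails, so a broken reset never leads to a pull on top of a mess.

```shell
#!/usr/bin/env bash
# Sketch only: refresh_repo is a made-up helper name.

# Discard local changes in DIR (default: current directory) and pull.
refresh_repo() {
    local dir=${1:-.}
    # Throw away edits to tracked files.
    git -C "$dir" reset --hard HEAD || return
    # Delete untracked files (untracked directories would also need -d).
    git -C "$dir" clean -f || return
    # Fetch and merge the latest version from the remote.
    git -C "$dir" pull
}

# Example:
#   refresh_repo ~/projects/mpiT
```

Same warning as the bare commands: anything uncommitted in that directory is gone afterwards.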

Thursday, February 2, 2017

git cheat sheet

Still learning the system.

git init                |  Tell git to start watching the directory
git clone <repo>        |  Get a local copy of the repository

.gitignore              |  Contains patterns of files to ignore
git rm --cached <file>  |  Stop tracking the file in git

git add <file>          |  Stage a snapshot of the file
git rm <file>           |  Stage the file's removal
git mv <a> <b>          |  Rename 'a' to 'b' and stage the change

git status              |  Staging status of files
git status -s           |  Simpler version of status

git commit              |  Record staged changes locally (git push uploads)
git commit -m "txt"     |  Commit, with inline message

basic github - working with a cloned repository

How to use github properly. Just the basics: cloning a repo, making a change, and uploading the change back to github.

sources

https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository
https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes

how-to: summary

This is the super simple version with no commentary, and just a bit of explanatory text. Remember, the warnings come after the spells.

Getting set up

Install git

sudo apt-get install git

configure login email address

Set your github name.

git config --global user.name "YOUR NAME"

set your email address to private in github. Go there:

profile --> settings --> emails --> [] keep my email address private

Set the global github private email address.

git config --global user.email "username@users.noreply.github.com"

Double-check the email address.

git config --global user.email

connect to github with https

Hold onto the github password for a while.

git config --global credential.helper cache

Extend the password timeout period.

git config --global credential.helper 'cache --timeout=1800'

making changes to a current project

First, create a local clone of your fork: I'm using the MPI Torch project as an example.

mkdir ~/projects && cd ~/projects
git clone https://github.com/umhau/mpiT.git
cd mpiT

Create or modify a file in your local repository, then stage it.

echo "Extra! Extra! Read all about it!" >> README.md
touch laughing.txt && echo "hahahaha" > laughing.txt

git add README.md
git add laughing.txt

Monitor the staging process.

git status -s

Commit your staged files prior to uploading.

git commit -m "write a commit message here"

Check that 'origin' is the correct short name of the github server.

git remote -v

Push your commit to the github server

git push origin master

how-to: annotated version

Getting set up

Install git

sudo apt-get install git

configure login email address

Tell git your name - who to credit your work to.

git config --global user.name "YOUR NAME"

configure your email address - you can keep your real one private by combining your github username with a special github email domain. First, change github settings to keep the email address private. Go to:

profile --> settings --> emails --> [] keep my email address private

Now configure your email address. This tells git and github to use the private email address for all repositories downloaded to the computer.

git config --global user.email "username@users.noreply.github.com"

You can confirm the email address with this command:

git config --global user.email

connect to github

We're going to use HTTPS to download, and we're not going to deal with 2-factor authentication, and we are going to use git's credential cache to avoid repeatedly entering passwords.

This tells git to hold onto the github password for a while.

git config --global credential.helper cache

And this controls how long that 'while' is - default is 15 minutes, but I feel like 30 (it's counted in seconds).

git config --global credential.helper 'cache --timeout=1800'
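Since the timeout is counted in seconds and I think in minutes, here's a tiny sketch that does the arithmetic. The function name cache_helper is made up; it only builds the string, so nothing in your git config changes until you pass the result to git config yourself.

```shell
#!/usr/bin/env bash
# Sketch only: cache_helper is a hypothetical helper name.

# Print the credential.helper value for a timeout given in minutes.
cache_helper() {
    local minutes=$1
    echo "cache --timeout=$((minutes * 60))"
}

# Example: 30 minutes becomes 1800 seconds.
#   git config --global credential.helper "$(cache_helper 30)"
```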

making changes to a current project

We'll assume, for the moment, that you have an ongoing project - this is going to be either a copy ("fork") of someone else's project, or something you've already started uploading to github and want to keep working on The Right Way. Since this is your ongoing project, you own it and have full permissions to mess with it. You're also not going to be merging it with some other repository.

The idea behind the many steps involved with making changes (staging, the head, etc.), is to allow for both minor and major changes - a simple tweak by one guy vs. a complete overhaul vs. a new feature. The really big changes can be made in forks of the project that get merged with the original (upstream) version, the sizable changes can be made in branches of the current fork (or original repository), and the small changes can be verified by the rest of the team prior to being added directly to a branch (master/primary or otherwise) of the current repository.

It's a great system, but a real pity there's no simple way to get started.

First, create a local clone of your fork: I'm using the MPI Torch project as an example. Make sure you have a place to put it.

mkdir ~/projects && cd ~/projects

The .git link you need to download with is on github -- look over on the right where the opened menu is:

The link is over on the left. After you've got it, run the command below (or your equivalent, on a project you can make changes to).

git clone https://github.com/umhau/mpiT.git
cd mpiT

Now you've "created a local clone" of your repository/fork. Nice! You've got the code on your computer. I'm pretty sure that everything else you do with github related to that 'local clone' has to be done while you're 'within the directory' -cd'd inside mpiT (in this case).

As you change files, git will be watching - anything you change will be marked modified, anything you don't change will be marked unmodified. You can stage a file whenever you want to record a snapshot of your current progress (and you can undo snapshots, too, but that's not needed here). Staged files are added to your next commit.

A commit is a snapshot of everything you've staged, recorded in your local repository - until the commit is pushed, it's all local to your machine.

Let's pretend you go ahead and make some changes to the code you downloaded. Maybe you added something to the README.md. Remember, you're still in the ~/projects/mpiT/ directory:

echo "This is super important stuff!" >> README.md

If we check which tracked files have been changed, git will tell us that README.md has been modified, and has not yet been 'staged' for the next commit.

git status

if you're ready to stage the file, run

git add README.md

you can also use the git add command on a directory, and it will recursively stage everything in the directory for the next commit.

If you want to add a new file to the repository

echo "hahahaha" > laughing.txt

you have to tell git to track it -

git add laughing.txt

and it will be automatically staged (what else was git supposed to do with it?).

Also note that if you stage a file, and then edit it again, the staged version will remain whatever version you had when you ran git add. If you want to stage the new version, you have to run git add again.
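To convince myself of that last point, here's a throwaway experiment. It's a sketch that works in a fresh temporary repository (so it can't touch a real project); the function name and the file name notes.txt are made up.

```shell
#!/usr/bin/env bash
# Sketch only: runs in a subshell and a temp directory, so nothing leaks out.

stage_snapshot_demo() (
    set -e
    demo=$(mktemp -d)
    cd "$demo"
    git init -q .
    echo "first draft" > notes.txt
    git add notes.txt                  # snapshot taken here
    echo "second thoughts" >> notes.txt
    git status -s                      # A = staged, M = edited again since staging
    echo "staged copy: $(git show :notes.txt)"
)

# Example:
#   stage_snapshot_demo
```

The short status shows the file as both staged and modified, and the staged copy still holds only the first line until git add is run again.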

Use git status to monitor the version of each file being staged. Use

git status -s

to get a less 'verbose' version of the status output. When everything has been edited and modified and staged properly, use

git commit -m "write a commit message here"

to record your changes in a local commit (nothing is uploaded until you push). Keep the quotation marks when you write your message, which is intended to tell others what changes you made. If you don't include the -m "commit message" bit, then you'll be prompted for a longform message in a command line text editor.

When you clone a repository, git automatically names the server you cloned from "origin". That way, when you need to do something related to the online github version of your code, you can reference it with "origin".

Here, you can check the name of the remote repository you're working with. A "remote repository" is what you call the online (i.e., not-on-your-laptop) version of the code. You can add more, but that's not needed here.

git remote

You should see the name used to refer to the repository -

origin

just as discussed above. And if you want to see exactly what server 'origin' refers to, you can run

git remote -v

to see a list - in my case, probably something like this:

origin https://github.com/umhau/mpiT.git (fetch)
origin https://github.com/umhau/mpiT (push)

Back on track: let's upload the commit you assembled (the commit: your collection of staged files). It's called pushing, and you do it like this:

git push origin master

Now the changes you made on your computer have been uploaded to github. That's it!

Friday, January 27, 2017

Command line system resource monitor

Shows cpu usage, memory, swap.

sudo apt-get install htop
htop

Show CPU info via command line

This gives a ton of information - way more than I generally ever need.

less /proc/cpuinfo

This is the tidy version.

lscpu

This is the max and min clock speed of the CPU:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq

This is a cool command to keep track of the current CPU clock speed.

sudo watch -n 1  cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
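Those sysfs files report kHz, which I always misread. A small sketch that prints the maximum for each core in MHz; the function name is made up, and the loop quietly does nothing on machines that don't expose cpufreq.

```shell
#!/usr/bin/env bash
# Sketch only: khz_to_mhz is a hypothetical helper name.

# Integer conversion from kHz (what sysfs reports) to MHz.
khz_to_mhz() {
    echo $(( $1 / 1000 ))
}

for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_max_freq; do
    [ -r "$f" ] || continue            # no cpufreq here, or the glob matched nothing
    cpu=${f#/sys/devices/system/cpu/}  # drop the leading path
    cpu=${cpu%%/*}                     # keep just "cpu0", "cpu1", ...
    echo "$cpu: max $(khz_to_mhz "$(cat "$f")") MHz"
done
```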

Thursday, January 26, 2017

Ubuntu Server 16.04 not detecting wifi card

This turns out to be a relatively simple issue.

This command is my go-to for internet connection diagnostics, but it wasn't showing my wifi card.

ifconfig

This shows all interfaces, not just the activated ones.

ifconfig -a

To activate a hardware device after determining its name, run:

sudo ifconfig "device name" up

Bonus: this gives you way more than you'll ever need to know about the hardware capabilities of the device.

iw list

(Internet via Ethernet) + (SSH via wireless) = (sustained SSH during MPICH computing)

I've been losing SSH connection after starting process jobs on my new beowulf cluster. This is my current fix, since my theory is that the network switch is so clogged with MPI-related communication (which does take place via ssh) that there's no bandwidth left for my administrative SSH connection. Theory supported by observation that when I plug my unrelated control machine into the switch it can't ping google.

assumptions

Ubuntu 16.04.1 LTS
working wireless and ethernet: I had to do this and this.

sources

https://web.archive.org/web/20140307210607/http://www.themagpi.com/issue/issue-11/article/turn-your-raspberry-pi-into-a-wireless-access-point
http://askubuntu.com/a/180734
https://seravo.fi/2014/create-wireless-access-point-hostapd

check wifi hardware capability

Run the command

iw list

And look for a section like the following. If it includes 'AP', you're golden. If not, look for a different wireless card.

Supported interface modes:  
         * IBSS 
         * managed  
         * AP 
         * AP/VLAN  
         * monitor

install dependencies

sudo apt-get install rfkill hostapd hostap-utils iw dnsmasq

identify interface names

As of ubuntu 16.04, the standard wlan0 and eth0 interface names are no longer in use. You'll have to identify them specifically. Use the following command, which lists the contents of the folder for each interface device, and look for the device that has a folder named 'wireless'. src.

ls /sys/class/net/*
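Rather than eyeballing that listing, the same test can be scripted. This is a sketch that relies only on the 'wireless' folder mentioned above; the function name is made up, and the optional argument exists just so it can be pointed at a test directory.

```shell
#!/usr/bin/env bash
# Sketch only: list_interfaces is a hypothetical helper name.

# Print each interface under ROOT and whether it has a "wireless" folder.
list_interfaces() {
    local root=${1:-/sys/class/net} dev name
    for dev in "$root"/*; do
        [ -e "$dev" ] || continue
        name=$(basename "$dev")
        if [ -d "$dev/wireless" ]; then
            echo "$name wireless"
        else
            echo "$name other"
        fi
    done
}

# Example:
#   list_interfaces
```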

In the configuration below, my ethernet interface is enp2s0 and my wireless interface is wlp1s0.

configure wifi settings

There's three files you'll have to configure. Since I'm logged in via ssh, I don't want to interrupt my connection until I've created a new access point I can connect to. So I'll walk through editing each file in turn, then I'll have one command at the end that activates all the changes.

configure wireless interface: /etc/network/interfaces

Back up your current interface file.

sudo cp /etc/network/interfaces /etc/network/interfaces.bak

and then edit the original

sudo nano /etc/network/interfaces

replace the contents of the file - change the interface names as appropriate.

auto lo
iface lo inet loopback

auto enp2s0
iface enp2s0 inet dhcp

auto wlp1s0
iface wlp1s0 inet static
hostapd /etc/hostapd/hostapd.conf
address 192.168.3.14
netmask 255.255.255.0

Normally I'd say that here's where you restart the interface, but we're saving that for the end.

configure the access point: /etc/hostapd/hostapd.conf

backup the original file - it's ok if there's nothing there.

sudo cp /etc/hostapd/hostapd.conf /etc/hostapd/hostapd.conf.bak

edit the original

sudo nano /etc/hostapd/hostapd.conf

put this in:

interface=wlp1s0
driver=nl80211
ssid=test
hw_mode=g
channel=1
macaddr_acl=0
auth_algs=1
ignore_broadcast_ssid=0
wpa=3
wpa_passphrase=1234567890
wpa_key_mgmt=WPA-PSK
wpa_pairwise=TKIP
rsn_pairwise=CCMP

Inexplicably, this only seems to produce a detectable wifi access point when the ssid is 'test'. I tried several other non-keyword names, and none of them worked. Go back to 'test', and it worked. Did it several times...magic.

Save and exit.

configure the DHCP server

this is where the access point actually becomes something you can access. backup:

sudo cp /etc/dnsmasq.conf /etc/dnsmasq.conf.bak

edit original - since the file is so big, I rm'd the original and pasted the contents below into an empty file.

sudo rm /etc/dnsmasq.conf
sudo nano /etc/dnsmasq.conf

make it look like this:

# Never forward plain names (without a dot or domain part)
domain-needed

# Only listen for DHCP on wlp1s0
interface=wlp1s0

# create a domain if you want, comment it out otherwise
# domain=Pi-Point.co.uk

# Create a dhcp range on your /24 wlp1s0 network with 12 hour lease time
dhcp-range=192.168.3.15,192.168.3.254,255.255.255.0,12h

Save and exit.
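One thing worth double-checking before restarting anything: the DHCP range has to sit on the same /24 as the static address given to the wireless interface (192.168.3.14 above). A quick sanity check, sketched with a made-up function name and assuming a plain 255.255.255.0 netmask:

```shell
#!/usr/bin/env bash
# Sketch only: same_subnet_24 is a hypothetical helper, valid for /24 networks.

# Succeeds when both addresses share their first three octets.
same_subnet_24() {
    [ "${1%.*}" = "${2%.*}" ]
}

ap_address=192.168.3.14      # from /etc/network/interfaces
range_start=192.168.3.15     # from /etc/dnsmasq.conf
range_end=192.168.3.254

if same_subnet_24 "$ap_address" "$range_start" &&
   same_subnet_24 "$ap_address" "$range_end"; then
    echo "DHCP range is on the access point's subnet"
else
    echo "DHCP range does not match the access point's subnet" >&2
fi
```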

implement changes

this is going to be one big command. if it works, you're in business...if it doesn't, you'll have to login directly to the machine for troubleshooting.

sudo ifdown wlp1s0; sudo ifup wlp1s0; sudo service hostapd restart; sudo service dnsmasq restart

Worked for me: I now have a secondary wireless access to my beowulf cluster for when the ethernet gets clogged with MPI signals.

Wednesday, January 25, 2017

Basic Vim

The cheat sheets and guides out there don't seem to provide a practical intro to Vim. I'm not able to use MS Code on one of my primary interfaces, so I'm looking for the next best thing. Vim, so I've heard, is probably it. This is a great little tutorial to introduce the basics.

There's two modes: command mode, and insert mode. Command mode is where you do things that would normally be accessed via cursor, arrow keys or a menu, and insert mode is where you type letters and they appear on the screen and you can use the arrow keys like you're used to. When you open vim, you start in command mode.

This should get you to about a nano level of proficiency.

Basic Usage

open foo  | vim foo

save file | :w

quit file | :q

Command mode | [ESC]

                                  |   k
Move cursor left, right, up, down | h   l
                                  |   j

Insert here | i

Insert new line below | o

Delete char under cursor | x

Here's a nice cheat sheet for further use.

Thursday, January 19, 2017

When Ubuntu Server doesn't let you keep the internet device drivers

So I ran into a problem while setting up Ubuntu Server 16.04.1 LTS for my cluster: one of my computers had restricted hardware drivers for the wifi and ethernet devices, and while the drivers were available on the install media, they weren't transferred over. Something about non-free stuff. So after installation, I was stuck with a computer that had no access to the web.

After a ton of searching, I a) found what the hardware problem was, and b) realized that I can use the install media as the installation source - that might sound obvious, but the keyword there is "can", not "should be able to". It's badly documented and very not-obvious.

Anyway, here's the source for using the install media. Part of the install process is choosing which sets of packages you want to include - things like OpenSSH Server stuff or Mail Server stuff. There's also one called Manual Package Selection. It didn't seem to do anything during the install (though I did walk away, and it might have timed out), but you can reenter the tool after the installation is finished and you've rebooted into the new OS.

After rebooting, log in and plug the ubuntu server install media into the computer. Run:

sudo tasksel install manual

This command will find the install media, and 'install' the manual package selection 'package'. In other cases, it would actually install stuff - in this case, it just gives you a shell prompt. At this prompt, you somehow have full internet access - I don't know how it happened. It was automagical. In my case, the next step was editing the sources list to enable the universe repository (already seemed enabled - automagic of the tasksel command?). Source on the actual internet driver fix.

sudo nano /etc/apt/sources.list

And a line that looks like this

deb http://us.archive.ubuntu.com/ubuntu/ xenial main restricted

Should be changed to look like this:

deb http://us.archive.ubuntu.com/ubuntu/ xenial main restricted universe

[ctrl]-x to leave. Then, run

sudo apt-get install r8168-dkms

And the driver is installed. Reboot and the internet should work.
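If I ever have to do this on several machines, the sources.list edit can be scripted. A sketch, assuming GNU sed and lines that end in exactly "main restricted" like the one above; the function name is made up. Run it on a copy first.

```shell
#!/usr/bin/env bash
# Sketch only: enable_universe is a hypothetical helper name.

# Add "universe" to uncommented deb lines in FILE that end in "main restricted".
enable_universe() {
    local file=$1
    # Lines that already end in "universe" do not match, so reruns are harmless.
    sed -i -E 's|^(deb .* main restricted)[[:space:]]*$|\1 universe|' "$file"
}

# Example:
#   cp /etc/apt/sources.list /tmp/sources.list.test
#   enable_universe /tmp/sources.list.test
```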

Install a package without internet

So first of all, this isn't original. Credit goes here. But it's fantastic, and I wish I'd known about this a long time ago. As usual, for my own memory/use: and actually, I'm just going to clean up what the other guy said. He did a great job.

On the Internet-less computer:

In the terminal enter:

PACKAGENAME=<The name of the Package to install>

and then

apt-get -qqs install $PACKAGENAME | grep Inst | awk '{print $2}' | xargs apt-cache show | grep 'Filename: ' | awk '{print $2}' | while read filepath; do echo "wget \"http://archive.ubuntu.com/ubuntu/${filepath}\""; done >downloader.sh

A ready-to-use downloader for the package has now been created in the current directory (the home folder, if you just opened the terminal). Open your home directory in the file browser and move the file downloader.sh to the top-level directory of your flash drive. Then eject your flash drive.
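The last half of that one-liner is the part I had to stare at: it takes the "Filename:" lines from apt-cache show and turns each into a wget command. Here's that step on its own as a sketch. The function name is mine, and the package path in the example is invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: filenames_to_wget is a hypothetical helper name.

# Read apt-cache show output on stdin, print one wget command per Filename line.
filenames_to_wget() {
    local mirror=${1:-http://archive.ubuntu.com/ubuntu}
    awk -v m="$mirror" '$1 == "Filename:" { printf "wget \"%s/%s\"\n", m, $2 }'
}

# Example with made-up input:
#   printf 'Package: demo\nFilename: pool/main/d/demo/demo_1.0_all.deb\n' | filenames_to_wget
```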

On the computer with Internet:

Insert your flash drive, and open your flash drive in the file browser. Copy the location of your flash drive:

[CTRL]-L

[CTRL]-C

Move into the directory of the flash drive. In a terminal this time, type:

cd [CTRL]+[SHIFT]+V

Run the downloader:

bash ./downloader.sh

Wait for the download to complete and eject your flash drive.

Back to the Internet-less computer:

Open your flash drive in the file browser. In the browser, type the following to copy the file location of the flash drive.

[CTRL]-L

[CTRL]-C

Move into the directory of the flash drive. In a terminal this time, type:

cd [CTRL]+[SHIFT]+V 
sudo dpkg --install *.deb

That's it!

Tuesday, November 22, 2016

Running GUI programs over ssh command line

I've never run into this before, and it's really cool.

ssh -X pi@raspberrypi

This allows the remote computer to use the local computer's X server to run the GUI for the remote program. Sooo, I can use Mathematica on my Mint 18 laptop through an ssh connection to my RPi 2. Very cool.

Monday, November 14, 2016

Setting up Raspberry Pi Zero in Headless Mode

Download the raspbian image, dd it to the sd card, unmount and remount it on the computer.

Figure out the name of your micro sd card (I'm going to assume that it's sdb) - look for the device with a total size similar to what yours is labeled as.

lsblk

Download the image here. Normally I'd give you a wget command, but it's a lot faster doing it on the browser. Grab the "Raspbian Jessie with Pixel" version...I don't trust the Lite to have everything needed for future projects.

Extract the file (right-click and select either 'extract' or 'open with archive manager') and put it in the Downloads folder. If you rename it to raspbian.img after extracting, then you can cut and paste the following command. Note that you should change 'foo' to the device name you identified above, e.g., of=/dev/sdb. I'm keeping it as of=/dev/foo to prevent accidents.

sudo dd bs=4M if=/home/$USER/Downloads/raspbian.img of=/dev/foo status=progress
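Before running a dd like the one above, it doesn't hurt to check whether the target (or one of its partitions) is still mounted. This is an extra precaution of mine, not part of the original recipe, and the function name is made up. It reads /proc/mounts by default.

```shell
#!/usr/bin/env bash
# Sketch only: is_mounted is a hypothetical helper name.

# Succeeds if DEV or any device whose name starts with DEV appears in MOUNTS.
is_mounted() {
    local dev=$1 mounts=${2:-/proc/mounts}
    awk -v d="$dev" 'index($1, d) == 1 { found = 1 } END { exit !found }' "$mounts"
}

# Example (still using the placeholder device name):
#   if is_mounted /dev/foo; then echo "unmount it first"; fi
```

The prefix match is deliberately cautious: checking /dev/sdb also catches /dev/sdb1 and /dev/sdb2.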

Remove the sd card from the computer and plug it back in. That seems to allow the low-level processes to reassess the contents of the card after everything on it was changed.

Resize the partition in gparted if you're planning on doing anything that involves a significant amount of storage. The Raspbian image doesn't leave a lot of free space, and your card probably has extra room on it. Install gparted below, then run and resize (that's a GUI operation, so no walkthrough - sorry. Also, I've been doing it since nearly the beginning of my Linux career and don't need any reminders on how to do it).

sudo apt-get install gparted

Set up the networking - you'll need to tell the pi what the wifi password is (Source). cd into the primary partition of the sd card - you'll find it as the folder named with a long string of meaningless text if you do:

ls /media/$USER

cd into that folder (represented by me here with x's and dashes) and then go several folders deeper to edit the wifi config file. The whole thing is done below.

sudo nano /media/$USER/xxxx-xxxx-xxxxx/etc/wpa_supplicant/wpa_supplicant.conf

Add this to the end of the file, where "foo" is the name of your wifi network (literally, what shows up when you're choosing a network to connect to - it is case-sensitive) and "bar" is the wifi password. Both should be in quotation marks.

network={
  ssid="foo"
  psk="bar"
  proto=RSN
  key_mgmt=WPA-PSK
  pairwise=CCMP
  auth_alg=OPEN
}

Save and exit by pressing CTRL-O, ENTER and CTRL-X.
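If typing that block by hand gets old, it can be generated. A sketch with a made-up function name; it prints the same fields as above with the network name and password filled in, and does no escaping, so it assumes neither contains a double quote.

```shell
#!/usr/bin/env bash
# Sketch only: wifi_block is a hypothetical helper name.

# Print a wpa_supplicant network block for the given SSID and password.
wifi_block() {
    local ssid=$1 psk=$2
    cat <<EOF
network={
  ssid="$ssid"
  psk="$psk"
  proto=RSN
  key_mgmt=WPA-PSK
  pairwise=CCMP
  auth_alg=OPEN
}
EOF
}

# Example: append to the config on the sd card (same x'd-out path as above).
#   wifi_block "foo" "bar" | sudo tee -a /media/$USER/xxxx-xxxx-xxxxx/etc/wpa_supplicant/wpa_supplicant.conf
```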

Unmount, stick the sd card in the pi, connect a wifi adapter and plug in the power. Most of the time you don't even need to know the IP of the pi, as you can use the computer name to ssh in:

ssh pi@raspberrypi

Note the password is

raspberry

A few extra commands to be aware of:

Note that to ssh into a Mint 18 distro from the pi, you need to install openssh-server on Mint first.

sudo apt-get install openssh-server

If you know the user and computer names that you want to ssh into, you can use them instead of hard-to-keep-track-of IP addresses, provided everything is on the local network (LAN).

Copy files with scp:

scp user@computer:desired.file ~/path/to/containing/folder

And on an *ahem* completely unrelated note, found a few more dependencies that sphinxtrain needed in the install script. I'll update vmc when I get a chance.

Tuesday, October 11, 2016

Building a Statistical Language Model

Update: I finished my script for creating custom language models. See here: https://github.com/umhau/vmc.

There's a summary at the end with what I figured out. Most of this is me thinking on paper.

The statistical language model is used for helping CMU Sphinx know what words exist, and what order the words appear in (the grammar and syntax structure). The intro website to all this is here.

I'm trying to decide between the SRILM and the MITLM packages [subsequent edit: also the logios package and the quicklm perl script - these are referenced in hard-to-find places on the CMU website; see here and here, respectively] [another subsequent edit: looks like I found a link to the official CMU Statistical Language Model toolkit - it was buried in the QuickLM script]. SRILM is easier to use, apparently, and the CMU site provides example commands. MITLM, however, seems more likely to stick around and be accessible on github for the long-term. Plus, I forked it.

[sorry, blogger's formatting broke and I had to convert everything to plaintext and start over...lost the links.]

Only downside is, the main contributor to MITLM stopped work on it about 6 mos ago, and started dealing with Kaldi instead. Guess he figured the newer tech was more worth his time. Still, dinosaurs have their place; just watch Space Cowboys to get the picture.

MITLM

Just to be sure that the software doesn't go anywhere, code is downloaded from my repository.

Update: Thanks to Qi Wang's comment below there's an extra dependency to install:

sudo apt-get install autoconf-archive

Installation of MITLM:

cd ~/tools
git clone https://github.com/umhau/mitlm.git
cd ./mitlm
./autogen.sh
./configure
make
make install

~~So, turns out that there's some weird problems with the installation. Something changed, or something isn't being installed properly. The compilation seems to fail with these errors:~~

./configure: line 19641: AX_CXX_HEADER_TR1_UNORDERED_MAP: command not found
./configure: line 19642: syntax error near unexpected token `noext,'
./configure: line 19642: `AX_CXX_COMPILE_STDCXX_11(noext, optional)'

~~g++ wasn't installed, but even after that was added it still wouldn't work.~~

Update: Unfortunately, I've lost track of other dependencies involved - at some point, I'll make a list of all the stuff I've installed while working on this project. Had to install libtool (or similar?) to get here. Mental note:

libtoolize:   error: Failed to create 'build-aux'

But, that's because I'm trying to do this on a different Mint installation from my usual - on my default workstation, that dependency is installed (no idea what it is, except that it's probably listed somewhere on this blog).

After installing the extra dependency, the installation works! So this is a viable avenue thus far to get the LM working. I've already made it past where I need the MITLM, though, so I'm going to let it be for now. Might have to come back for it.

SRILM

Ok, let's see what SRILM has to offer us. It's more inconvenient to install; ya have to go through a license agreement to download it, so I can't just stick a bash command here.

...unless I put the code on my github. In which case, it's easy to get a copy of. Too bad there's too many files to put up an extracted version, and too bad the compressed version is more than 25mb. Time to split up the tar.gz file again; for my own records, here's how I split it. All I need for getting and using it is the reconstruction bit.

The splitting part, given the archive file:

split -b 24m -d srilm-1.7.1.tar.gz srilm-1.7.1.tar.gz.part-

Alright. Once the file is on github, it's just more copy-pasting.

cd ~/tools
git clone https://github.com/umhau/srilm.git
cd ./srilm
cat srilm-1.7.1.tar.gz.part-* | tar -xz
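To be sure the pieces glue back together into the same file, the split and the reassembly can be checked in one go. A sketch with a made-up function name; it uses the same split flags as above and compares the concatenated parts against the original byte for byte.

```shell
#!/usr/bin/env bash
# Sketch only: split_and_verify is a hypothetical helper name.

# Split ARCHIVE into CHUNK-sized parts, then confirm the parts rebuild it exactly.
split_and_verify() {
    local archive=$1 chunk=${2:-24m}
    split -b "$chunk" -d "$archive" "$archive.part-" || return
    # The shell sorts part-00, part-01, ... so cat restores the original order.
    cat "$archive".part-* | cmp -s - "$archive"
}

# Example:
#   split_and_verify srilm-1.7.1.tar.gz && echo "parts are good"
```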

By the way, WOW. The installation process for this software is not straightforward. See the install file for the instructions on installation - read for background, then copy-paste below as usual.

gedit ./INSTALL

Step 2 - swap out the SRILM variable for one delimiting the root directory of the package. Source.

sed -i "7s#.*#SRILM = $HOME/tools/srilm#" ./Makefile

For now, assuming that the variables are all good. I don't know if I want maximum entropy models, though it sounds useful...I'll see what happens if I don't prep them.

Installing John Ousterhout's TCL toolkit - we're past the required v7.3, and up to 8.6: hope this still works. I'm compiling from source rather than using the available binaries 'cause they come with some kind of non-commercial/education license, which I don't like being tied down by.

cd ~/tools
git clone https://github.com/umhau/tcl-tk.git
cd ./tcl-tk
gunzip < tcl8.6.6-src.tar.gz | tar xvf -
gunzip < tk8.6.6-src.tar.gz | tar xvf -

Install TCL:

cd tcl8.6.6/unix
# chmod +x configure
./configure --enable-threads
make -j 3
make test 
sudo make -j 3 install

Let's try running the rest without the TK stuff...even though John says it's needed. Heh. Leeeroooy Jenkins!

cd ../../../srilm
make World

...aaaaaaaand, Fail.

This is going nowhere fast. We're in dependency hell. Let's try the perl script CMU uses (it's the backend to the online service they officially reference).

The Perl Script

Thankfully, Mint comes with perl installed. So, the question is how to use the script.

cd ~/tools
mkdir ./CMU_LMtool && cd ./CMU_LMtool
wget http://www.speech.cs.cmu.edu/tools/download/quick_lm.pl

The only thing left here is to figure out how to use the script...having never used perl, this could be interesting. Dug this nugget out of the script:

usage: quick_lm -s <sentence_file> [-w <word_file>] [-d discount]

So, the idea with the LMtool is to process sentences that the decoder should recognize - it doesn't need to be an exhaustive list, however, because the decoder will allow fragments to recombine in the detection phase. Here's an example corpus (from the CMU website):

THIS IS AN EXAMPLE SENTENCE
EACH LINE IS SOMETHING THAT YOU'D WANT YOUR SYSTEM TO RECOGNIZE
ACRONYMS PRONOUNCED AS LETTERS ARE BEST ENTERED AS A T_L_A
NUMBERS AND ABBREVIATIONS OUGHT TO BE SPELLED OUT FOR EXAMPLE
TWO HUNDRED SIXTY THREE ET CETERA
YOU CAN UPLOAD A FEW THOUSAND SENTENCES
BUT THERE IS A LIMIT
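
Going by that example, the corpus wants uppercase text with no punctuation and with numbers spelled out. A rough sketch of a cleanup pass for my own sentence lists follows - the function name and the sample sentences are made up, and it only flags lines with digits rather than spelling them out:

```shell
# Normalize a sentence file: uppercase, strip punctuation except ' and _,
# squeeze spaces. Digits are left alone and only reported afterwards.
normalize_corpus() {
    tr '[:lower:]' '[:upper:]' < "$1" \
        | sed -e "s/[^A-Z0-9_' ]//g" -e 's/  */ /g' -e 's/^ //' -e 's/ $//'
}

raw=$(mktemp)
printf '%s\n' 'Turn on the lights, please.' "What's the time?" 'Play track 263' > "$raw"

clean=$(normalize_corpus "$raw")
echo "$clean"

# Lines that still contain digits need spelling out by hand.
needs_spelling=$(echo "$clean" | grep -c '[0-9]')
echo "lines with digits: $needs_spelling"
```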

We'll use this sentence collection to test the perl script:

cd ~/tools/CMU_LMtool
wget https://raw.githubusercontent.com/umhau/misc-LMtools/master/ex-corpus.txt
perl quick_lm.pl -s ex-corpus.txt

Well, it did exactly nothing. No terminal output, no new files created in the directory, and no errors. Time to search the script for other possible output locations. How weird can it be?

...

Ok, solved the problem. Thank goodness for auto highlighting in Gedit. The authors used some kind of weird system for comments that I'm guessing was retired since this script was written. It seems to have been throwing the Perl interpreter for a loop:

=POD
/*
[some text wrapped by those comment markers]
*/
[more text, only wrapped by the '=' things]
=END

So, I re-commented all the introductory stuff, and put the fixed version in the github repo.

Summary of the Perl script

So, here's how it works: download the fixed script, give it a sentence list, and run the command. Simple. And, looking at the output, the function it performs is pretty simple too: it makes a list of all the 1-, 2- and 3-word groupings in the sentence list.

Here's what to do:

mkdir -p ~/tools/CMU_LMtool && cd ~/tools/CMU_LMtool
wget https://raw.githubusercontent.com/umhau/misc-LMtools/master/ex-corpus.txt
wget https://raw.githubusercontent.com/umhau/misc-LMtools/master/quick_lm.pl
perl quick_lm.pl -s ex-corpus.txt

Still not sure what that does for me, but I have my LM!
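
To get a feel for what those word groupings look like, here's a sketch that counts them with awk. It only illustrates the idea and is not how quick_lm.pl works internally - as far as I can tell the real output has more in it than raw counts. The function name and the two sample sentences are made up.

```shell
# Count 1-, 2- and 3-word groupings in a sentence file.
# Illustration only; this is not the quick_lm.pl algorithm.
count_ngrams() {
    awk '{
        for (i = 1; i <= NF; i++) {
            gram = ""
            # Extend the grouping one word at a time, up to three words.
            for (n = 0; n < 3 && i + n <= NF; n++) {
                gram = (n == 0) ? $i : gram " " $(i + n)
                counts[gram]++
            }
        }
    }
    END { for (g in counts) print g "\t" counts[g] }' "$1" | LC_ALL=C sort
}

sample=$(mktemp)
printf '%s\n' 'THIS IS AN EXAMPLE' 'THIS IS A TEST' > "$sample"

# Output is one grouping per line, followed by a tab and its count.
ngrams=$(count_ngrams "$sample")
echo "$ngrams"
```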

Notes: I think the word list option in the command refers to the possibility of a limited vocabulary...not sure how that relates to words outside that list used in the sentence list. The discount in the command, however, is fixed at 0.5. Apparently Greg and Ben did some experiments to discover that's definitely the optimal setting.

Second Note: based on readings from the CMU website, this LM isn't good for much more than command-and-control - it can detect short phrases accurately, but not long, drawn-out sentences. So it'll be good for most of what I want, but anything complex will need to be done with the CMULMTK package.

Hold on - the [-w <word_file>] option for a dictionary might be a request for output - not an extra input. And given that I do need an explicit dictionary for transcription, that's probably what it does. That would be wonderful. I can even use that sentence list for voice training - which would be a fabulous way to ensure accuracy.

Unfortunately, that's not the case. Oh, well.

The official CMU Statistical Language Model toolkit

Ok, maybe this'll do it for me. Here's the link to the source. The Perl script doesn't make all the different files I need - especially the pronunciation dictionary.

mkdir -p ~/tools/CMUSLM
cd ~/tools/CMUSLM
wget http://www.speech.cs.cmu.edu/SLM/CMU-Cam_Toolkit_v2.tar.gz
gunzip < CMU-Cam_Toolkit_v2.tar.gz | tar xvf -
cd ./CMU-Cam_Toolkit_v2

Wow, this is old. You have to uncomment something if your computer isn't running HP-UX, IRIX, SunOS, or Solaris. Judging by the name of the check script, it's a byte-order thing. I'm pretty sure anything built in this decade needs it uncommented, but if you're unsure, the README mentions a script you can run to check for yourself:

bash endian.sh
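
If that script gives trouble, the byte order can also be checked with standard tools. This is a stand-in sketch of my own, not what endian.sh does:

```shell
# Host byte-order check with od (a stand-in, not the toolkit's endian.sh).
# od -d reads the two bytes 0x01 0x00 as one 16-bit number in host byte order:
# 1 on a little-endian machine, 256 on a big-endian one.
value=$(printf '\001\000' | od -An -d | tr -d ' \n')
case "$value" in
    1)   byte_order=little ;;
    256) byte_order=big ;;
    *)   byte_order=unknown ;;
esac
echo "byte order: $byte_order"
```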

Ok, uncomment:

sed -i '37s/#//' ./src/Makefile
cd src
make install

Hard to tell if this was successful. I get the impression watching this compile that it was written in the 80s, and updated for compatibility with something advertising a max capacity of 512 MB of random access memory.

Time to dive into the html documentation, and figure out usage. The goal is to create the LM and DIC files - and a nice perk would be the other stuff produced by the online LM generator.

Turns out, there doesn't seem to be any kind of pronunciation dictionary produced by this tool. So it's no good.

The Logios Package

This seems to be the tool CMU claims was actually used in their website - and, indeed, some of their tools within the package are designed for use in a webform. So I might be on the right track. The only problem is, the input is not a list of sentences: it's a grammar file built by the Phoenix tool. No idea what that is or how it works.

CMU, get your act together! The website is nice, but I've got no recourse if it goes down. I want an independent system!

Here goes. Goal: LM and DIC files. Starting point: list of sentences.

Download the package. Even this isn't user-friendly - the folder structure is in html. I used wget recursively to download the webpages. See here for source on the command.

CMUDict

Actually, it seems like I could just use the dictionary directly. The whole problem is how to get the entries from this file into a subset file that holds only what I want - so I'll write a small script to do that. What a pain.

wget http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/sphinxdict/cmudict_SPHINX_40

I'll post the script soon - it's being added to a larger package that should make the process of getting a personal language model pretty painless. That'd be nice.
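
In the meantime, here's a rough sketch of the shape such a script could take - not the script promised above, just the idea. It assumes the dictionary has one entry per line, with the word first and the phones after it, and alternate pronunciations written like WORD(2). Both sample files are made up for the demo.

```shell
# Pull the pronunciations for just the words in a sentence list out of a
# CMUdict-style file. Both sample files below are made up for the demo.
subset_dict() {   # usage: subset_dict <sentence_file> <dict_file>
    awk '
        # First file: remember every word that appears in the sentences.
        NR == FNR { for (i = 1; i <= NF; i++) want[$i] = 1; next }
        # Second file: keep entries whose headword is wanted. Alternate
        # pronunciations look like WORD(2), so strip that suffix before the lookup.
        {
            head = $1
            sub(/\([0-9]+\)$/, "", head)
            if (head in want) print
        }
    ' "$1" "$2"
}

sentences=$(mktemp)
dict=$(mktemp)
printf '%s\n' 'READ THE EXAMPLE' > "$sentences"
printf 'EXAMPLE\tIH G Z AE M P AH L\nREAD\tR IY D\nREAD(2)\tR EH D\nTHE\tDH AH\nZEBRA\tZ IY B R AH\n' > "$dict"

subset=$(subset_dict "$sentences" "$dict")
echo "$subset"
```

Words in the sentence list that aren't in the dictionary are silently skipped here; a real version would want to report them.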

Saturday, October 8, 2016

Compressing and splitting folder archives

This isn't exactly about tar.gz files in particular - more like what to do when they're too big to upload into github...i.e., this is how to split them into pieces.

Compress a single file with gzip and split the result as it streams out (gzip only handles single files, so a folder has to go through tar first):

gzip -c my_large_file | split -b 1024MiB - myfile_split.gz_

If the archive already exists, here's the command for splitting it:

split -b 1024m "file.tar.gz" "file.tar.gz.part-"

Apparently, recombining is a good use for the cat command. Pipe that output to gunzip and you won't have to create an intermediary archive file before decompression.

cat myfile_split.gz_* | gunzip -c > my_large_file

Basically, by using the pipes you avoid ever letting the unsplit archive sit on your drive.
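
For an actual folder, the same trick works with tar doing the compression. A sketch on a throwaway folder, with a check that what comes back matches what went in (all the names and the 1k part size are made up for the demo):

```shell
# Folder -> tar.gz stream -> split parts -> restored folder, all through pipes.
src=$(mktemp -d)
out=$(mktemp -d)
restore=$(mktemp -d)

# A small throwaway folder to archive.
mkdir -p "$src/my_folder/sub"
head -c 3000 /dev/urandom > "$src/my_folder/a.bin"
echo "hello" > "$src/my_folder/sub/b.txt"

# tar writes the compressed archive to stdout (-f -); split reads it from stdin (-).
tar -C "$src" -czf - my_folder | split -b 1k - "$out/my_folder.tar.gz.part-"

# Reassemble and unpack in one pipe; the whole archive never lands on disk.
cat "$out"/my_folder.tar.gz.part-* | tar -C "$restore" -xzf -

# diff -r returns 0 only if both trees have the same files with the same contents.
if diff -r "$src/my_folder" "$restore/my_folder" > /dev/null; then
    folder_roundtrip=ok
else
    folder_roundtrip=mismatch
fi
echo "folder round trip: $folder_roundtrip"
```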

I found the answer here.