
A Homemade Beowulf Cluster: Part 1, Hardware Assembly

A Beowulf cluster lets me tie miscellaneous computers together and use their CPUs like one large processor...I think.  I've never done this before, so I'm still working out the details.

I'm building this with random laptops: generally i5s, and I think one's a Core Duo - a half-decent thing from 2011.  Might even throw in an RPi2 for good measure.

Make sure you read through this before starting.  You want to know what you're getting into.  Watch out, though - this is a long post.

Notes

Since we're working on multiple computers here, not everything is going to be cut-and-paste.  I will make sure that it's as clear as possible, however.  There won't be any hand-waving or assumptions of prior knowledge.

I'm doing this with Linux Mint 18 on my primary laptop.

Primary Sources

Setting up the cluster: src 1,  src 2, src 3

Hardware Ingredients

  • Since the whole point of this tool is sharing computations between computers, you need a way to route that traffic.  Hence, an ethernet switch.  Go for a gigabit switch, since you don't want the switch to be your bottleneck.  I was cheap and got myself the 5-port version, and I'm already kicking myself.  Go for at least the 8-port version.
  • A ton of ethernet cables.  You can do with short ones, but you'll want to connect the switch to your router so you can access the computers from outside their own tiny network.
  • Leftover computers.  You won't be able to use these for anything else, so make sure you don't need them.

Computer Preparation

Install Ubuntu Server on each computer. I used 16.04.1 LTS.  For those with a penchant for funny names (or who need to deal with annoying, obtuse colloquialisms on the web), that's the Xenial Xerus edition.

You'll need to keep track of computer names ("hostnames"), and create the same user on each machine (so the username and password are identical everywhere).  I like to increment the computer names with letters.  For the purposes of this guide, we'll use:
hostname:    grendel-a
username:    beowulf
password:    hrunting
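Since the hostnames follow a predictable pattern, they're easy to generate in scripts later on.  A tiny sketch of the naming scheme (the suffix letters are an assumption - use however many machines you actually have):

```shell
# Generate the hostname list for the cluster.
# The suffixes a-d are hypothetical; adjust to your machine count.
printf 'grendel-%s\n' a b c d
# prints grendel-a through grendel-d, one per line
```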

Create installation USB flash drive

Download the 16.04.1 LTS server ISO from Ubuntu's release archives (releases.ubuntu.com).

Burn the ISO to a USB flash drive. Note that you'll lose everything on the USB you use.  Run the following command twice, once before plugging in your USB and then a few seconds after.  The new entry when you run it the second time is your flash drive.
lsblk
For example, this is the output when I run that command with a flash drive plugged in.  Note on the right where it specifies the mount point (at /media/me/storage), and on the left where it shows me the name is sdb.
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 167.7G  0 disk 
├─sda1        8:1    0   487M  0 part /boot
└─sda5        8:5    0 167.2G  0 part 
  ├─mint--vg-root
  │         252:0    0 159.3G  0 lvm  /
  └─mint--vg-swap_1
            252:1    0   7.9G  0 lvm  [SWAP]
sdb           8:16   1   1.9G  0 disk 
└─sdb1        8:17   1   1.9M  0 part /media/me/storage
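If eyeballing the before/after output feels error-prone, you can let diff spot the new entry for you.  A small sketch - the temp file paths are arbitrary:

```shell
# Snapshot the block devices before plugging in the USB...
lsblk -o NAME,SIZE,MOUNTPOINT > /tmp/lsblk-before
# ...plug it in, wait a few seconds, then snapshot again.
lsblk -o NAME,SIZE,MOUNTPOINT > /tmp/lsblk-after
# Lines prefixed with '>' are new - that's your flash drive.
diff /tmp/lsblk-before /tmp/lsblk-after
```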
If your USB is mounted, it has to be unmounted first - otherwise weird things can happen in the next step.  Trust me: I once copied a partition with dd without unmounting it first, and it wiped out my MBR.  In the example above, unmounting would work like this:
umount "/media/me/storage"
If we pretend your USB is the sdb device, this is the command you'd run (I'm assuming the ISO was saved to the default "~/Downloads" location).  Swap out the 'sdb' part with what the lsblk command indicated.  Also be aware that if you point this at the wrong device, you'll wipe that drive - quite possibly the system disk of the computer you're running the command on.  Just FYI.
sudo dd if=~/Downloads/ubuntu-16.04.1-server-amd64.iso of=/dev/sdb bs=4M status=progress
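Before pointing dd at anything, it's worth checking that the ISO downloaded intact.  A sketch, assuming the default download location - compare the result against the SHA256SUMS file published alongside the ISO on the same download page:

```shell
# Compute the checksum of the downloaded ISO.
sha256sum ~/Downloads/ubuntu-16.04.1-server-amd64.iso
# The hex string printed should match the ISO's line in the
# SHA256SUMS file from the release page; if not, re-download.
```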

Install to each computer

You're gonna have to follow this procedure with each computer in the cluster.  There isn't a simple way around it that I'm aware of.

Plug in the flash drive, and reboot/turn on the computer.  Make sure it's connected to the internet.  Press ESC, F1, F2, F11 or F12 to choose a startup device...if those don't work, Google:
"how to choose startup device BIOS" & [your computer type]
e.g., "ThinkPad T420".

When it boots up, there are a few settings to make sure of.  Most of them should be straightforward: choosing a keyboard layout and a default language, for instance.  Just in case of problems, here's a full walkthrough:
  1. All you want is a simple installation.
  2. Don't bother with detecting the keyboard layout...if you're in the US or have an English keyboard, you can just stick with the defaults.  Worst case, start over.  There are a few more keyboard screens after that one, but you get the idea.
  3. Hostname: grendel-a, grendel-b, etc.  Makes it easy to keep the computers straight.  After you've entered the hostname, press [enter].
  4. I'm using the same thing for the full name and the username.  Makes it simple.  Press [enter].
  5. Since the username is the same as the full name, just press [enter].
  6. Enter the standard password for the computers.  I used the [down arrow] and pressed [space] to select the option that shows the password.  Just press [enter].  It'll ask you to reenter the password for verification; do so, and press [enter] again.
  7. Don't bother with encryption; this is a cluster designed for speed, and encryption just makes disk access slower.  On the next screen, check your time zone.  If it's wrong, the adjustment screen is pretty intuitive.
  8. For partitioning, press [enter] to go with the default option.  You want to use the entire disk, and LVM could be useful (it's also an incredible pain).  Press [enter] again on the next screen after verifying the destination for the install.  It's probably the largest drive - the small one is probably your flash drive.
  9. It asks for confirmation; [right arrow] and [enter].
  10. I like to use a standard size for the installation that leaves lots of room to spare on the drives.  I might be able to do some kind of shared network storage for the computations with the rest.  If it lets you, go for 50.0 GB.  [enter].
  11. Another confirmation.  [right arrow] + [enter].
  12. Unless you have a weird internet setup, leave the proxy field blank and press [enter].  If you do have a weird internet setup, then I can't help you.
  13. I skipped the auto updates, since they won't positively impact the performance of the cluster, and might slow it down at times.
  14. Ok, this is the one that really matters: you need the OpenSSH server in order to communicate with your cluster via the terminal (at least, I'm pretty sure you do; I didn't want to spend the time to find out for sure).  [arrow down] to it, then [space] to select.  Also select 'Manual Package Selection' - for some machines with obscure drivers, this seems to ensure the drivers get installed.  [enter] to move on.
  15. Yes; you want GRUB installed to the MBR.  [enter].
...and that's it.  Reboot, remove the flash drive, and the new OS is installed.

Final Adjustment: power management

There is one more thing, though: you'll be running a bunch of these computers in the cluster, and you don't want to have to deal with them individually.  For one thing, that would be a huge mess.  So to keep them consolidated, you'll want to keep their lids shut while they're running - and if you don't change a setting, they'll just go to sleep when you close the lid.  src
sudo nano /etc/systemd/logind.conf
Find the line:
#HandleLidSwitch=suspend
and change it to:
HandleLidSwitch=ignore
Then restart that bit of the system:
sudo service systemd-logind restart
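Since you'll be making this same edit on every machine, it can be done non-interactively instead of opening nano each time.  A sketch of the same change as a one-liner (run it on each node):

```shell
# Uncomment the lid-switch line and set it to 'ignore', in place.
sudo sed -i 's/^#HandleLidSwitch=suspend/HandleLidSwitch=ignore/' /etc/systemd/logind.conf
# Restart logind so the change takes effect.
sudo service systemd-logind restart
```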

Assembling the Cluster Hardware

This is pretty straightforward: start with your ethernet switch.
  1. Run a cable from your router to the #1 port on the switch.
  2. Run a cable from each computer in the cluster to a port on the switch.  I'm not going to try and correlate port numbers and computer numbers.  Pretty sure it doesn't matter.
And that's all it takes to assemble the cluster hardware!  You should be able to ping your computers and get a response:
ping grendel-a
If you don't get replies, you have a problem.  Google is your friend.  Worked for me.
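To check all the nodes at once, a quick loop works.  A sketch using the grendel-* naming from above - substitute however many machines you actually have:

```shell
# Ping each node once and report which ones respond.
for host in grendel-a grendel-b grendel-c; do
  if ping -c 1 -W 2 "$host" > /dev/null 2>&1; then
    echo "$host: up"
  else
    echo "$host: DOWN"
  fi
done
```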

Comments

mrpurple said…
Nice,
I built one a couple of years ago for embarrassingly parallel GIS. Two things you might find of interest:
1) Sometimes adding a slow node can slow your cluster down rather than speed it up as everybody waits for the slow node to finish its jobs. Depending of course on the relative size of the jobs and the relative speed of the nodes.
2) A lot of your time may be saved by building a roll-your-own ubuntu installer disk/usb
A description of my build is here: http://www.purplelinux.co.nz/?p=160
