Showing posts with label sinfo 0.0.45. Show all posts
Showing posts with label sinfo 0.0.45. Show all posts

13 March 2012

106. htop 1.0.1 and sinfo-0.0.45 on rock 5.4.3/centos 5.6

There are a number of performance monitor tools in the debian repos. ROCKS 5.4.3/Centos doesn't seem quite as well-equipped.

First out, htop:

htop:
wget http://downloads.sourceforge.net/project/htop/htop/1.0.1/htop-1.0.1.tar.gz
tar -xvf htop-1.0.1.tar.gz
cd htop-1.0.1/
./configure --prefix=/home/me/.htop
make
make install

It's as simple as that.
Add e.g.
alias htop='/home/me/.htop/bin/htop'
to your ~/.bashrc
Note: this works on Scientific Linux (boron) 5.4 as well.

sinfo:
Update 13/03/2012:
Sinfo <0.0.44 has IPv6 enabled by default.
On sinfo >=0.0.45 you can disable IPv6 using ./configure --disable-IPv6

Sinfo is probably the snazziest cluster monitoring tool that I know of. Sure, ganglia etc. are nice too, but they run as web service. Sinfo is a 'simple' curses program, but building it on CentOS was a bit of a challenge.

Be aware that sinfo versions prior to 0.045 expect ipv6 to work -- by default ROCKS disables IPv6, so use sinfo 0.0.45 and above.





First boost:
(yum install boost-devel didn't do anything for me)
cd ~/tmp
wget http://sourceforge.net/projects/boost/files/boost/1.49.0/boost_1_49_0.tar.gz/download
tar -xvf boost_1_49_0.tar.gz
cd boost_1_49_0/
./bootstrap.sh --prefix=/usr

Edit Jamroot and add
using mpi ;
The space between mpi and ; is needed.

Symlink to your mpic++, e.g. if your mpic++ is in /opt/openmpi:
sudo ln -s /opt/openmpi/bin/mpic++ /usr/bin/mpic++

The following step takes a long time:
sudo ./b2 -a install --layout=versioned --build-type=complete

These days all the libboost libs are multithread aware (or so I hear), and in debian it turns out that the -mt.so libs are just symbolic links to the 'regular' libs.
sudo ln -s /usr/lib/libboost_signals.so /usr/lib/libboost_signals-mt.so
sudo ln -s /usr/lib/libboost_date_time.so /usr/lib/libboost_date_time-mt.so
sudo ln -s /usr/lib/libboost_serialization.so /usr/lib/libboost_serialization-mt.so
sudo ln -s /usr/lib/libboost_wserialization.so /usr/lib/libboost_wserialization-mt.so
sudo ln -s /usr/lib/libboost_regex.so /usr/lib/libboost_regex-mt.so

sudo ln -s /usr/lib/libboost_signals.so.1.49.0 /usr/lib64/libboost_signals.so.1.49.0

Then asio
cd ~/tmp
wget "http://downloads.sourceforge.net/project/asio/asio/1.5.3%20%28Development%29/asio-1.5.3.tar.bz2?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fasio%2F&ts=1331441086&use_mirror=aarnet"
tar -xvf asio-1.5.3.tar.bz2
cd asio-1.5.3/
./configure
make
sudo make install

Then sinfo
cd ~/tmp
wget http://www.ant.uni-bremen.de/whomes/rinas/sinfo/download/sinfo-0.0.45.tar.gz
tar -xvf sinfo-0.0.45.tar.gz
cd sinfo-0.0.45/
./configure --disable-IPv6

The build should be fine.

Configuration:
you'll end up with
/usr/local/sbin/sinfod
/usr/local/bin/sinfo
You may want to make sure there are paths to them by adding the following to your ~/.bashrc:
export PATH=$PATH:/usr/local/bin:/usr/local/sbin
The changes take effect next time you log in to a shell, or just run
source ~/.bashrc
for immediate effect.

Also, create a file called /etc/default/sinfo with the following in it:
OPTS="--quiet --bcastaddress=192.168.1.255"

Start sinfod with
sinfod --quiet --bcastaddress=192.168.1.255

then check that it's running
ps aux | grep sinfod

If it's not running, then try
sinfod -F

If it gives something along the lines of
exception:open:address family not supported
you most likely
1) haven't enabled ipv6 for your interface and
2) didn't disable IPv6 during compilation and/or
3) used version<0.045

Check by doing ifconfig -- does it return both an ipv4 and an ipv6 address?

Enabling ipv6
Unless you know what you're doing, don't fiddle with the network interfaces on a production cluster -- network interfaces on a multinode cluster are typically highly tuned to minimise latency, so don't mess it up.

Anyway. First check your /etc/modules.conf and - if present - comment out
alias ipv6 off
options ipv6 disable=1
Edit your /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1:0
IPADDR=192.168.1.111
NETMASK=255.255.255.0
BOOTPROTO=none
MTU=1500
TYPE=Ethernet
GATEWAY=192.168.1.1
USERCTL=no
IPV6INIT=yes
PEERDNS=yes
ONPARENT=yes
IPV6ADDR=fe80::2f0:4dff:f383:b44/64
IPV6_DEFAULTGW=fe80::2f0:4dff:fe83:a48/64
I just made up the IPV6ADDR, and took the IPV6_DEFAULTGW from my gateway machine (running debian, so ipv6 enabled by default)

Assuming that your firewall is allowing traffic at port 60003 and free traffic in and out on 192.168.1.255 things should work fine.



Errors


Error (boost):
MPI auto-detection failed: unknown wrapper compiler mpic++
Please report this error to the Boost mailing list: http://www.boost.org
You will need to manually configure MPI support.
Solution:
make sure you've symlinked to your mpic++ instance in /usr/bin
e.g. if your mpic++ is in /opt/openmpi/bin/mpic++
sudo ln -s /opt/openmpi/bin/mpic++ /usr/bin/mpic++


Error (sinfo):
message.cc: In member function 'void Message::popFrontMemory(void*, size_t)':
message.cc:183: error: 'memory' was not declared in this scope
message.cc:193: error: 'boost' has not been declared
message.cc:193: error: expected primary-expression before 'char'
message.cc:193: error: expected `;' before 'char'
message.cc:196: error: 'newMemory' was not declared in this scope
message.cc:196: error: 'memory' was not declared in this scope
make[2]: *** [message.lo] Error 1
make[2]: Leaving directory `/state/partition1/home/me/tmp/sinfo-0.0.44/libmessage'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/state/partition1/home/me/tmp/sinfo-0.0.44/libmessage'
make: *** [all-recursive] Error 1
Solution:
You need to make sure that the libs are found -- either symlink manually between your build directory and /usr/lib, or use boostrap.sh --prefix=/usr. See above for how to do it.

Error (sinfo):
udpmessagereceiver.h:14: error: 'asio' has not been declared
udpmessagereceiver.h:14: error: ISO C++ forbids declaration of 'endpoint' with no type
udpmessagereceiver.h:14: error: expected ';' before 'sender_endpoint'
udpmessagereceiver.h:16: error: 'asio' has not been declared
udpmessagereceiver.h:16: error: ISO C++ forbids declaration of 'io_service' with no type
udpmessagereceiver.h:16: error: expected ';' before '&' token
udpmessagereceiver.h:17: error: 'asio' has not been declared
udpmessagereceiver.h:17: error: ISO C++ forbids declaration of 'socket' with no type
udpmessagereceiver.h:17: error: expected ';' before 'sock'
udpmessagereceiver.h:20: error: expected ',' or '...' before '::' token
udpmessagereceiver.h:20: error: ISO C++ forbids declaration of 'asio' with no type
udpmessagereceiver.h:23: error: 'asio' has not been declared
udpmessagereceiver.h:23: error: expected `)' before '&' token
udpmessagereceiver.cc:5: error: 'asio' has not been declared
udpmessagereceiver.cc:5: error: expected `)' before '&' token
make[1]: *** [udpmessagereceiver.lo] Error 1
make[1]: Leaving directory `/state/partition1/home/me/tmp/sinfo-0.0.44/libmessageio'
make: *** [all-recursive] Error 1

Solution: you've only got boost::asio installed, not the independent asio. See above for how to compile and install asio.

Error (sinfo):

/usr/bin/ld: cannot find -lboost_signals-mt
collect2: ld returned 1 exit status
make[2]: *** [sinfod] Error 1
make[2]: Leaving directory `/state/partition1/home/me/tmp/sinfo-0.0.44/sinfod'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/state/partition1/home/me/tmp/sinfo-0.0.44/sinfod'
make: *** [all-recursive] Error 1
Solution:
You need a symlink pointing form /usr/lib/libboost_signals-mt.so to /usr/lib/libboost_signals.so
ln -s /usr/lib/libboost_signals.so /usr/lib/libboost_signals-mt.so 

Error (sinfod):
sinfod --quiet --bcastaddress=192.168.1.255 gives nothing and sinfod exits silently immediately
sinfod -F gives
exception:open:address family not supported
Here's the relevant strace output:
[..]
 socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 6
[..]
 socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDP) = -1 EAFNOSUPPORT (Address family not supported by protocol)
futex(0x333a40d350, FUTEX_WAKE_PRIVATE, 2147483647) = 0
close(6)                                = 0
close(3)                                = 0
close(4)                                = 0
close(5)                                = 0
write(2, "Exception: ", 11)             = 11
write(2, "open: Address family not support"..., 46) = 46
write(2, "\n", 1)                       = 1
exit_group(0)                           = ?

Solution: enable ipv6 (see above)

28 February 2012

86. Building sinfo 0.0.45 on Debian Testing

I use sinfo to keep an eye on my cluster:
http://www.ant.uni-bremen.de/whomes/rinas/sinfo/#down

The debian repo version is 0.0.42-1
The latest version is sinfo 0.0.45

Here's the changelog since 0.0.42
sinfo 0.0.45- Tue, 13 Mar 2012 07:07:27 +0100corrected README compile hint for FreeBSDadded configure flag --disable-IPv6 to disable IPv6 supportsinfo 0.0.44 - Tue, 13 Dec 2011 18:32:33 +0100added reconnect for TCP connectionsadded LIBADD to make it --as-needed linkable tnx to T.Hardersinfo 0.0.43 - Thu, 01 Sep 2011 09:00:13 +0200fixed printing bug (integer underflow) when using sinfo -L or -W tnx to J.Erkkilae


There's little reason for compiling it yourself, but there's really no reason not to try either. I like debian and I like using apt-get to manage my system. Learning to be a bit more independent won't hurt though.
So here we go:

--START HERE --
wget http://www.ant.uni-bremen.de/whomes/rinas/sinfo/download/sinfo-0.0.45.tar.gz
tar -xvf sinfo-0.0.45.tar.gz 
cd sinfo-0.0.45/
sudo apt-get install libboost-dev libasio-dev libboost-signals-dev
./configure
(If your interface configuration has disabled IPv6 you must use ../configure --disable-IPv6 or sinfod will silently exit)
make 
sudo checkinstall


Done.

Start the sinfo daemon by
sudo sinfod --quiet --bcastaddress=192.168.1.255


Use
sinfo
to monitor

If you had sinfo installed before, autoremove it, then edit the left-behind /etc/init.d/sinfo script and change
/usr/sbin/sinfod
 to
 /usr/sbin/local/sinfod
Otherwise see http://verahill.blogspot.com.au/2012/02/debian-testing-wheezy-64-building_23.html for an example of writing your own init.d script.


To see all the cluster nodes running sinfo, just start
sinfo
If you don't see anything, then you've most likely not opened up your firewall -- you need to be able to listen to bcast.

Build errors:

Error:
In file included from message.cc:2:0:
message.h:5:34: fatal error: boost/shared_array.hpp: No such file or directory
compilation terminated.
make[2]: *** [message.lo] Error 1
make[2]: Leaving directory `/home/me/tmp/sinfo-0.0.45/libmessage'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/me/tmp/sinfo-0.0.45/libmessage'
make: *** [all-recursive] Error 1

Solution:
sudo apt-get install libboost-dev


Error:
In file included from udpmessagereceiver.cc:2:0:
udpmessagereceiver.h:4:20: fatal error: asio.hpp: No such file or directory
compilation terminated.
make[1]: *** [udpmessagereceiver.lo] Error 1
make[1]: Leaving directory `/home/me/tmp/sinfo-0.0.45/libmessageio'
make: *** [all-recursive] Error 1

Solution:
sudo apt-get install libasio-dev
which provides
/usr/include/asio.hpp
which is different from the asio.hpp included in libboost


Error:
/usr/bin/ld: cannot find -lboost_signals-mt
collect2: ld returned 1 exit status
make[2]: *** [sinfod] Error 1
make[2]: Leaving directory `/home/me/tmp/sinfo-0.0.45/sinfod'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/me/tmp/sinfo-0.0.45/sinfod'
make: *** [all-recursive] Error 1

Solution:
sudo apt-get install libboost-signals-dev

Links to this post:
http://ant.uni-bremen.de/whomes/rinas/sinfo/