Another release candidate - 4.0 RC3 - is now available on Sourceforge. There are a couple of outstanding bug reports related to alerts that I would like to get a grip on before calling this an official release.
The list of changes - see below - is again rather long. Most notably, hobbitd crashing because of a mis-setting of the MACHINE variable has been fixed, as well as the bbtest-net crashes that happened with the apache-test.
Also of note: Installing Hobbit is now always done by running "make install". The old "setup" target no longer exists; "make install" installs everything, and even updates your configuration files if new settings have been added. Expect quite a few updates when you upgrade from a previous version to 4.0-RC3, as I added all of the settings needed by the BB client package (so that extension scripts have all of the environment variables they expect).
Regards, Henrik
Changes from RC-2 -> RC-3
Configuration file changes:
The bb-services file format was changed slightly. Instead of "service foo" to define a service, it is now "[foo]". Existing files will be converted automatically by "make install".
The name of the "conn" column (for ping-tests) is used throughout Hobbit, and had to be set in multiple locations. Changed all of them to use the setting from the PINGCOLUMN environment variable, and added this to hobbitserver.cfg.
The --purple-conn option was dropped from hobbitd. It should be removed from hobbitlaunch.cfg.
The --ping=COLUMNNAME option for bbtest-net should not be used any more. "--ping" enables the ping tests, the name of the column is taken from the PINGCOLUMN variable.
The GRAPHS setting in hobbitserver.cfg no longer needs to have the simple TCP tests defined. These are automatically picked up from the bb-services file.
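The bb-services format change is mechanical enough to sketch. The C fragment below is a minimal illustration, not the conversion code that "make install" actually runs. The function name convert_service_line is hypothetical, and it assumes the old keyword is followed by exactly one space and then the service name.

```c
#include <stdio.h>
#include <string.h>

/* Convert one old-style bb-services line, "service foo", into the new
 * "[foo]" form. Any other line is copied through unchanged.
 * Returns 1 if the line was converted, 0 if it was copied as-is.
 * Hypothetical helper; assumes a single space after "service".
 * Note: the converted line is written without a trailing newline. */
int convert_service_line(const char *in, char *out, size_t outsz)
{
    const char *prefix = "service ";
    size_t plen = strlen(prefix);

    if (strncmp(in, prefix, plen) == 0) {
        const char *name = in + plen;
        /* The service name ends at the first whitespace character. */
        size_t nlen = strcspn(name, " \t\r\n");

        snprintf(out, outsz, "[%.*s]", (int)nlen, name);
        return 1;
    }

    snprintf(out, outsz, "%s", in);
    return 0;
}
```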
Bugfixes:
hobbitd no longer crashes if the MACHINE name from hobbitserver.cfg is not listed in bb-hosts. Thanks to Anonymous for helping me track down this bug.
If hobbitd crashed, hobbitlaunch would attempt to restart it immediately. Added a 5-second delay, so that the OS has time to clean up any open sockets, files etc. that might prevent a restart from working.
The "disk" RRD handler could be confused by reports from a Unix server, and mistake it for a report from a Windows server. This caused the report to try and store data in an RRD file with an invalid filename, so no graph-data was being stored.
The "cpu" and "disk" RRD handlers were enhanced to support reports from the "filerstats2bb" script for monitoring NetApp systems. The disk-handler also supports the "inode" and "qtree" reports from the same script.
bb-services was overwritten by a "make install". This wiped out custom network test definitions.
bbnet would crash if you defined an "http" or "https" test instead of using a full URL.
bbnet was miscalculating the size of the URL used for the apache-test. This could cause it to overflow a buffer and crash.
hobbitd would ignore the BBPORT setting and always default to using port 1984.
Portability problems on HP-UX 11 should be resolved. From reports it appears that building RRDtool on HP-UX 11 is somewhat of a challenge; however, the core library is all that Hobbit needs, so build-problems with the Perl modules can be ignored as far as Hobbit is concerned.
hobbitd_alert could not handle multiple recipients for scripts, and mistakenly assumed all recipients with a "@" were for e-mail recipients.
Alert messages no longer include the "<!-- flags:... -->" summary; this is for Hobbit internal use only.
"suse" and "mandrake" are recognized as aliases for "linux" in the RRD handler.
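For the BBPORT bugfix, here is a sketch of how a daemon might honour the setting while keeping 1984 as the default. It assumes BBPORT reaches the program as an environment variable; the helper names bbport_from and bbport are hypothetical, and falling back to the default on an invalid value (rather than refusing to start) is a design choice of this sketch only.

```c
#include <stdlib.h>

#define DEFAULT_BBPORT 1984

/* Parse a BBPORT value; fall back to the default when the value is
 * missing, not a plain decimal number, or outside the TCP port range. */
int bbport_from(const char *setting)
{
    char *end;
    long port;

    if (setting == NULL || *setting == '\0')
        return DEFAULT_BBPORT;

    port = strtol(setting, &end, 10);
    if (*end != '\0' || port < 1 || port > 65535)
        return DEFAULT_BBPORT;

    return (int)port;
}

/* Hypothetical: read BBPORT from the environment. */
int bbport(void)
{
    return bbport_from(getenv("BBPORT"));
}
```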
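The apache-test crash listed above came from under-sizing a URL buffer. One way to avoid that class of bug is to add up the length of every piece before allocating. In this sketch the "?auto" suffix (Apache's machine-readable server-status output) and the function name build_status_url are assumptions; the URL layout bbtest-net really uses may differ.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "http://<host><path>?auto" in a heap buffer sized by adding up
 * every piece first, plus one byte for the terminating NUL.
 * The URL layout is an assumption for illustration. Caller frees. */
char *build_status_url(const char *host, const char *path)
{
    const char *scheme = "http://";
    const char *suffix = "?auto";
    size_t len = strlen(scheme) + strlen(host) + strlen(path)
               + strlen(suffix) + 1;
    char *url = malloc(len);

    if (url == NULL)
        return NULL;

    /* snprintf never writes more than len bytes, even if len were wrong. */
    snprintf(url, len, "%s%s%s%s", scheme, host, path, suffix);
    return url;
}
```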
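The suse/mandrake aliasing can be pictured as a small lookup that folds alias names onto "linux" before the RRD handler decides how to parse a report. Only the two aliases named in the release notes are listed, and rrd_os_name is a hypothetical name, not Hobbit's own function.

```c
#include <string.h>

/* Map OS names whose reports use the Linux format onto "linux";
 * any other name is returned unchanged. */
const char *rrd_os_name(const char *os)
{
    static const char *const linux_aliases[] = { "linux", "suse", "mandrake", NULL };
    size_t i;

    for (i = 0; linux_aliases[i] != NULL; i++) {
        if (strcmp(os, linux_aliases[i]) == 0)
            return "linux";
    }
    return os;
}
```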
Improvements:
The info-pages now list the Hobbit alert configuration.
hobbitd_alert now has a "--trace=FILENAME" option. This causes it to log a complete trace of all messages received from hobbitd, how they are handled, and which alerts are sent out as a result. This should help in tracking down alert problems.
New FORMAT=PLAIN setting for alert recipients. This is the same as FORMAT=TEXT, except that the URL link to the status page is left out of the message.
The "setup" target for make has been removed. "make install" will now do all of the work, and will also merge any new settings into the hobbitserver.cfg, hobbitgraph.cfg, hobbitlaunch.cfg, columndoc.csv and bb-services files. The standard files in ~/server/web/ and ~/server/www/ are also updated, if a previous version of the standard file is found.
The graph included on a status view page can now be zoomed directly, without having to go over the "view all period graphs" page.
Color-names in hobbit-alerts.cfg are now case-insensitive.
If the "acknowledge alert" webpage is password-protected, the login-username is now included in the acknowledge message. This will also appear in the BB2 acknowledgement log display, and on the status page.
More tips added to the "Tips & Tricks" document: How to get temperature graphs with Fahrenheit, how to configure Apache to allow viewing of the CGI man-pages.
A native MD5 message-digest routine was added, so content checks using digests will work even when Hobbit is built without OpenSSL support. The routine was taken from http://sourceforge.net/projects/libmd5-rfc/
The bb-findhost CGI now lets you search for IP addresses.
The "--recentgifs" option to bbgen now has a parameter, so you can specify what the threshold is for a status to have changed "recently". The default is 24 hours.
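A sketch of the FORMAT=PLAIN behaviour: the message body is the same as for FORMAT=TEXT, minus the link to the status page. Only these two formats are modelled here, and the function names and message layout (including the "See" line) are invented for illustration.

```c
#include <stdio.h>
#include <string.h>

enum alert_format { FMT_TEXT, FMT_PLAIN };

/* Map a recipient's FORMAT= option onto the two formats modelled here;
 * anything other than FORMAT=PLAIN is treated as TEXT in this sketch. */
enum alert_format parse_format(const char *option)
{
    return (strcmp(option, "FORMAT=PLAIN") == 0) ? FMT_PLAIN : FMT_TEXT;
}

/* Render the alert body. PLAIN is TEXT without the status-page link.
 * Returns the snprintf() result (length that would have been written). */
int format_alert(char *buf, size_t bufsz, enum alert_format fmt,
                 const char *text, const char *url)
{
    if (fmt == FMT_PLAIN)
        return snprintf(buf, bufsz, "%s\n", text);
    return snprintf(buf, bufsz, "%s\n\nSee %s\n", text, url);
}
```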
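The --recentgifs threshold boils down to comparing the age of the last status change against a limit. This sketch takes the threshold in seconds, with the 24-hour default from the release notes; the unit and syntax of the real option's parameter are not stated here, so the bbgen man-page is the authority. changed_recently() is a hypothetical name.

```c
#include <time.h>

#define DEFAULT_RECENT_SECS (24L * 60 * 60)  /* 24 hours, the documented default */

/* True if the last status change happened less than threshold_secs ago. */
int changed_recently(time_t lastchange, time_t now, long threshold_secs)
{
    return difftime(now, lastchange) < (double)threshold_secs;
}
```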
On Tue, Feb 22, 2005 at 11:49:00PM +0100, Henrik Stoerner wrote:
Another release candidate - 4.0 RC3 - is now available on Sourceforge. There are a couple of outstanding bug reports related to alerts that I would like to get a grip on before calling this an official release.
One thing I forgot to mention: If you do run into problems with alerts not happening as you expect them to, please add the option "--trace=FILENAME" to the hobbitd_alert command in hobbitlaunch.cfg. This causes all alert-activity to be logged to the file you specify, and will make it much easier to figure out why the code acted the way it did.
Henrik
On Tue, Feb 22, 2005 at 11:49:00PM, Henrik Stoerner wrote:
Another release candidate - 4.0 RC3 - is now available on Sourceforge.
bb-infocolumn.c: In function `generate_hobbit_alertinfo':
bb-infocolumn.c:110: error: `PATH_MAX' undeclared (first use in this function)
bb-infocolumn.c:110: error: (Each undeclared identifier is reported only once
bb-infocolumn.c:110: error: for each function it appears in.)
gmake[1]: *** [bb-infocolumn.o] Error 1
gmake[1]: Leaving directory `/usr/share/src/hobbit-4.0-RC3/bbdisplay'
gmake: *** [bbdisplay-build] Error 2
There are a couple of outstanding bug reports related to alerts that I would like to get a grip on before calling this an official release. [...]
-- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu "It is not the strongest of the species that survives, not the most intelligent, but the one most responsive to change." - Charles Darwin
On Wed, Feb 23, 2005 at 02:10:58AM -0500, Asif Iqbal wrote:
On Tue, Feb 22, 2005 at 11:49:00PM, Henrik Stoerner wrote:
Another release candidate - 4.0 RC3 - is now available on Sourceforge.
bb-infocolumn.c: In function `generate_hobbit_alertinfo':
bb-infocolumn.c:110: error: `PATH_MAX' undeclared (first use in this function)
Oh, that's a silly one. Just add
#include <limits.h>
next to all the other "#include..." lines near the top of that file.
Henrik
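For reference, a minimal illustration of the fix: <limits.h> is where POSIX systems declare PATH_MAX, when they define it at all. The #ifndef fallback and the build_config_path() helper are additions for this sketch only (some systems leave PATH_MAX undefined); they are not part of the suggested patch.

```c
#include <limits.h>   /* declares PATH_MAX where the system defines it */
#include <stdio.h>

/* Fallback for systems that leave PATH_MAX undefined; the value 4096 is an
 * assumption for this sketch, not something Hobbit defines. */
#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical helper: join dir and file into buf.
 * Returns 0 on success, -1 if the result would not fit. */
int build_config_path(char *buf, size_t bufsz, const char *dir, const char *file)
{
    int n = snprintf(buf, bufsz, "%s/%s", dir, file);

    return (n >= 0 && (size_t)n < bufsz) ? 0 : -1;
}
```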
Hi all
I'm still having some issues with Hobbit 4.0 RC3 (installed from scratch on an up-to-date Gentoo Linux x86 system). The main problem is that I can't disable a host using maint.pl: it just does nothing, and I get this in my Apache error_log:
maint.pl: Use of uninitialized value in substitution (s///) at /BB/hobbit/cgi-secure/maint.pl line 550., referer: http://xx.xx.xx.xx/hobbit/
FYI, Perl was upgraded from perl-5.8.5-r4 to perl-5.8.5-r2 because of two security advisories (CAN-2005-015{5,6}). I'll try downgrading.
Changes from RC-2 -> RC-3
[...]
Improvements:
- The info-pages now list the Hobbit alert configuration.
With this paging rule:
HOST=foo SERVICE=* REPEAT=24h TIME=W:0900:1800 DURATION>5m SCRIPT /tmp/alert.sh FORMAT=TEXT
I get "FORMAT=TEXT" in the "Recipient" field.
Shouldn't it be the script name?
Another small problem: I'm running bbgen with the option "--infoupdate=300", and get this warning under the "bbgen" column:
Error output: Unknown option : --infoupdate=300
Something seems to be missing from the sources:
$ find /tmp/hobbit-4.0-RC3 | xargs grep infoupdate
./docs/manpages/man1/bbgen.1.html:<DT>--infoupdate=N<DD>
./bbdisplay/bbgen.1:.IP "--infoupdate=N"
./bbdisplay/bbgen.c: printf(" --infoupdate=N : time between updates of INFO column pages in seconds\n");
- New FORMAT=PLAIN setting for alert recipients. This is the same as FORMAT=TEXT, except that the URL link to the status- page is left out of the message.
I still get a warning if "FORMAT=TEXT" is not specified in hobbit-alerts.cfg:
Ignoring SCRIPT with no recipient at line 1
- The "--recentgifs" option to bbgen now has a parameter, so you can specify what the threshold is for a status to have changed "recently". The default is 24 hours.
Thanks a lot for this one :-)
Regards,
--
Frédéric Mangeant
Another issue: with this paging rule

HOST=foo SERVICE=* EXSERVICE=procs REPEAT=24h TIME=W:0900:1800 DURATION>5m SCRIPT /tmp/alert.sh FORMAT=TEXT

I got paged every 30 minutes with a red "disk" column, instead of every 24 hours. The trace file contains this:

00014678 2005-02-23 16:22:23 Matching host:service:page 'foo:disk:supervision/hobbit' against rule line 1
00014678 2005-02-23 16:22:23 *** Match with 'HOST=foo SERVICE=* EXSERVICE=procs REPEAT=24h TIME=W:0900:1800 DURATION>5m SCRIPT /tmp/alert.sh FORMAT=TEXT COLOR=red,purple' ***
00014678 2005-02-23 16:22:23 Matching host:service:page 'foo:disk:supervision/hobbit' against rule line 1
00014678 2005-02-23 16:22:23 *** Match with 'HOST=foo SERVICE=* EXSERVICE=procs REPEAT=24h TIME=W:0900:1800 DURATION>5m SCRIPT /tmp/alert.sh FORMAT=TEXT COLOR=red,purple' ***
00000836 2005-02-23 16:22:23 send_alert foo:disk state Paging
00000836 2005-02-23 16:22:23 Matching host:service:page 'foo:disk:supervision/hobbit' against rule line 1
00000836 2005-02-23 16:22:23 *** Match with 'HOST=foo SERVICE=* EXSERVICE=procs REPEAT=24h TIME=W:0900:1800 DURATION>5m SCRIPT /tmp/alert.sh FORMAT=TEXT COLOR=red,purple' ***
00000836 2005-02-23 16:22:23 Matching host:service:page 'foo:disk:supervision/hobbit' against rule line 1
00000836 2005-02-23 16:22:23 *** Match with 'HOST=foo SERVICE=* EXSERVICE=procs REPEAT=24h TIME=W:0900:1800 DURATION>5m SCRIPT /tmp/alert.sh FORMAT=TEXT COLOR=red,purple' ***
00000836 2005-02-23 16:22:23 Script alert with command '/tmp/alert.sh' and recipient FORMAT=TEXT
00014678 2005-02-23 16:22:23 Matching host:service:page 'foo:disk:supervision/hobbit' against rule line 1
00014678 2005-02-23 16:22:23 *** Match with 'HOST=foo SERVICE=* EXSERVICE=procs REPEAT=24h TIME=W:0900:1800 DURATION>5m SCRIPT /tmp/alert.sh FORMAT=TEXT COLOR=red,purple' ***
00014678 2005-02-23 16:22:23 Matching host:service:page 'foo:disk:supervision/hobbit' against rule line 1
00014678 2005-02-23 16:22:23 *** Match with 'HOST=foo SERVICE=* EXSERVICE=procs REPEAT=24h TIME=W:0900:1800 DURATION>5m SCRIPT /tmp/alert.sh FORMAT=TEXT COLOR=red,purple' ***
00014678 2005-02-23 16:22:27 @@page foo:disk:supervision/hobbit=red

--
Frédéric Mangeant
On Wed, Feb 23, 2005 at 05:46:47PM +0100, Frédéric Mangeant wrote:
Another issue: with this paging rule
HOST=foo SERVICE=* EXSERVICE=procs REPEAT=24h TIME=W:0900:1800 DURATION>5m SCRIPT /tmp/alert.sh FORMAT=TEXT
I got paged every 30 minutes with a red "disk" column, instead of 24 hours.
You cannot set a REPEAT setting on a rule; it goes on the recipient. And a SCRIPT recipient needs both the script name and a parameter. So your alert config should be
HOST=foo SERVICE=* EXSERVICE=procs TIME=W:0900:1800 DURATION>5m SCRIPT /tmp/alert.sh somerecipient FORMAT=TEXT REPEAT=24h
Henrik
On Wed, Feb 23, 2005 at 04:18:39PM +0100, Frédéric Mangeant wrote:
The main problem is that I can't disable a host using maint.pl: it just does nothing, and I get this in my Apache error_log:
maint.pl: Use of uninitialized value in substitution (s///) at /BB/hobbit/cgi-secure/maint.pl line 550., referer: http://xx.xx.xx.xx/hobbit/
Others have reported the same, but it appears to be dependent on the Perl version or configuration that is used. However, since this particular bit of the maint.pl script is essentially "dead code" (it was a preparation for some new feature in the Big Brother bbd daemon which, as far as I know, was never implemented), I've ripped it out now, and that should hopefully take care of this problem.
- The info-pages now list the Hobbit alert configuration.
With this paging rule:
HOST=foo SERVICE=* REPEAT=24h TIME=W:0900:1800 DURATION>5m SCRIPT /tmp/alert.sh FORMAT=TEXT
I get "FORMAT=TEXT" in the "Recipient" field.
It's a bug in RC-3. Should be fixed now.
Another small problem: I'm running bbgen with the option "--infoupdate=300", and get this warning under the "bbgen" column:
Error output: Unknown option : --infoupdate=300
Both the --info and --infoupdate options are gone. I forgot to delete them from the man-page; thanks for noticing.
If you want the info-column pages updated more frequently, change the setting for the bb-infocolumn task in server/etc/hobbitlaunch.cfg.
Henrik