Highlights of the 4.3.0 version
In another thread, someone asked about what new features are planned for version 4.3.0. I've summarized them below; they have all been implemented by now. Some of them have been contributed by others over the past year - I'm pleased to have finally gotten their patches merged.
There are some open bug-reports, and the plan now is to try and get those fixed. Once that is done I'll ask you all to start testing the beta-versions, and then a new release will hopefully be available soon.
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
Major new features
- PAGE setting for alert- and client-configuration handles hosts on multiple pages, so any pagename can be used.
- Flap detection of statuses that change color rapidly. The status is kept at the most critical level until it stops flapping.
- Holiday support for alerts, including variable holidays (Easter etc)
- Split NCV support - graph data from NCV can be split into multiple RRD databases allowing for varying number of datasets.
- RRD database parameters are now configurable (e.g. the number of datapoints stored and whether to store min/max values). Note that this only applies to newly created RRD files, not existing ones.
- Distributed worker modules allow sharing the load across multiple Hobbit servers
- RRD updates are now cached for up to 30 minutes before being written to disk. This makes the I/O load on large installations much lighter.
- Detection of statuses that are reported by multiple hosts
- Client backend-support for the z/OS and z/VSE clients by Rich Smirna
Display things
- Graph zooming now limits the lower/upper bounds of a graph (requires rrdtool 1.2.x)
- The trends page default data-period can be configured to something other than the default 48-hour view, and the user can select a different period on-the-fly.
- Hosts can be sorted automatically on the overview webpage with a "group-sorted" group definition.
- NOCOLUMNS setting in bb-hosts lets you suppress certain columns on a per-host basis
- Host-comments are displayed as tool-tips, to save screen space.
Checks and graphs
- Network tests can use a specific source IP instead of the default
- The validity-period of network tests is configurable, instead of being fixed at the default 30-minute setting
- Client file checks can check for a symlink
- "trends" report for RRD handling allows generating custom-made RRD files
- Hobbit host- and status-counts are tracked in an RRD file
Miscellaneous
- NCV reports can handle color-icons before the name:value data
- hobbitlaunch tasks can be configured to run on certain hosts only
- Time-warp detection and warning
- Local unix-socket interface to Hobbit daemon
- hobbitd_capture can collect several statuses and hand off such a batch to an external command
- Support for SHA-224/256/384/512 digests
Regards, Henrik
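The flap detection item in the list above can be illustrated with a small sketch. This is not Hobbit's implementation: the threshold of 5 changes and the function names are assumptions made purely for illustration. Given the recent colors of a status (oldest first), it counts the color changes and, if there are too many, reports the most critical color seen instead of the latest one.

```sh
#!/bin/sh
# Illustrative only, not Hobbit's code. FLAP_LIMIT is a hypothetical threshold.
FLAP_LIMIT=5

# severity COLOR -> numeric rank (higher is more critical)
severity() {
    case "$1" in
        red)    echo 3 ;;
        yellow) echo 2 ;;
        green)  echo 1 ;;
        *)      echo 0 ;;
    esac
}

# effective_color COLOR... -> color to display for a list of recent colors.
# If the color changed FLAP_LIMIT times or more, the status is treated as
# flapping and the most critical color wins; otherwise the latest color is used.
effective_color() {
    changes=0
    prev=""
    worst=""
    last=""
    for c in "$@"; do
        if [ -n "$prev" ] && [ "$c" != "$prev" ]; then
            changes=$((changes + 1))
        fi
        if [ -z "$worst" ] || [ "$(severity "$c")" -gt "$(severity "$worst")" ]; then
            worst=$c
        fi
        prev=$c
        last=$c
    done
    if [ "$changes" -ge "$FLAP_LIMIT" ]; then
        echo "$worst"
    else
        echo "$last"
    fi
}
```

Called as `effective_color green red green red green red green`, the sketch sees six changes and holds the status at red; with only `green red green` it reports the latest color.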
Send a disable request through email. Currently only a delay request is accepted.
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
-- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
On 7/21/07, Asif Iqbal <vadud3 at gmail.com> wrote:
Monitor and RRD of memory and cpu usage for a process
[..stripped for brevity..]
-- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
- Display column only when it is red (http://www.hobbitmon.com/hobbiton/2006/08/msg00920.html)
- SNMP trap by default
- SNMP probe option builtin
- Process specific alert (http://www.hswn.dk/hobbiton/2005/11/msg00159.html)
- Comment TAG for DOWNTIME (http://www.hobbitmon.com/hobbiton/2007/04/msg00141.html)
- Add functionalities in `delay' (http://www.hswn.dk/hobbiton/2005/06/msg00272.html)
- CPU/Memory Usage per process (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00429.html)
- Text based alert for `msgs'. Currently it shows as html in my email (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00203.html)
Thanks again for such an excellent application and keeping it open!!
-- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
On Sat, Jul 21, 2007 at 07:16:12PM -0400, Asif Iqbal wrote:
- Display column only when it is red (http://www.hobbitmon.com/hobbiton/2006/08/msg00920.html)
I'll leave that for later. There will probably be an entire new version with just display things.
- SNMP trap by default
- SNMP probe option builtin
Too much for now. I need to dig into the Net-SNMP library API to do that.
- Process specific alert (http://www.hswn.dk/hobbiton/2005/11/msg00159.html)
Already in 4.2.0 via the GROUP definition in hobbit-clients.cfg and the corresponding rule in hobbit-alerts.cfg
- Comment TAG for DOWNTIME (http://www.hobbitmon.com/hobbiton/2007/04/msg00141.html)
Has been implemented for 4.3.0
- Add functionalities in `delay' (http://www.hswn.dk/hobbiton/2005/06/msg00272.html)
Haven't looked at that.
- CPU/Memory Usage per process (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00429.html)
Probably impossible. Most "ps" implementations can report the current amount of cpu/memory a process uses, but that's a snapshot (ever noticed how "top" always has itself in the top list of cpu-using processes?). What's interesting is not how much cpu/memory a process uses exactly when the Hobbit client runs the "ps" command, but how much it has used on average since the last client run - similar to what "vmstat" reports for the system as a whole. I don't know of any way to get this data.
Another problem with this is identifying what a process is. A long-running daemon often forks child-processes that are short-lived; should we add their cpu-utilisation to that of the long-running process? If yes, then we have to monitor all processes that are started (so running once every N seconds is not sufficient); if no, then you won't spot the cpu hog because it was spawned as a child process.
- Text based alert for `msgs'. Currently it shows as html in my email (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00203.html)
Easily done with an alert script.
Regards, Henrik
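The alert-script route for plain-text mail might look like the sketch below. It assumes that the alert module passes the message text, the recipient, the host/test name and the color to the script in the BBALPHAMSG, RCPT, BBHOSTSVC and BBCOLORLEVEL environment variables; check the hobbit-alerts.cfg man page for the exact names in your version. The tag stripping is deliberately crude (it only handles tags that open and close on the same line), and the function names are made up for illustration.

```sh
#!/bin/sh
# Sketch of an alert script that mails a plain-text version of a status.
# Assumption: BBALPHAMSG, RCPT, BBHOSTSVC and BBCOLORLEVEL are provided
# in the environment by the alert module.

# strip_html: read text on stdin, drop anything that looks like an HTML tag
# and decode a few common entities.
strip_html() {
    sed -e 's/<[^>]*>//g' \
        -e 's/&nbsp;/ /g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&amp;/\&/g'
}

# send_text_alert: mail the cleaned-up message to the recipient.
send_text_alert() {
    printf '%s\n' "$BBALPHAMSG" | strip_html |
        mail -s "Hobbit $BBHOSTSVC is $BBCOLORLEVEL" "$RCPT"
}

# Only send when a recipient is set, so the file can be sourced for testing.
if [ -n "$RCPT" ]; then
    send_text_alert
fi
```

The filter can be tried on its own, for example with `printf '%s\n' '<b>disk</b> &gt; 90%' | strip_html`.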
Well, we watch for the presence of processes today. It would be nice to be able to track cpu and size of "important" processes over time.
Another problem is detecting CPU hogs (sometimes things run away). Yet another is detecting processes with memory leaks: they just grow and grow and grow. How can Hobbit help?
GLH
Hi Greg,
I originally needed to do this with BB, to track a memory leak in HP OpenView's pmd process back when we still used it.
#!/bin/sh

# SCRIPTS IN THE BBHOME/ext DIRECTORY ARE ONLY RUN IF
# THEY ARE DEFINED IN THE ENTRY FOR THE CURRENT HOST
# LISTED IN THE ext/bb-bbexttab FILE.

# BBPROG SHOULD JUST CONTAIN THE NAME OF THIS FILE
# USEFUL WHEN YOU GET ENVIRONMENT DUMPS TO LOCATE
# THE OFFENDING SCRIPT...
BBPROG=bb-pmd.sh; export BBPROG

# TEST NAME: THIS WILL BECOME A COLUMN ON THE DISPLAY
# IT SHOULD BE AS SHORT AS POSSIBLE TO SAVE SPACE...
# NOTE YOU CAN ALSO CREATE A HELP FILE FOR YOUR TEST
# WHICH SHOULD BE PUT IN www/help/$TEST.html. IT WILL
# BE LINKED INTO THE DISPLAY AUTOMATICALLY.
TEST="pmd"

# BBHOME CAN BE SET MANUALLY WHEN TESTING.
# OTHERWISE IT SHOULD BE SET FROM THE BB ENVIRONMENT
#BBHOME=/opt/BB/bb19c ; export BBHOME # FOR TESTING

if test "$BBHOME" = ""
then
        echo "BBHOME is not set... exiting"
        exit 1
fi

if test ! "$BBTMP"                      # GET DEFINITIONS IF NEEDED
then
        # echo "*** LOADING BBDEF ***"
        . $BBHOME/etc/bbdef.sh          # INCLUDE STANDARD DEFINITIONS
fi

# VIRTUAL SIZE OF THE pmd PROCESS, CONVERTED FROM KB TO MB
PMDMEM=`/bin/ps -e -o vsz -o comm | grep " pmd" | awk '{printf "%d", $1/1024}'`
if test "$PMDMEM" = ""
then
        COLOR="clear"
else
        COLOR="green"
fi

# AT THIS POINT WE HAVE OUR RESULTS. NOW WE HAVE TO SEND IT TO
# THE BBDISPLAY TO BE DISPLAYED...

# MACHINE NAME MUST EITHER BE A REAL MACHINE NAME, OR
# LOOK LIKE A REAL MACHINE (in the case of arbitrary measurements
# like temperature). IF THE NAME YOU ARE USING DOESN'T EXIST
# IN THE DNS THEN IT SHOULD BE LISTED IN THE bb-hosts FILE WITH noping,
# PREFERABLY IN ITS OWN GROUP...
# NOTE THE COMMAS HERE - YOU NEED THEM!
MACHINE=`echo $MACHINE | $SED 's/\./,/g'`       # HAS TO BE IN A,B,C FORM

# THE FIRST LINE IS STATUS INFORMATION... STRUCTURE IMPORTANT!
# THE REST IS FREE-FORM - WHATEVER YOU'D LIKE TO SEND...
LINE="PMD Statistics. "
SUMMARY=" PMD memory usage is $PMDMEM"

# NOW USE THE BB COMMAND TO SEND THE DATA ACROSS
# SEND IT TO BBDISPLAY
$BB $BBDISP "status $MACHINE.$TEST $COLOR `date` $LINE $SUMMARY MB"
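The script above reports green whenever the process is found, so a leak only becomes visible when someone looks at the number. One possible extension, sketched here with hypothetical thresholds (WARN_MB and PANIC_MB are not part of the original script), turns the measured size into a color:

```sh
#!/bin/sh
# Hypothetical thresholds in MB; tune them for the process being watched.
WARN_MB=200
PANIC_MB=400

# color_for_mem MB -> "clear" if no value was measured,
# otherwise green/yellow/red depending on the thresholds.
color_for_mem() {
    if [ -z "$1" ]; then
        echo "clear"
    elif [ "$1" -ge "$PANIC_MB" ]; then
        echo "red"
    elif [ "$1" -ge "$WARN_MB" ]; then
        echo "yellow"
    else
        echo "green"
    fi
}

# In the script above this would replace the if/else that sets COLOR:
#   COLOR=`color_for_mem "$PMDMEM"`
```

With a color attached, the normal alerting rules can notify someone when the process grows past the limit, rather than relying on a person checking the column.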
-----Original Message----- From: Hubbard, Greg L [mailto:greg.hubbard at eds.com] Sent: Tuesday, July 24, 2007 4:44 PM To: hobbit at hswn.dk Subject: RE: [hobbit] Highlights of the 4.3.0 version
Thanks!
-----Original Message----- From: shea_greg at emc.com [mailto:shea_greg at emc.com] Sent: Tuesday, July 24, 2007 3:56 PM To: hobbit at hswn.dk Cc: shea_greg at emc.com Subject: RE: [hobbit] Highlights of the 4.3.0 version
I have a problem with bbhostgrep in the latest snapshots: it seems that it cannot get the data from the bb-hosts file.
To give an example, I have a host defined in the bb-hosts file as:
0.0.0.0 ITROMFS10 # WIN:* netapp
Now if I run bbhostgrep netapp, I get:
sh-3.1$ bbhostgrep netapp
2007-07-24 23:13:19 Cannot load bb-hosts, or file is empty
bbhostshow works correctly.
Francesco
On Tue, Jul 24, 2007 at 11:21:15PM +0200, Francesco Duranti wrote:
I have a problem with bbhostgrep in the latest snapshots: it seems that it cannot get the data from the bb-hosts file.
The current snapshot has this fixed.
Regards, Henrik
On 7/24/07, Henrik Stoerner <henrik at hswn.dk> wrote:
- CPU/Memory Usage per process (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00429.html)
Probably impossible. Most "ps" implementations can report the current amount of cpu/memory a process uses, but that's a snapshot (ever noticed how "top" always has itself in the top list of cpu-using processes?). What's interesting is not how much cpu/memory a process uses exactly when the Hobbit client runs the "ps" command, but how much it has used on average since the last client run - similar to what "vmstat" reports for the system as a whole. I don't know of any way to get this data.
Well, in my `hobbit-clients.cfg' there is already an entry like this:
PROC "%hobbitd.*" TRACK=hobbitd
It already counts the total number of processes matching %hobbitd and labels the result hobbitd. How about letting it total the rss and pcpu of those processes as well, and just create two more RRDs?
It won't really be inaccurate, because it gives you a graphical representation of what `ps' is telling you. Plus it could be GAUGE type data, I guess.
At least it will give you some trend of how a process has been behaving. Even though it does not do the pmap -x calculation, it will certainly point a finger at some heavy processes.
I bet a lot of Hobbit community members would like to see ps graphs built into the Hobbit app.
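To make the proposal concrete, here is a rough sketch of the numbers that could be fed into such RRDs. It is not part of Hobbit: the function name is made up, and it assumes a ps that accepts `-o rss= -o pcpu= -o comm=`. The values are whatever ps reports at that instant, so the result is a trend indicator, not an exact measurement.

```sh
#!/bin/sh
# sum_proc_usage PATTERN: read "rss pcpu command" lines on stdin and print
# "count total_rss_kb total_pcpu" for commands matching the regex PATTERN.
sum_proc_usage() {
    awk -v pat="$1" '
        $3 ~ pat { n++; rss += $1; cpu += $2 }
        END { printf "%d %d %.1f\n", n, rss, cpu }
    '
}

# Typical use on a live system (assumes ps supports these output options):
#   ps -e -o rss= -o pcpu= -o comm= | sum_proc_usage '^hobbitd'
```

Because the function reads from stdin, it can be tested with canned ps output before being pointed at a live system.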
Thanks for the feedback to all of my feature requests. It is very kind of you.
-- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
On 7/24/07, Henrik Stoerner <henrik at hswn.dk> wrote:
- Text based alert for `msgs'. Currently it shows as html in my email (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00203.html)
Easily done with an alert script.
Well, all my messages show up in HTML format. Wouldn't it be nice to generate the email as plain text instead of HTML, or at least to have that choice?
Also, this email seems to suggest that text-based email alerts are possible:
http://www.hobbitmon.com/hobbiton/2005/10/msg00382.html
However, I might be misreading that email.
Again, please understand this is still just a low priority feature request.
Until then I will just explore the script idea that you suggested.
Appreciate all your work really!
-- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
On 7/21/07, Henrik Stoerner <henrik at hswn.dk> wrote:
In another thread, someone asked about what new features are planned for version 4.3.0. I've summarized them below;
Great to see the summary, these features look great. I'd like to request more RRDs and reports about the monitoring system and the servers/services monitored. For example:
I think the following could be "gauge" metrics:
- Number of devices monitored
- Number of services monitored
- Number of host.service in green state
- Number of host.service in yellow state
- Number of host.service in red state
- Number of host.service in XXX state
I am thinking these could be done by creating counters within hobbit (since boot):
- Number of state changes
- Number of state changes per server
- Number of state changes per service
- Number of notifications sent
I think the above metrics could help create reports over time periods for review, to help get to "management by facts" vs. "management by feeling." Most admins that pay attention to their install will "know", but it's different when you can "prove." Plus, when improvements are made, it's nice to see them.
I am also thinking we could try to apply some Six Sigma terminology and methodology to hobbit, which may have value. Six Sigma keys on statistics and defects. Six Sigma refers to having production quality such that you only see 3.4 defects per million opportunities. Granted, we are not "producing" a physical item, but I am thinking that a defect could be considered a purple/yellow/red state. With the counters I suggested above, we could apply various statistical measures (control charts, Pareto charts, etc.) and see what makes sense or has value for monitoring.
The goal is to improve consistency and reduce variance.
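As a rough illustration of the Six Sigma arithmetic: defects per million opportunities (DPMO) is the number of defects divided by the number of opportunities, times one million. The sketch below assumes every status sample counts as an opportunity and every non-green sample as a defect, which is only one possible mapping; the function name is made up.

```sh
#!/bin/sh
# dpmo DEFECTS OPPORTUNITIES -> defects per million opportunities
dpmo() {
    awk -v d="$1" -v o="$2" 'BEGIN {
        if (o <= 0) { print "n/a"; exit }
        printf "%.1f\n", d / o * 1000000
    }'
}

# Example: 34 bad samples out of 10 million is the Six Sigma level.
dpmo 34 10000000
```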
If you like, I could draft up some graphs and reports I'd like to see. My above description might be hard to visualize. I definitely think hobbit could benefit from internal counters, similar to how an OS keeps track of context switches and the like.
Scott
On Sat, Jul 21, 2007 at 09:34:11PM -0400, Scott Walters wrote:
Great to see the summary, these features look great. I'd like to request more RRDs and reports about the monitoring system and the servers/services monitored. For example:
I think the following could be "gauge" metrics:
- Number of devices monitored
- Number of services monitored
- Number of host.service in green state
- Number of host.service in yellow state
- Number of host.service in red state
- Number of host.service in XXX state
You mean like this:
Statistics:
Hosts : 4321
Pages : 286
Status messages : 22331
- Red : 907 ( 4.06 %)
- Red (non-propagating) : 809 ( 3.62 %)
- Yellow : 353 ( 1.58 %)
- Yellow (non-propagating) : 210 ( 0.94 %)
- Clear : 1970 ( 8.82 %)
- Green : 17052 (76.36 %)
- Purple : 452 ( 2.02 %)
- Blue : 578 ( 2.59 %)
The first three are from the current "bbgen --report" status message; I've added the breakdown of the colors now. Will put these into an RRD for tracking trends.
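As a rough illustration of how the percentages in that breakdown relate to the total, here is a small Python sketch. The counts are copied from the report above; the function name `color_breakdown` is made up for this example and is not part of bbgen. Note that the "(non-propagating)" rows are separate buckets: all eight rows together add up to the 22331 status messages.

```python
# Status counts copied from the report above. The "(non-propagating)" rows
# are separate buckets, so all eight rows sum to the total.
COUNTS = {
    "Red": 907,
    "Red (non-propagating)": 809,
    "Yellow": 353,
    "Yellow (non-propagating)": 210,
    "Clear": 1970,
    "Green": 17052,
    "Purple": 452,
    "Blue": 578,
}


def color_breakdown(counts):
    """Return (total, {color: share of total in percent, 2 decimals})."""
    total = sum(counts.values())
    if total == 0:
        return 0, {color: 0.0 for color in counts}
    shares = {color: round(100.0 * n / total, 2) for color, n in counts.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = color_breakdown(COUNTS)
    print(f"Status messages : {total}")
    for color, pct in shares.items():
        print(f"- {color:<26}: {COUNTS[color]:>6} ({pct:5.2f} %)")
```

Feeding one such snapshot per run into an RRD would give the trend graph mentioned above.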
I am thinking these could be done by creating counters within hobbit (since boot):
- Number of state changes
- Number of state changes per server
- Number of state changes per service
- Number of notifications sent
The state changes can be calculated from the history logs. This is preferable, I think, because that way it won't get reset if the Hobbit server is restarted.
Notifications - it would make sense to have the alert module provide some statistics that we could put into a trend graph.
If you like, I could draft up some graphs and reports I'd like to see. My above description might be hard to visualize. I definitely think hobbit could benefit from internal counters, similar to how an OS keeps track of context switches and the like.
Please do. The graphs I've created about the Hobbit "internals" have been mostly for my own use as debugging / performance evaluation data. If we can provide some data that is interesting to management, that would be a good thing.
Regards, Henrik
Henrik Stoerner wrote :
- Red : 907 ( 4.06 %)
- Red (non-propagating) : 809 ( 3.62 %)
- Yellow : 353 ( 1.58 %)
- Yellow (non-propagating) : 210 ( 0.94 %)
Hey,
what a nice hook to tell about a bug in nopropred/nopropyellow: I _often_ (but not always) get a red status with nopropred on the bb2 page. Full report here:
June 21st: http://www.hswn.dk/hobbiton/2007/06/msg00311.html
-- Charles Goyard - charles.goyard at orange-ftgroup.com - (+33) 1 45 38 01 31 Orange Business Services - online multimedia // ingénierie
On Mon, Jul 23, 2007 at 08:59:54AM +0200, Charles Goyard wrote:
what a nice hook to tell about a bug in nopropred/nopropyellow: I _often_ (but not always) get a red status with nopropred on the bb2 page. Full report here:
June 21st: http://www.hswn.dk/hobbiton/2007/06/msg00311.html
noprop's don't affect the bb2 page - they only control if a status affects the color of the "main page" that the status is on. If you want to remove these from the BB2 page, run bbgen with "--bb2-ignorecolumns=procs_master,is_master".
Regards, Henrik
Great to see the author of larrd participating in the Hobbit discussion. See below for my comments.
From: "Scott Walters" <scott at PacketPushers.com>
Reply-To: hobbit at hswn.dk
To: hobbit at hswn.dk
Subject: Re: [hobbit] Highlights of the 4.3.0 version
Date: Sat, 21 Jul 2007 21:34:11 -0400
On 7/21/07, Henrik Stoerner <henrik at hswn.dk> wrote:
In another thread, someone asked about what new features are planned for version 4.3.0. I've summarized them below;
Great to see the summary, these features look great. I'd like to request more RRDs and reports about the monitoring system and the servers/services monitored. For example:
I think the following could be "gauge" metrics:
- Number of devices monitored
- Number of services monitored
- Number of host.service in green state
- Number of host.service in yellow state
- Number of host.service in red state
- Number of host.service in XXX state
I am thinking these could be done by creating counters within hobbit (since boot):
- Number of state changes
- Number of state changes per server
- Number of state changes per service
- Number of notifications sent
I think the above metrics could help create reports over time periods for review to help get to "management by facts" vs. "management by feeling." Most admins that pay attention to their install will "know", but it's different when you can "prove." Plus, when improvements are made, it's nice to see it.
It would also help to provide OS type and version metrics. This would give us a clear view of how many vendor-unsupported OS versions (e.g. Solaris 2.5.1, 2.6, 2.7, HP-UX 9, HP-UX 10.20) are still in an IT system.
Henrik showed me the command on this list last time I asked, but it would be good if this could be done from the Hobbit server.
I am also thinking we could try and apply some Six Sigma terminology and methodology to hobbit which may have value. Six Sigma keys on statistics and defects. Six Sigma refers to having production quality such that you only see 3.4 defects per million. Granted we are not "producing" a physical item, but I am thinking that a defect could be considered a purple/yellow/red state. With the counters I suggested above, we could apply various statistical measures (control charts, pareto charts, etc.) and see what makes sense or has value for monitoring.
In Six Sigma, availability is formatted with five nines (99.999). There are some patches floating around to make HB's availability report show the five-nines format. This is a baby step, but I got asked by management why the bb/hb report is one digit short of nines.
Associating Hobbit more closely with Six Sigma is definitely a good thing. Connecting Hobbit with ITIL would be even better.
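To make the two figures in this exchange concrete, here is a small arithmetic sketch in Python. The function names are invented for illustration and have nothing to do with the patches mentioned above. It formats an availability figure with three decimals (the "five nines" style) and converts between defect counts and defects per million. Note that 3.4 defects per million corresponds to a yield of 99.99966 %, which is a different number from 99.999 %.

```python
def availability_percent(up_seconds, total_seconds, decimals=3):
    """Format availability as a percentage string, e.g. '99.999'."""
    return f"{100.0 * up_seconds / total_seconds:.{decimals}f}"


def defects_per_million(defects, opportunities):
    """Scale a defect count to defects per million opportunities."""
    return 1_000_000 * defects / opportunities


def yield_percent(dpmo):
    """Percentage of opportunities without a defect for a given DPMO."""
    return 100.0 * (1.0 - dpmo / 1_000_000)


if __name__ == "__main__":
    # One second of downtime in 100000 seconds gives "five nines".
    print(availability_percent(99_999, 100_000))
    # Six Sigma's 3.4 defects per million, expressed as a yield.
    print(round(yield_percent(3.4), 5))
```

What counts as a "defect" or an "opportunity" for a monitoring system (for example, one non-green status per polling cycle) is a modelling choice the thread leaves open.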
tj
The goal is to improve consistency and reduce variance.
If you like, I could draft up some graphs and reports I'd like to see. My above description might be hard to visualize. I definitely think hobbit could benefit from internal counters, similar to how an OS keeps track of context switches and the like.
Scott
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
If you like, I could draft up some graphs and reports I'd like to see. My above description might be hard to visualize.
Henrik, you're right about using the histories for reports. That data keeps its integrity unlike the RRD averages, much better for reports.
For a given input period (Last 7 days, June 2007, etc.)
Servers with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on server would generate list of state changes. "Look Bob, your server is not stable you need to get your developers under control!"
Services with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on service would generate list of the state changes for that period. "PHB, the web group is performing way too many undocumented code changes."
Red events with longest durations (for events still open, use start time to NOW as duration)
Yellow events with longest durations (for events still open, use start time to NOW as duration)
All ping/fping/conn events.
You could piece-meal some of those from the eventlog report, but I'd prefer a single page that showed them all. For weekly, quarterly meetings, turnover, etc.
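A minimal sketch of the counting behind the first two report items, assuming the history data has already been parsed into simple tuples. The tuple layout `(timestamp, host, service, old_color, new_color)` is a stand-in invented for this example and is not Hobbit's history-log format.

```python
from collections import Counter


def count_state_changes(events, start, end):
    """Count state changes per host and per service within [start, end).

    events: iterable of (epoch_seconds, host, service, old_color, new_color)
    tuples. This layout is hypothetical, chosen only for the example.
    """
    by_host, by_service = Counter(), Counter()
    for ts, host, service, old, new in events:
        # Only real transitions inside the report period are counted.
        if start <= ts < end and old != new:
            by_host[host] += 1
            by_service[service] += 1
    return by_host, by_service


if __name__ == "__main__":
    sample = [
        (100, "web1", "http", "green", "red"),
        (200, "web1", "http", "red", "green"),
        (300, "db1", "disk", "green", "yellow"),
    ]
    hosts, services = count_state_changes(sample, 0, 500)
    print(hosts.most_common(10))
    print(services.most_common(10))
```

`Counter.most_common(10)` gives the "Top 10" ordering directly; the drill-down lists would be the same events filtered by one host or one service.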
Scott
On 7/23/07, Scott Walters <scott at packetpushers.com> wrote:
- Services with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on service would generate list of the state changes for that period. "PHB, the web group is performing way too many undocumented code changes."
Heh, that would be useful. I've got a perl script using SOAP to get BigIP pool status and some joker has transferred some machines between BigIPs without removing the old definitions. So, there's a bunch of systems/ports that flip/flop between enable & disable. Whether they're red or green depends on which report comes in last.
Maybe I can persuade the load balancer guys to actually remove the duplicate definitions.
Ralph Mitchell
On Tue, Jul 24, 2007 at 09:18:49AM -0500, Ralph Mitchell wrote:
On 7/23/07, Scott Walters <scott at packetpushers.com> wrote:
- Services with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on service would generate list of the state changes for that period. "PHB, the web group is performing way too many undocumented code changes."
Heh, that would be useful. I've got a perl script using SOAP to get BigIP pool status and some joker has transferred some machines between BigIPs without removing the old definitions. So, there's a bunch of systems/ports that flip/flop between enable & disable. Whether they're red or green depends on which report comes in last.
That should actually be caught by another 4.3.0 feature: Flap detection. If a status changes more than 10 times in 10 minutes, Hobbit deems it "flapping" and stops logging status changes - instead, it fixes the status at the most critical level reported.
Any hosts flapping are reported on the "hobbitd" status display.
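The rule described here ("more than 10 changes in 10 minutes") can be sketched as a sliding window over the timestamps of status changes. This is only an illustration in Python, not Hobbit's actual implementation; the class name and both parameters are invented for the example, and the step of pinning the status at the most critical color is left out.

```python
from collections import deque


class FlapDetector:
    """Sliding-window flap check for a single host.service status.

    The defaults mirror the figures quoted above (more than 10 changes
    within 10 minutes); both values are ordinary parameters here.
    """

    def __init__(self, max_changes=10, window_seconds=600):
        self.max_changes = max_changes
        self.window_seconds = window_seconds
        self._changes = deque()  # timestamps of recent status changes

    def record_change(self, timestamp):
        """Record one status change; return True if the status is flapping."""
        self._changes.append(timestamp)
        # Drop changes that have fallen out of the window.
        while timestamp - self._changes[0] >= self.window_seconds:
            self._changes.popleft()
        return len(self._changes) > self.max_changes


if __name__ == "__main__":
    detector = FlapDetector()
    # A status that changes every 30 seconds trips the check on change 11.
    print([detector.record_change(t * 30) for t in range(11)])
```

A status that changes only once every five minutes never has more than two changes inside the window, so it stays below the default threshold.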
Regards, Henrik
On 7/24/07, Henrik Stoerner <henrik at hswn.dk> wrote:
On Tue, Jul 24, 2007 at 09:18:49AM -0500, Ralph Mitchell wrote:
On 7/23/07, Scott Walters <scott at packetpushers.com> wrote:
- Services with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on service would generate list of the state changes for that period. "PHB, the web group is performing way too many undocumented code changes."
Heh, that would be useful. I've got a perl script using SOAP to get BigIP pool status and some joker has transferred some machines between BigIPs without removing the old definitions. So, there's a bunch of systems/ports that flip/flop between enable & disable. Whether they're red or green depends on which report comes in last.
That should actually be caught by another 4.3.0 feature: Flap detection. If a status changes more than 10 times in 10 minutes, Hobbit deems it "flapping" and stops logging status changes - instead, it fixes the status at the most critical level reported.
Unfortunately that's not going to affect my particular checks. Right now I have a Hobbit client kicking off the test on a 5 minute interval, so it goes off at time T, T+5min, T+10min, etc. The duplicated servers are only on 2 BigIPs, so they flip/flop over and back at time T, T+5, T+10. At most there will be 6 changes in a 10 minute period.
Could that 10-times-in-10-minutes be made into a variable?? Maybe a default value in the hobbitserver.cfg with an override in bb-hosts, though I hate to add yet another inch to the width of that file...
Actually, even flap detection isn't going to help my situation - the reports are going to be red for the BigIP where the server/port is disabled and green/red for the BigIP that *really* owns the server, so flap detection would show red anyway. All the time. I really need to get the duplicates removed.
Ralph Mitchell
On Mon, Jul 23, 2007 at 09:44:11PM -0400, Scott Walters wrote:
For a given input period (Last 7 days, June 2007, etc.)
Servers with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on server would generate list of state changes. "Look Bob, your server is not stable you need to get your developers under control!"
Services with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on service would generate list of the state changes for that period. "PHB, the web group is performing way too many undocumented code changes."
I've whipped up a very rough implementation as part of the eventlog report on the Hobbit demo site. Could you try generating a report at http://www.hswn.dk/hobbit-cgi/bb-eventlog.sh and let me know if the data at the top is something in the right direction?
The nice thing about making it an add-on or variant of the eventlog report is that there's already all of the nice filtering for hosts, pages, time-periods etc in place, plus the "allevents" logfile parsing is also done.
Regards, Henrik
On 7/24/07, Henrik Stoerner <henrik at hswn.dk> wrote:
I've whipped up a very rough implementation as part of the eventlog report on the Hobbit demo site. Could you try generating a report at http://www.hswn.dk/hobbit-cgi/bb-eventlog.sh and let me know if the data at the top is something in the right direction?
Bingo. But since that was so easy, here are a few more:
So voodoo.hswn.dk has had 12 state changes . . . what were they? It would be nice if the server name and service name could be HTML links which would generate a report of the state changes for the specified server/service over the given period.
Also, please show the total. If there are more than 10 hosts/services, use an "Other" at the end of the list. I love seeing single hosts on 100+ node installs with 25% of activity. You know where to focus.
And I would imagine the "Top X" where X is configurable will be requested.
And print the report period on the page so you know what you are looking at.
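The "Top X plus Other, with a total" layout requested here could look roughly like the following Python sketch. The function name and the row format are made up for the example; it also adds each row's share of the total as a percentage.

```python
def top_n_with_other(counts, n=10):
    """Return (rows, total) where rows are (name, count, percent) tuples.

    The n largest entries are listed individually; everything else is
    folded into a final "Other" row so the rows still add up to the total.
    """
    total = sum(counts.values())
    if total == 0:
        return [], 0
    # Sort by count descending, then by name for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    rows = [(name, c, round(100.0 * c / total, 1)) for name, c in ranked[:n]]
    rest = sum(c for _, c in ranked[n:])
    if rest:
        rows.append(("Other", rest, round(100.0 * rest / total, 1)))
    return rows, total


if __name__ == "__main__":
    changes = {"voodoo.hswn.dk": 12, "host-b": 5, "host-c": 2, "host-d": 1}
    rows, total = top_n_with_other(changes, n=2)
    for name, count, pct in rows:
        print(f"{name:<16} {count:>4} {pct:5.1f} %")
    print(f"{'Total':<16} {total:>4}")
```

The host names other than voodoo.hswn.dk and all the counts besides its 12 are placeholder data.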
The nice thing about making it an add-on or variant of the eventlog report is that there's already all of the nice filtering for hosts, pages, time-periods etc in place, plus the "allevents" logfile parsing is also done.
We'll keep requesting features until it gets hard ;)
Scott Walters -PacketPusher
Hi Henrik,
New Feature Request.-
I would like to be able to add some network news to the pages. Suppose we find that a server has a bad disk and someone will fix it later; I want to add a note there saying that the issue is being taken care of.
-- Thanks Sabeer MZ
On Wed, Jul 25, 2007 at 11:26:33AM +0530, Sabeer MZ wrote:
New Feature Request.-
I would like to add some network news on the pages. Suppose we found that some server has bad disk and some one will fix it later so here i want add the info that this issue has taken care.
Several possibilities already:
1) Ack the red/yellow statuses you have, and put this information in the acknowledgement text.
2) Disable the server and provide the information in the disable text.
3) Create a host "notes" file with the information.
I'd use 1) or 2). I don't see the need for a fourth way of doing this.
Regards, Henrik
Henrik Stoerner wrote :
On Wed, Jul 25, 2007 at 11:26:33AM +0530, Sabeer MZ wrote:
New Feature Request.-
I would like to add some network news on the pages. Suppose we found that some server has bad disk and some one will fix it later so here i want add the info that this issue has taken care.
Several possibilities already:
- Ack the red/yellow statuses you have, and put this information in the acknowledgement text.
- Disable the server and provide the information in the disable text.
- Create a host "notes" file with the information.
Wasn't there a bb_bulletin feature too ?
-- Charles Goyard - charles.goyard at orange-ftgroup.com - (+33) 1 45 38 01 31 Orange Business Services - online multimedia // ingénierie
On Wed, Jul 25, 2007 at 11:28:00AM +0200, Charles Goyard wrote:
Henrik Stoerner wrote :
On Wed, Jul 25, 2007 at 11:26:33AM +0530, Sabeer MZ wrote:
New Feature Request.-
I would like to add some network news on the pages. Suppose we found that some server has bad disk and some one will fix it later so here i want add the info that this issue has taken care.
Several possibilities already:
- Ack the red/yellow statuses you have, and put this information in the acknowledgement text.
- Disable the server and provide the information in the disable text.
- Create a host "notes" file with the information.
Wasn't there a bb_bulletin feature too ?
~hobbit/server/web/bulletin_header and _footer, yes. But these show up on all pages, I think Sabeer wanted something specifically for a single status page.
Regards, Henrik
Many thanks. I'll check it out...
-- Thanks Sabeer MZ
Hi All,
I've been looking through the archives at the sample server-side module Henrik showed us (http://www.hswn.dk/hobbiton/2007/01/msg00487.html). I'm curious whether anyone knows how to send data from the client side to the server the same way the standard Hobbit client tests do. Looking at bb, I'm guessing it is either "bb data" or "bb client", but when I run bb <hobbitIP> "client <hostname>.<os>" nothing happens, and the man pages don't mention how to actually send the data.
Any help appreciated,
Thanks,
Jason.
On 7/31/07, Jones, Jason (Altrincham) <JasonAS_Jones at mentor.com> wrote:
I've been looking through the archives at the sample server side module Henrick showed us (http://www.hswn.dk/hobbiton/2007/01/msg00487.html) I'm just curious if anyone knows how to send data from the client side to the server the same way the standard hobbit client tests do, looking at bb I'm guessing either bb data or bb client, but when I run bb <hobbitIP> "client <hostname>.<os>" nothing happens and the man pages don't mention how to actually send the data etc.
I have a Hobbit client install running a BigIP check. In client/etc/clientlaunch.cfg:
[bigip-v4]
ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
CMD $HOBBITCLIENTHOME/ext/bigip/bigip3.sh
LOGFILE $HOBBITCLIENTHOME/logs/hobbitclient.log
INTERVAL 5m
After doing what it needs to do to get the status, the script sends off a status message to the server like this:
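# Hobbit wants commas instead of dots in the hostname part of a status message
MACHINE=`echo $NAME | sed -e 's/\./,/g'`
MESSAGE="status $MACHINE.$TEST $COLOR `date`<P><font size=+2>The $BIGIP BigIP says: $NAME $TEST is $STATE</font>"
# Quote the message so bb receives it as a single argument
$BB $BBDISP "$MESSAGE"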
Is that what you're looking for??
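For anyone who prefers to see the same idea outside of shell, here is a hedged Python sketch of building and sending such a status message. The message layout follows the script above (`status host,with,commas.test color text`). Sending it by opening a TCP connection to port 1984 on the display server is an assumption about the bb protocol, as are the function names, so treat `send_message` as illustrative and keep using the bb utility in real scripts.

```python
import socket
import time


def build_status_message(host, test, color, text, now=None):
    """Build a status message in the same shape as the shell script above."""
    machine = host.replace(".", ",")  # dots in the hostname become commas
    stamp = time.ctime(now)           # current time when `now` is None
    return f"status {machine}.{test} {color} {stamp}\n{text}"


def send_message(server, message, port=1984, timeout=10.0):
    """Send one message to the display server.

    Assumption: the server accepts a plain TCP connection on port 1984
    and reads the message until the sending side is closed.
    """
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(message.encode("utf-8"))
        conn.shutdown(socket.SHUT_WR)


if __name__ == "__main__":
    # Hypothetical values, shown only to illustrate the message layout.
    print(build_status_message("www.example.com", "bigip", "green",
                               "The BigIP says: pool member is enabled"))
```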
Ralph Mitchell
Not really. That sends predetermined colours; what I was thinking of is more like sending the output of command x and then having Hobbit generate the webpage.
Any ideas? Jason.
On Tue, Jul 24, 2007 at 10:15:02PM -0400, Scott Walters wrote:
On 7/24/07, Henrik Stoerner <henrik at hswn.dk> wrote:
I've whipped up a very rough implementation as part of the eventlog report on the Hobbit demo site. Could you try generating a report at http://www.hswn.dk/hobbit-cgi/bb-eventlog.sh and let me know if the data at the top is something in the right direction?
Bingo. But since that was so easy, here are a few more: [snip] We'll keep requesting features until it gets hard ;)
Reports are usually rather boring things to do, but this one was fun. Have a look at the current state of this report at the Hobbit demo site http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh (You can also just go to the demo site, and pick the "Reports" -> "Top Changes" report).
Should cover everything you've asked for - at least until now, that is.
Regards, Henrik
Well, that's damned handy...
=G=
Reports are usually rather boring things to do, but this one was fun. Have a look at the current state of this report at the Hobbit demo site http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh (You can also just go to the demo site, and pick the "Reports" -> "Top Changes" report).
Should cover everything you've asked for - at least until now, that is.
Looks really great! Can we have next to the numbers also the percentage related to all event changes in the defined timeframe?
Johann
Reports are usually rather boring things to do, but this one was fun. Have a look at the current state of this report at the Hobbit demo site http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh (You can also just go to the demo site, and pick the "Reports" -> "Top Changes" report).
Should cover everything you've asked for - at least until now, that is.
Looks really great! Can we have next to the numbers also the percentage related to all event changes in the defined timeframe?
Wonderful.
That's the kind of information you can show your manager(s), and it tells you where you probably have to investigate.
Thanks Johann
On Wed, Jul 25, 2007 at 04:12:06PM +0200, Johann Eggers wrote:
Reports are usually rather boring things to do, but this one was fun. Have a look at the current state of this report at the Hobbit demo site http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh
Looks really great! Can we have next to the numbers also the percentage related to all event changes in the defined timeframe?
Sure, already done.
I also added something I felt was missing: When you have the top-10 list showing that host "foo" has the most status changes, then when you click on that host I wanted an overview of what services put it in the top-10. So I added a summary by service when you click on a host in the top-10 display.
And likewise when you click on a service in the top-10 list, it gives you a list of the hosts that were counted for that service.
Regards, Henrik
On 7/25/07, Henrik Stoerner <henrik at hswn.dk> wrote:
I also added something I felt was missing: When you have the top-10 list showing that host "foo" has the most status changes, then when you click on that host I wanted an overview of what services put it in the top-10. So I added a summary by service when you click on a host in the top-10 display.
Wow. That is awesome. Great idea. Someone needs to talk to the mail admin of SMTP for www.sslug.dk!
Could you also add the report period to the server and services "sub-reports"?
Could you add state changes by day of week and hour of day?
Mon   12
Tue  145
Wed  351
And since this has all been so easy, how about trending reports based on an interval?
For example, by week of year show total state changes for a specified server or service. E.G.
        server1  server2
Week 1       10       12
Week 2      134       23
I'll try and think of a clever way to use RRD for this kind of data. I'd imagine we could structure the RRAs to avoid averaging, and force timestamps to match the interval.
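Independent of how the data ends up being stored, the bucketing itself is simple. The Python sketch below groups state-change timestamps by ISO week and by weekday; the input layout `(epoch_seconds, host)` and the function names are invented for the example, and timestamps are interpreted as UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def changes_by_week(events):
    """Return {(iso_year, iso_week): {host: count}} for (epoch, host) events."""
    table = defaultdict(lambda: defaultdict(int))
    for ts, host in events:
        iso = datetime.fromtimestamp(ts, tz=timezone.utc).isocalendar()
        table[(iso[0], iso[1])][host] += 1
    return {week: dict(hosts) for week, hosts in table.items()}


def changes_by_weekday(events):
    """Return {weekday_name: count} for (epoch, host) events."""
    totals = {name: 0 for name in WEEKDAYS}
    for ts, _host in events:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).weekday()
        totals[WEEKDAYS[day]] += 1
    return totals


if __name__ == "__main__":
    stamp = datetime(2007, 7, 25, 12, 0, tzinfo=timezone.utc).timestamp()
    sample = [(stamp, "server1"), (stamp + 60, "server1"), (stamp, "server2")]
    print(changes_by_week(sample))
    print(changes_by_weekday(sample))
```

An hour-of-day breakdown would follow the same pattern using the `hour` attribute of the converted timestamp.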
Scott Walters -PacketPusher
On Wed, Jul 25, 2007 at 01:11:47PM -0400, Scott Walters wrote:
On 7/25/07, Henrik Stoerner <henrik at hswn.dk> wrote:
I also added something I felt was missing: When you have the top-10 list showing that host "foo" has the most status changes, then when you click on that host I wanted an overview of what services put it in the top-10. So I added a summary by service when you click on a host in the top-10 display.
Wow. That is awesome. Great idea. Someone needs to talk to the mail admin of SMTP for www.sslug.dk!
Could you also add the report period to the server and services "sub-reports"?
Done.
Could you add state changes by day of week and hour of day? And since this all been so easy, how about trending reports based on an interval?
Let's leave those for now - these will be more difficult to implement. The only other addition I'd like to make for this report now is to have it count the event durations instead of the number of changes, so you can have a top-10 report of the hosts (or services) that have the longest outages. Could be useful when playing the "blame game". "Look - the DB people are always soooo slow when it comes to cleaning up the filled tables".
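Counting durations instead of changes is a small variation on the same report. The sketch below is a hypothetical Python illustration: events are `(host, start, end)` tuples with `end` set to `None` for events that are still open, in which case the duration runs up to "now", as suggested earlier in the thread.

```python
def outage_seconds(events, now):
    """Return [(host, total_seconds)] sorted by longest total outage first.

    events: iterable of (host, start, end) with epoch seconds; end is None
    for an event that is still open, which is then measured up to `now`.
    """
    totals = {}
    for host, start, end in events:
        stop = now if end is None else end
        totals[host] = totals.get(host, 0) + max(0, stop - start)
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = [
        ("db1", 0, 3600),      # closed one-hour event
        ("db1", 5000, None),   # still open
        ("web1", 100, 400),
    ]
    for host, seconds in outage_seconds(sample, now=6000)[:10]:
        print(f"{host:<8} {seconds:>6} s")
```

Grouping by service instead of host only changes the key used in `totals`.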
Regards, Henrik
On 7/25/07, Henrik Stoerner <henrik at hswn.dk> wrote:
On Wed, Jul 25, 2007 at 01:11:47PM -0400, Scott Walters wrote:
On 7/25/07, Henrik Stoerner <henrik at hswn.dk> wrote:
I also added something I felt was missing: When you have the top-10 list showing that host "foo" has the most status changes, then when you click on that host I wanted an overview of what services put it in the top-10. So I added a summary by service when you click on a host in the top-10 display.
Just a couple of minor observations on the top-10 list:
- the right hand box, "Top 10 Services" has a "Host" column. Probably should be "Service"??
- I was lazy and just put in the date, with no time of day, and scored an "Internal Server Error". Could it default to "from 00:00:00" & "to 23:59:59", or maybe have a "last XX minutes OR from/to", same as the Notification Report??
Thanks,
Ralph Mitchell
On Thu, Jul 26, 2007 at 02:35:26PM -0500, Ralph Mitchell wrote:
Just a couple of minor observations on the top-10 list:
- the right hand box, "Top 10 Services" has a "Host" column. Probably should be "Service"??
Of course - fixed.
- I was lazy and just put in the date, with no time of day, and scored an "Internal Server Error". Could it default to "from 00:00:00" & "to 23:59:59", or maybe have a "last XX minutes OR from/to", same as the Notification Report??
I'll have to do some extra checking on that input. I've also added some buttons so you can easily select the last/current year/month/week.
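The defaulting Ralph asks for can be sketched as follows in Python. This is not the CGI's actual input handling: the accepted formats ("YYYY/MM/DD" with an optional " HH:MM:SS") and the function name are assumptions made for the example.

```python
from datetime import datetime, time

DATE_ONLY = "%Y/%m/%d"
DATE_TIME = "%Y/%m/%d %H:%M:%S"


def parse_bound(text, is_end):
    """Parse a report boundary, filling in a default time of day.

    A date without a time becomes 00:00:00 for the start of the period
    and 23:59:59 for the end. Anything unparseable raises ValueError,
    which a CGI can turn into an error message instead of a server error.
    """
    text = text.strip()
    for fmt in (DATE_TIME, DATE_ONLY):
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        if fmt == DATE_ONLY:
            default = time(23, 59, 59) if is_end else time(0, 0, 0)
            parsed = datetime.combine(parsed.date(), default)
        return parsed
    raise ValueError(f"unrecognised date/time: {text!r}")


if __name__ == "__main__":
    print(parse_bound("2007/07/26", is_end=False))
    print(parse_bound("2007/07/26", is_end=True))
```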
Regards, Henrik
One feature I'd like to see is a more comprehensive editing page for the Critical Systems. Specifically, it'd be nice to see all of the currently defined groups. This would make it a little easier when adding new hosts to monitor in Hobbit, and ensure that they are added to the correct critical systems group (and to avoid duplicates and near-duplicates).
One thing that might be useful would be to alter the importance flag so if a critical system goes down it comes in with an exclamation mark etc. so it stands out from the other hobbit alerts....just a thought. Jason.
On 7/25/07, Henrik Stoerner <henrik at hswn.dk> wrote:
Reports are usually rather boring things to do, but this one was fun. Have a look at the current state of this report at the Hobbit demo site http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh (You can also just go to the demo site, and pick the "Reports" -> "Top Changes" report).
Should cover everything you've asked for - at least until now, that is.
Perfect. You rock Henrik.
Scott Walters -PacketPusher
On 7/24/07, Henrik Stoerner <henrik at hswn.dk> wrote:
On Mon, Jul 23, 2007 at 09:44:11PM -0400, Scott Walters wrote:
For a given input period (Last 7 days, June 2007, etc.)
Servers with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on server would generate list of state changes. "Look Bob, your server is not stable you need to get your developers under control!"
Services with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on service would generate list of the state changes for that period. "PHB, the web group is performing way too many undocumented code changes."
I've whipped up a very rough implementation as part of the eventlog report on the Hobbit demo site. Could you try generating a report at http://www.hswn.dk/hobbit-cgi/bb-eventlog.sh and let me know if the data at the top is something in the right direction?
The nice thing about making it an add-on or variant of the eventlog report is that there's already all of the nice filtering for hosts, pages, time-periods etc in place, plus the "allevents" logfile parsing is also done.
Regards, Henrik
Henrik, I like this. This provides a lot of flexibility in reporting the top ten stats. I could see where bigger sites might want more than a top 10 listed. Maybe it could be 10 by default, with the option to list more.
John
On 7/21/07, Henrik Stoerner <henrik at hswn.dk> wrote:
In another thread, someone asked about what new features are planned for version 4.3.0. I've summarized them below; they have all been implemented by now. Some of them have been contributed by others over the past year - I'm pleased to have finally gotten their patches merged.
There are some open bug-reports, and the plan now is to try and get those fixed. Once that is done I'll ask you all to start testing the beta-versions, and then a new release is hopefully available soon.
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
Here is another feature I'd like to see.
A way for the hobbit server to request the hobbit client to run a command locally based on an alert.
(Pretty similar to sending a request to the client to download a newer version from the download dir)
Maybe a dir called ~hobbit/server/command (like ~hobbit/server/download)
In that command file, define the command.
Then in the ~hobbit/server/etc/client-local.cfg file define a class, and in the class have an attribute like clientcommand: command definition
And in the bb-hosts file msgs:command
So whenever there is a msgs alert, run that command locally on the client
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
-- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
On Sun, Jul 22, 2007 at 08:01:12PM -0400, Asif Iqbal wrote:
Here is another feature I'd like to see.
A way for the hobbit server to request the hobbit client to run a command locally based on an alert. [snip] So whenever there is a msgs alert, run that command locally on the client
Run this as a client extension:
#!/bin/sh
# Get the current status of the "msgs" column
MSGSSTATUS=`$BB $BBDISP "query $MACHINE.msgs" | awk '{ print $1 }'`
# Get the command we must run from the client config
CMD=`grep "^msgsrecovercmd:" $BBTMP/logfetch.$MACHINEDOTS.cfg | sed -e 's!^msgsrecovercmd:!!'`
# If "msgs" is red and there is a command, run it
if test "$MSGSSTATUS" = "red" -a "$CMD" != ""
then
   $CMD
fi
exit 0
Before doing this, consider the security implications of having your servers run commands that they fetch from a remote host without authentication.
Regards, Henrik
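One way to reduce the risk mentioned above is to let the client run a fetched command only if it also appears in a list kept locally on the client. The following is a minimal sketch of that idea, not part of Hobbit; the function name allowed_cmd and the allow-list path are hypothetical.

```shell
#!/bin/sh
# Sketch only: refuse to run a fetched command unless it is listed, verbatim,
# in a file maintained locally on the client.
# ASSUMPTION: the allow-list path and the function name are hypothetical.

# allowed_cmd COMMAND ALLOWLIST
#   Succeeds only if COMMAND is non-empty and matches one whole line of
#   ALLOWLIST exactly (fixed string, no patterns).
allowed_cmd() {
    [ -n "$1" ] && [ -r "$2" ] && grep -Fxq -e "$1" "$2"
}

# Example use inside a client extension (CMD as fetched from the server):
#   if allowed_cmd "$CMD" /etc/hobbit-allowed-commands; then $CMD; fi
```

The server can then only choose among commands the client administrator has already approved; it cannot introduce new ones.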
Wonder if there is any way to tell a client what its status is so it can be autonomous? What I mean is this: suppose there was a way for the Hobbit client to tell the server that service X was now in state Y, and a client-side module could then activate response Z on its own?
I know the Hobbit model is to have the server own the configurations, but how do we solve the "trust" problem?
GLH
-----Original Message----- From: Henrik Stoerner [mailto:henrik at hswn.dk] Sent: Tuesday, July 24, 2007 3:41 PM To: hobbit at hswn.dk Subject: Re: [hobbit] Highlights of the 4.3.0 version
On Sun, Jul 22, 2007 at 08:01:12PM -0400, Asif Iqbal wrote:
Here is another feature I like to see.
A way for the hobbit server to request hobbit clent to run a command locally based on an alert. [snip] So whenever there is a msgs alert run that command locally on the client
Why don't you just use the SCRIPT feature of hobbit-alerts? You can set up ssh authentication between your hobbit server and hobbit clients. Then if a specific test goes red, it executes the script, which in turn ssh's to the remote server having the issue and executes a script there to resolve the issue, or whatever you need it to do.
We did this with a legacy application we used to have. The app would stop listening on its ports, and the only way to fix it was to respin the application. So hobbit would test the port, and if it failed it would send a page and fire off a script to respin the app. After a while we got tired of the pages, so we had it email a generic mailbox that someone checked once in a while and stopped it from paging us. Worked great; we never had customer complaints about that specific app after that.
Trent
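The approach described above could look roughly like the sketch below. It is not Trent's actual script. It assumes that Hobbit hands the host name and current color to alert scripts in the BBHOSTNAME and BBCOLORLEVEL environment variables (check hobbit-alerts.cfg(5) for your version), and the remote user, the remote script path and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: alert script that restarts a service over ssh when a test is red.
# ASSUMPTIONS: Hobbit passes the host and color in the BBHOSTNAME and
# BBCOLORLEVEL environment variables (verify against hobbit-alerts.cfg(5));
# the remote user and the remote script path are hypothetical.

REMOTE_USER=${REMOTE_USER:-hobbit}
REMOTE_SCRIPT=${REMOTE_SCRIPT:-/usr/local/bin/respin-app.sh}

# recover_cmd HOST COLOR: print the ssh command to run, or nothing if not red
recover_cmd() {
    [ "$2" = "red" ] || return 0
    printf 'ssh -o BatchMode=yes %s@%s %s\n' "$REMOTE_USER" "$1" "$REMOTE_SCRIPT"
}

# run_recovery: build the command from the alert environment and run it.
# Set DRYRUN=1 to print the command instead of running it.
run_recovery() {
    rc_cmd=$(recover_cmd "$BBHOSTNAME" "$BBCOLORLEVEL")
    [ -n "$rc_cmd" ] || return 0
    if [ "${DRYRUN:-0}" = "1" ]; then
        echo "$rc_cmd"
    else
        $rc_cmd
    fi
}

# When installing this as an alert script, call run_recovery as the last line.
```

BatchMode=yes makes ssh fail instead of prompting for a password, which matters because the alert script runs unattended. Running with DRYRUN=1 first shows what would be executed.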
On Tue, 2007-07-24 at 15:55 -0500, Hubbard, Greg L wrote:
Wonder if there is any way to tell a client what its status is so it can be autonomous? What I mean is this: suppose there was a way for the Hobbit client to tell the server that service X was now in state Y, and a client-side module could then activate response Z on its own?
I know the Hobbit model is to have the server own the configurations, but how do we solve the "trust" problem?
GLH
On Sun, 2007-07-22 at 00:08 +0200, Henrik Stoerner wrote:
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
Get hobbitfetch to not crash, hang, or spin the cpu at 100%. I don't have to use hobbitfetch for many hosts, but it is incredibly annoying for the few that I do that I have to kill -6 the hobbitfetch process 4-5 times a day in order to get any statuses.
-- Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX Austin Energy http://www.austinenergy.com
On Mon, Jul 23, 2007 at 06:14:14AM -0500, Daniel J McDonald wrote:
On Sun, 2007-07-22 at 00:08 +0200, Henrik Stoerner wrote:
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
Get hobbitfetch to not crash, hang, or spin the cpu at 100%.
I know, this one is definitely a "must-fix-before-4.3.0" bug.
Henrik
Henrik,
One thing I would like to see is the ability to encrypt traffic between the client and the server. Of course this would mean some tweaking of code on the BBWin side of things too. That is one feature that I know the BB Pro client introduced a few years back that would be an excellent addition for those of us who monitor systems at client sites.
Thank you and keep up the good work.
Dave Gilmore
All the new features sound great. It also sounds like nearly everyone has additional features they would like to see...do you use any sort of tool for tracking feature requests?
P.S. I might as well throw in my own feature request ;-)
- Content check should correctly follow 302 (redirects). I currently have to use a custom-made script that uses curl in order to do content checks. In fact, I will include it in case anyone wants to use it:
#!/bin/bash
# contchk.sh written by Charles Jones (blazer0x at gmail.com) 6/6/2007
# This script is designed to perform a content check on a URL and report the
# status to a Hobbit server.
# This script was created because Hobbit's built-in content check
# functionality does not follow 302 redirects.
# The script parses out a "contchk" tag in the bb-hosts file. The proper
# syntax is: contchk;URL;REFERRER;CHECKSTRING
# Note that CHECKSTRING cannot contain spaces, so use regular expression
# metacharacters instead, e.g. string.with.spaces

BBHTAG=contchk       # Name of the tag in bb-hosts
COLUMN=cont          # Column display name in Hobbit
CURL=/usr/bin/curl   # Location of curl binary
CURLOPTS="--connect-timeout 30 -m 30 -s -L -b cookiejar"   # Curl options

# Note: using grep because bbhostgrep fails on long lines
grep $BBHTAG $BBHOME/etc/bb-hosts | while read L
do
   set $L   # Split the bb-hosts line into $1, $2, ... (the tag is expected in $4)
   HOSTIP="$1"
   MACHINEDOTS="$2"
   MACHINE=`echo $2 | $SED -e's/\./,/g'`
   CHECKURL=`echo $4 | awk -F";" '{print $2}'`      # Parse out the check URL
   REFERRER=`echo $4 | awk -F";" '{print $3}'`      # Parse out the referrer string
   if [ "" != "$REFERRER" ];
   then
      REFERRER="-e $REFERRER"
   fi
   CHECKSTRING=`echo $4 | awk -F";" '{print $4}'`   # Parse out the check string

   $CURL $CURLOPTS $REFERRER $CHECKURL | grep -q "$CHECKSTRING"
   status=$?   # Save grep's return status
   if [ 0 -eq $status ]; then   # grep returns 0 if it found something
      COLOR=green
      MSG="String <b>\"$CHECKSTRING\"</b> was found in <a href=$CHECKURL>$CHECKURL</a>"
      $BB $BBDISP "status $MACHINE.$COLUMN $COLOR `date` Content Check OK
${MSG}
"
   else   # grep didn't find anything
      COLOR=red
      MSG="String <b>\"$CHECKSTRING\"</b> was NOT FOUND in <a href=$CHECKURL>$CHECKURL</a>"
      $BB $BBDISP "status $MACHINE.$COLUMN $COLOR `date` Content Check FAILED
${MSG}
"
   fi
done
exit 0
On Saturday 21 July 2007 18:08, Henrik Stoerner wrote:
In another thread, someone asked about what new features are planned for version 4.3.0. I've summarized them below; they have all been implemented by now. Some of them have been contributed by others over the past year - I'm pleased to have finally gotten their patches merged.
There are some open bug-reports, and the plan now is to try and get those fixed. Once that is done I'll ask you all to start testing the beta-versions, and then a new release is hopefully available soon. .......
Just checking if the iconnames.patch will be included in 4.3.0. That is the patch that allowed the &color-acked keyword to display the acknowledged icons in tests?
Thank you for all of your work, ~Steve
On Mon, Jul 23, 2007 at 12:16:53PM -0400, s_aiello at comcast.net wrote:
Just checking if the iconnames.patch will be included in 4.3.0. That is the patch that allowed the &color-acked keyword to display the acknowledged icons in tests?
Those patches that have been posted over the past year have all been merged into the code, so yes - it's included.
Regards, Henrik
Henrik Stoerner wrote:
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
I'd like to see POWER5 CPU stats: http://www.docum.org/twiki/bin/view/Hobbit/AixPower5
-- -mike
On Mon, Jul 23, 2007 at 01:51:55PM -0700, Mike Arnold wrote:
Henrik Stoerner wrote:
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
I'd like to see POWER5 CPU stats: http://www.docum.org/twiki/bin/view/Hobbit/AixPower5
No problem, except I don't understand why the Wiki claims that it is necessary to remove the '.' from the numbers. It seems this is to convert the data to percentages, but that can be done in the graph definition:
[vmstat-pc]
TITLE Used Physical CPU
YAXIS pc (100 = 1 CPU)
DEF:pc=vmstat.rrd:cpu_pc:AVERAGE
CDEF:pcpercent=pc,100,*
LINE2:pcpercent#00CC00
And the "-b 1024" in the Wiki graph definition looks bogus.
I'm cc'ing Stef Coene who wrote the Wiki entry to see if he can shed some light on this.
Regards, Henrik
On Monday 23 July 2007, you wrote:
On Mon, Jul 23, 2007 at 01:51:55PM -0700, Mike Arnold wrote:
I'd like to see POWER5 CPU stats: http://www.docum.org/twiki/bin/view/Hobbit/AixPower5
No problem, except I don't understand why the Wiki claims that it is necessary to remove the '.' from the numbers. It seems this is to convert the data to percentages, but that can be done in the graph definition:
The cpu_pc and cpu_ec values always have a "." in them:
kthr  memory        page                    faults          cpu
 r  b     avm   fre  re  pi  po  fr  sr  cy   in    sy   cs us sy id wa   pc   ec
 2  2 1341331 15949   0   8   7  85  49   0  583 34667 1878 38  5 55  2 0.46 46.0
pc always has 2 digits after the "." and ec has 1 (I hope this stays the same in future AIX releases). RRD wants integers (I think), so you have to strip the "." from the numbers. And, indeed, the -b 1024 is a copy-and-paste error.
I have more AIX updates (iostat graphs), but I don't have the time to create patches, maybe at the end of this week. After that I'm 2 weeks on holiday.
Stef
On Tue, Jul 24, 2007 at 08:38:37AM +0200, Stef Coene wrote:
On Monday 23 July 2007, you wrote:
On Mon, Jul 23, 2007 at 01:51:55PM -0700, Mike Arnold wrote:
I'd like to see POWER5 CPU stats: http://www.docum.org/twiki/bin/view/Hobbit/AixPower5
No problem, except I don't understand why the Wiki claims that it is necessary to remove the '.' from the numbers. It seems this is to convert the data to percentages, but that can be done in the graph definition:
The cpu_pc and cpu_ec has always a "." in it:
kthr  memory        page                    faults          cpu
 r  b     avm   fre  re  pi  po  fr  sr  cy   in    sy   cs us sy id wa   pc   ec
 2  2 1341331 15949   0   8   7  85  49   0  583 34667 1878 38  5 55  2 0.46 46.0
pc has always 2 numbers after the "." and ec 1 (I hope this stays the same for next AIX releases). Rrd wants integers (I think), so you have to strip the "." from the numbers.
No, RRD uses floating-point numbers everywhere. So I'll keep the numbers unmodified - then we won't have any problems if IBM does change the number of decimals they report.
Regards, Henrik
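To illustrate keeping the numbers unmodified, here is a small sketch that picks pc and ec out of a vmstat data line with the decimal point intact, and shows the same scaling as the CDEF "pc,100,*" done in awk. It is not the Hobbit vmstat parser; it assumes pc and ec are the last two fields of the line, as in the sample output quoted above, and the function names are made up.

```shell
#!/bin/sh
# Sketch only: pull the "pc" and "ec" values from a vmstat data line without
# stripping the decimal point.
# ASSUMPTION: pc and ec are the last two fields of the line, as in the sample
# output quoted in this thread.

# pc_ec: read vmstat data lines on stdin, print "pc ec" for each
pc_ec() {
    awk 'NF >= 2 { print $(NF-1), $NF }'
}

# pc_percent: same scaling as the CDEF "pc,100,*", done in awk
pc_percent() {
    awk 'NF >= 2 { printf "%.1f\n", $(NF-1) * 100 }'
}
```

Because the values are passed through as text, it does not matter how many decimals a later AIX release reports.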
On Monday 23 July 2007, Henrik Stoerner wrote:
On Mon, Jul 23, 2007 at 01:51:55PM -0700, Mike Arnold wrote:
Henrik Stoerner wrote:
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
I'd like to see POWER5 CPU stats: http://www.docum.org/twiki/bin/view/Hobbit/AixPower5
Related to this post, I have a perl script that can manipulate rrds:
- adding RRAs (so you can add MAX and MIN)
- adding DSs (for the extra vmstat numbers with AIX 5.3)
- changing the DS (so you can keep the data longer)
- migrating from OS to OS (with rrdtool dump and restore)
Let me know if you are interested. The script itself uses some custom perl libraries, so I cannot publish them, but I can try to filter out the needed information and procedures.
Stef
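The OS-to-OS migration mentioned in the list above relies on "rrdtool dump" and "rrdtool restore", since RRD files are not portable between architectures. The following is a minimal sketch of that step only, not Stef's perl script; the function names and directory layout are made up, and the RRDTOOL variable is just a hook for pointing at a different binary.

```shell
#!/bin/sh
# Sketch only: move RRD files between machines via "rrdtool dump" (on the old
# host) and "rrdtool restore" (on the new host).
# ASSUMPTION: function names and directory layout are made up for the example.

# dump_rrds SRCDIR OUTDIR: write SRCDIR/name.rrd as OUTDIR/name.xml
dump_rrds() {
    for dr_file in "$1"/*.rrd; do
        [ -f "$dr_file" ] || continue
        "${RRDTOOL:-rrdtool}" dump "$dr_file" \
            > "$2/$(basename "$dr_file" .rrd).xml" || return 1
    done
}

# restore_rrds XMLDIR OUTDIR: rebuild OUTDIR/name.rrd from XMLDIR/name.xml
restore_rrds() {
    for rr_file in "$1"/*.xml; do
        [ -f "$rr_file" ] || continue
        "${RRDTOOL:-rrdtool}" restore "$rr_file" \
            "$2/$(basename "$rr_file" .xml).rrd" || return 1
    done
}
```

Run dump_rrds on the old host, copy the XML files across, then run restore_rrds on the new host.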
One of the things I'd like to see in 4.3.0 is what is already partly available on Sun-boxes in 4.2.0, the [iostatdisk] part. The stats are gathered but not plotted. I hope this monitor will make it, on all platforms, because you can pinpoint any disk performance problems much easier.
Thanks, Peter
2007/7/22, Henrik Stoerner <henrik at hswn.dk>:
[snip]
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
Peter Welter wrote:
One of the things I'd like to see in 4.3.0 is what is already partly available on Sun-boxes in 4.2.0, the [iostatdisk] part. The stats are gathered but not plotted. I hope this monitor will make it, on all platforms, because you can pinpoint any disk performance problems much easier.
Thanks, Peter
That would be good to have. I have been asked if hobbit does this from other groups in the company.
John
On 7/25/07, John Glowacki <johng at idttechnology.com> wrote:
Peter Welter wrote:
One of the things I'd like to see in 4.3.0 is what is already partly available on Sun-boxes in 4.2.0, the [iostatdisk] part. The stats are gathered but not plotted. I hope this monitor will make it, on all platforms, because you can pinpoint any disk performance problems much easier.
Thanks, Peter
That would be good to have. I have been asked if hobbit does this from other groups in the company.
Tracking disk IO gets complicated pretty quickly for a few reasons:
- OS's don't have common commands for measuring disk performance
- Do you watch IO by filesystem or spindle? If you have RAID, grabbing the data can become even more difficult.
- People can disagree on what good disk IO means, and even fewer understand disk IO workloads.
- I have no idea if Windows and the WMI has this kind of info.
For *ix, the "blocked processes" figure from vmstat is an excellent way to see if the server overall is IO bound. I would definitely like to see that as a "stock" displayed metric in 4.3.0. Most *ix vmstat implementations provide that number. As with iostat for Solaris, the info is collected, just not displayed.
Scott Walters -PacketPusher
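Pulling the blocked-process count out of vmstat output can be sketched as below. This is an illustration only, not how the Hobbit client does it. It assumes the column header row contains a field named exactly "b", as in the AIX sample quoted elsewhere in this thread; other platforms may label or place it differently. The function name is made up.

```shell
#!/bin/sh
# Sketch only: report the "b" (blocked processes) column from vmstat output.
# ASSUMPTION: the column header row contains a field named exactly "b";
# other platforms may label it differently.

# blocked_procs: read vmstat output on stdin, print the "b" value of each data line
blocked_procs() {
    awk '
        col == 0 {                        # still looking for the header row
            for (i = 1; i <= NF; i++) if ($i == "b") { col = i; break }
            next
        }
        $col ~ /^[0-9]+$/ { print $col }  # data rows: numeric value in that column
    '
}
```

Something like "vmstat 5 2 | blocked_procs | tail -n 1" would then give the most recent sample.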
On Wednesday 25 July 2007, Peter Welter wrote:
One of the things I'd like to see in 4.3.0 is what is already partly available on Sun-boxes in 4.2.0, the [iostatdisk] part. The stats are gathered but not plotted. I hope this monitor will make it, on all platforms, because you can pinpoint any disk performance problems much easier.
I have this running for AIX with an external script. One of my todos is making an rrd hobbitd module like the vmstat module, so you can have a definition per type of host. On the other hand, the iostat output is different from the vmstat output, and the external script is working fine ....
Stef