Hi All,
I'm just in the process of converting our old big brother monitor to xymon. I had a look at zabbix, nagios, groundworks, and pandorafms and found too many quirks, cruft and gotchas (plus the problem of open core systems). I returned to the bb clones and found that xymon is not only active and maintained but also a well thought out, well executed step beyond big brother. Well done, Henrik.
Is there a way to have a hierarchy for alerts? For example, say we have a branch office with a print server, file server, router, switch, etc. I have a xymon entry for each (and multiples for each, i.e. polling the ftp port on the file server as well as just the conn monitoring). If the WAN link to the site dies, I get alerts for all of the above, whereas I should really just get one alert for the router. Or if the file server dies, I shouldn't get both ftp and file server alerts, just the server.
I know this wasn't available for big brother and I can't find anything in xymon. Does this exist and/or has this been considered?
For our big brother system, I created a perl script daemon that reads in the allevents log to create a "stateful" table of all monitored items. I also have a text file table of the relationships between the monitored items, using tabs:
core_switch wan_router site_router site_switch printer server ftp bbd_host local_server smtp user_subnet_switch printer
This table is read into a hashed array, the allevents log file is tailed to keep the state table current, and the script awaits queries. When an alertable event comes in, the alert subsystem queries this daemon (using another perl script). The daemon checks whether any parent items are in a red state, and if so the alert is discarded.
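The parent check described above could be sketched roughly as follows. This is only an illustration in Python rather than the original perl daemon; the parent map, the item names and the function names are all assumptions made up for the example.

```python
# Hypothetical parent map: child item -> parent item (illustrative names only).
PARENT = {
    "wan_router": "core_switch",
    "site_router": "wan_router",
    "site_switch": "site_router",
    "printer": "site_switch",
    "server": "site_switch",
    "ftp": "server",
}

# State table kept current from the event log: item -> last known color.
state = {}


def record_event(item, color):
    """Update the state table from one event-log entry."""
    state[item] = color


def ancestors(item):
    """Yield each parent of `item`, walking up towards the root."""
    seen = set()  # guards against an accidental loop in the parent map
    while item in PARENT and item not in seen:
        seen.add(item)
        item = PARENT[item]
        yield item


def should_alert(item):
    """Return False (discard the alert) when any ancestor is currently red."""
    return not any(state.get(a) == "red" for a in ancestors(item))


record_event("wan_router", "red")
record_event("printer", "red")
print(should_alert("printer"))     # False: wan_router is a red ancestor
print(should_alert("wan_router"))  # True: no red ancestor
```

Walking the whole chain of parents, rather than only the immediate one, is what lets a single WAN outage suppress the alerts for everything behind it.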
This is effectively just a plugin and could be done more efficiently if integrated into the bbd. In terms of config files, having to maintain two files is not ideal. Currently the host listing and web layout are effectively combined in one file, so adding a hierarchy as above would be tricky. Perhaps a second "#" with the parent following, e.g.:
1.2.3.4 hostname.domain.com # smtp dns # core_switch.domain.com
Anyway, this is just an initial query about this issue.
thanks and regards, Phil
You can use the DEPENDS option in your hosts.cfg file(s) to indicate that one network test depends on another:
depends=(testA:host1/test1,host2/test2),(testB:host3/test3),[...]
This tag allows you to define dependencies between tests. If "testA" for the current host depends on "test1" for host "host1" and test "test2" for "host2", this can be defined with
depends=(testA:host1/test1,host2/test2)
When deciding the color to report for testA, if either host1/test1 or host2/test2 has failed and testA has also failed, then the color of testA will be "clear" instead of red or yellow.
Since all tests are actually run before the dependencies are evaluated, you can use any host/test in the dependency - regardless of the actual sequence that the hosts are listed, or the tests run. It is also valid to use tests from the same host that the dependency is for. E.g.
1.2.3.4 foo # http://foo/ webmin depends=(webmin:foo/http)
is valid; if both the http and the webmin tests fail, then webmin will be reported as clear.
Note: The "depends" tag is evaluated by xymonnet while running the network tests. It can therefore only refer to other network tests that are handled by the same server - there is currently no way to use e.g. the status of locally run tests (disk, cpu, msgs) or network tests from other servers in a dependency definition. Such dependencies are silently ignored.
From the hosts.cfg man page.
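To make the rule in the man page text concrete, here is a small Python sketch of the evaluation as described there. It is not xymonnet's actual code; the function name, the data layout, and treating both red and yellow as "failed" for the dependency are assumptions for illustration.

```python
FAILED = {"red", "yellow"}  # assumption: both colors count as a failed test


def apply_depends(results, depends):
    """Return the final colors after applying dependency rules.

    results: {(host, test): color} collected after ALL tests have run.
    depends: {(host, test): [(host, test), ...]} parsed from depends= tags.
    """
    final = dict(results)
    for target, prereqs in depends.items():
        target_failed = results.get(target) in FAILED
        # A prerequisite with no result (e.g. not a network test on this
        # server) never matches, so it is effectively ignored.
        prereq_failed = any(results.get(p) in FAILED for p in prereqs)
        if target_failed and prereq_failed:
            final[target] = "clear"
    return final


# Mirrors the man page example: webmin depends on http for host foo.
results = {("foo", "http"): "red", ("foo", "webmin"): "red"}
depends = {("foo", "webmin"): [("foo", "http")]}
print(apply_depends(results, depends)[("foo", "webmin")])  # clear
```

The rules are checked against the raw results rather than the adjusted ones, which is why the order in which hosts are listed does not matter.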
-----Original Message----- From: xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] On Behalf Of Phil Crooker Sent: 30 March 2011 00:57 To: xymon at xymon.com Subject: [Xymon] Hierarchial alerting
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon
Phil, In addition to the 'depends' tag as suggested by Chris Morris, you may also consider the 'route' tag. I find it simpler to use for wan sites, though the behavior is somewhat different. Whereas depends will make dependent tests go clear, route makes conn tests go yellow with a message that it is down because device ABC is down. Something like:
1.2.3.4 wan_router # conn
1.2.3.5 site_router # conn route:wan_router
1.2.3.6 site_switch # conn route:wan_router,site_router
1.2.3.7 print_server # conn ftp route:wan_router,site_router,site_switch
So if wan_router goes down the other three will go yellow and therefore not alert. And when a tech looks at the yellow status it will plainly state it is down because wan_router is down.
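The route behavior described above can be modelled with a short Python sketch. This is only an illustration of the idea, not how xymonnet implements it; the function name, the data layout and the message wording are assumptions.

```python
def conn_status(host, ping_ok, routes):
    """Return (color, reason) for a host's conn test.

    ping_ok: {host: bool} ping results for every monitored host.
    routes:  {host: [router, ...]} from the route: tags, nearest router first.
    """
    if ping_ok[host]:
        return "green", f"{host} is reachable"
    for router in routes.get(host, []):
        if not ping_ok.get(router, True):
            # A router on the path is down: report yellow and name the cause.
            return "yellow", f"{host} is down because router {router} is down"
    return "red", f"{host} is unreachable"


# Same layout as the example entries; the WAN link has died.
routes = {
    "site_router": ["wan_router"],
    "site_switch": ["wan_router", "site_router"],
    "print_server": ["wan_router", "site_router", "site_switch"],
}
ping_ok = {
    "wan_router": False,
    "site_router": False,
    "site_switch": False,
    "print_server": False,
}
for name in ping_ok:
    color, reason = conn_status(name, ping_ok, routes)
    print(name, color, reason)
```

Only wan_router comes out red here; the three hosts behind it are yellow, each naming wan_router as the cause.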
Cheers.
Thanks so much for this suggestion, and thanks Chris for pointing out what I should have looked into before posting....
This is great, choices!
cheers, Phil
On 3/31/2011 at 2:33 AM, in message <85108204A570E341A517A744DA59C41E05E71D9543 at EXITS712.its.iastate.edu>, "Dugan, Darin D [EIT]" <dddugan at iastate.edu> wrote:
participants (3)
- Chris.Morris@rwe.com
- dddugan@iastate.edu
- Phil.Crooker@orix.com.au