hobbitd_alert crashes
Hi,
This is the 01 June snapshot, running on Solaris 9.
[bb@iris tmp]$ gdb ../bin/hobbitd_alert core
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.9"...
Core was generated by `hobbitd_alert --checkpoint-file=/soft/pub/BB/hobbit/server/tmp/alert.chk --chec'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/local/lib/libpcre.so.0...done.
Loaded symbols for /usr/local/lib/libpcre.so.0
Reading symbols from /usr/lib/libresolv.so.2...done.
Loaded symbols for /usr/lib/libresolv.so.2
Reading symbols from /usr/lib/libsocket.so.1...done.
Loaded symbols for /usr/lib/libsocket.so.1
Reading symbols from /usr/lib/libnsl.so.1...done.
Loaded symbols for /usr/lib/libnsl.so.1
Reading symbols from /usr/lib/libc.so.1...done.
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/lib/libdl.so.1...done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /usr/lib/libmp.so.2...done.
Loaded symbols for /usr/lib/libmp.so.2
Reading symbols from /usr/platform/SUNW,Sun-Fire-480R/lib/libc_psr.so.1...done.
Loaded symbols for /usr/platform/SUNW,Sun-Fire-480R/lib/libc_psr.so.1
#0 0xff1a05c8 in _libc_kill () from /usr/lib/libc.so.1
(gdb)
Dominique UNIL - University of Lausanne
Could you also run the "bt" command, please?
Henrik
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
...
(gdb) bt
#0 0xff1a05c8 in _libc_kill () from /usr/lib/libc.so.1
#1 0xff136d58 in abort () from /usr/lib/libc.so.1
#2 0x0002134c in sigsegv_handler (signum=0) at sig.c:57
#3 <signal handler called>
(gdb)
Dominique UNIL - University of Lausanne
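In the backtrace above, sigsegv_handler (sig.c:57) calls abort(). This suggests why gdb reports signal 6 (SIGABRT) rather than a segmentation fault, while the frames below "<signal handler called>", which would show where the fault actually occurred, are not shown. Hobbit's sig.c is not quoted in this thread, so the C sketch below is only a hypothetical illustration of a handler-calls-abort pattern on a POSIX system. The names demo_sigsegv_handler and child_termination_signal are made up for the example, and raise(SIGSEGV) stands in for a real invalid memory access.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler: on SIGSEGV, call abort() so the process dies
 * with SIGABRT (signal 6) and the system can write a core file. */
static void demo_sigsegv_handler(int signum)
{
    (void)signum;
    abort();
}

/* Fork a child that installs the handler and then raises SIGSEGV.
 * Returns the signal that terminated the child, or -1 on error. */
int child_termination_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = demo_sigsegv_handler;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGSEGV, &sa, NULL) != 0)
            _exit(1);
        raise(SIGSEGV);   /* simulate the crash without real undefined behaviour */
        _exit(0);         /* not reached: the handler calls abort() */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling child_termination_signal() should return SIGABRT (signal 6 on both Solaris and Linux), matching the "Program terminated with signal 6, Aborted." line in the first gdb session. Crashing in a forked child keeps the calling process alive.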
Hrm, that isn't much to go on. Does it crash right away when you start Hobbit, or only after some time has passed?
If it crashes right away, I'd like a copy of your bb-hosts, hobbitserver.cfg and hobbit-alerts.cfg files. If it crashes after some time, could you add the "--debug" option to the hobbitd_alert command in hobbitlaunch.cfg, and then mail me the ~hobbit/server/logs/page.log file after it has crashed?
Regards, Henrik
It crashed 3 times last night. Hobbit was last restarted yesterday at 05:10 PM.
Done. I'll mail you the log asap.
Thank you.
Dominique UNIL - University of Lausanne
Looking at the event log, I noticed that each of the 3 times hobbitd_alert crashed, it was trying to send to an IGNORE recipient (not always the same one). Here are our IGNORE rules, which follow the macro definitions at the top of hobbit-alerts.cfg. Maybe there is something wrong with this configuration?
...
...
#---------------------------------------
# Hosts groups
$SAP_HOSTS=quartz,topaze,onyx,its,tulp,zircon
$ADMIN_HOSTS=bilbo,falco,furio
#------------------------------------------------------------------------------
# Rules to exclude alerting during a period of time
#------------------------------------------------------------------------------
HOST=* SERVICE=bckp TIME=*:2000:0700
	IGNORE
HOST=uldns1,uldns2 SERVICE=ldap TIME=*:0500:0530
	IGNORE
HOST=kawa,kawa2 SERVICE=http TIME=*:2210:2235
	IGNORE
HOST=gaia SERVICE=http TIME=*:0012:0015
	IGNORE
HOST=balrog,godzilla,smaug SERVICE=cpu TIME=*:2000:2359
	IGNORE
HOST=acsls,balrog,godzilla,smaug SERVICE=memory TIME=*:0600:0800
	IGNORE
HOST=unimedia,unimediad SERVICE=orcl,http TIME=*:0655:1115
	IGNORE
HOST=virtuavd SERVICE=orcl TIME=*:0001:0400
	IGNORE
HOST=tstvirtua SERVICE=orcl TIME=*:2159:0200
	IGNORE
HOST=ged SERVICE=http TIME=*:0305:0315
	IGNORE
HOST=$SAP_HOSTS SERVICE=conn,cpu,http,ftp TIME=*:1900:0700
	IGNORE
HOST=$SAP_HOSTS SERVICE=orcl,procs TIME=*:1900:2359
	IGNORE
HOST=$ADMIN_HOSTS SERVICE=http,sslcert TIME=*:0030:0630
	IGNORE
HOST=$ADMIN_HOSTS SERVICE=conn,cpu,http,sslcert TIME=*:1800:0700
	IGNORE
HOST=esope SERVICE=http,orcl TIME=*:0355:0600
	IGNORE
HOST=pcsan SERVICE=msgs,svcs,procs TIME=*:1955:2300
	IGNORE
HOST=iris SERVICE=hobbitd TIME=*:0310:0320
	IGNORE
HOST=lanfeust,winup TIME=*:1945:2200
	IGNORE
...
...
Dominique UNIL - University of Lausanne
On Fri, Jun 02, 2006 at 08:19:10AM +0200, Dominique Frise wrote:
Looking at the event log, I noticed that the 3 times that hobbitd_alert crashed, it was trying to send to an IGNORE recipient (not always the same).
Thanks, it was easy to reproduce the problem once I tried some IGNORE rules. I believe this patch should solve the problem.
Regards, Henrik
------------------------------------------------------------------------
--- hobbitd/do_alert.c 2006/05/28 15:16:51 1.91
+++ hobbitd/do_alert.c 2006/06/02 11:12:17
@@ -88,6 +88,8 @@
 	char *id, *method = "unknown";
 	repeat_t *walk;
 
+	if (recip->method == M_IGNORE) return NULL;
+
 	switch (recip->method) {
 	  case M_MAIL: method = "mail"; break;
 	  case M_SCRIPT: method = "script"; break;
@@ -325,6 +327,8 @@
 			 * might create here is NOT used later on.
 			 */
 			rpt = find_repeatinfo(alert, recip, 1);
+			if (!rpt) continue; /* Happens for e.g. M_IGNORE recipients */
+
 			dprintf(" repeat %s at %d\n", rpt->recipid, rpt->nextalert);
 			if (rpt->nextalert > now) {
 				traceprintf("Recipient '%s' dropped, next alert due at %d > %d\n",
--- lib/loadalerts.c 2006/05/31 08:50:03 1.13
+++ lib/loadalerts.c 2006/06/02 11:19:36
@@ -1092,7 +1092,9 @@
 	     (recip->criteria && (recip->criteria->sendnotice == SR_WANTED)) ) notice = 1;
 
 	*codes = '\0';
-	if (recip->method == M_IGNORE) strcat(codes, "I");
+	if (recip->method == M_IGNORE) {
+		recip->recipient = "-- ignored --";
+	}
 	if (recip->noalerts) { if (strlen(codes)) strcat(codes, ",A"); else strcat(codes, "-A"); }
 	if (recovered && !recip->noalerts) { if (strlen(codes)) strcat(codes, ",R"); else strcat(codes, "R"); }
 	if (notice) { if (strlen(codes)) strcat(codes, ",N"); else strcat(codes, "N"); }
------------------------------------------------------------------------
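The patch above works on two levels: find_repeatinfo() now returns NULL for M_IGNORE recipients and its caller skips them with "if (!rpt) continue;", and loadalerts.c gives IGNORE recipients the placeholder recipient string "-- ignored --". The thread does not say exactly which dereference crashed, so the C sketch below only mirrors that guard pattern with simplified, hypothetical stand-ins (demo_recip_t, demo_repeat_t, demo_find_repeatinfo, demo_count_due, demo_fix_ignore_label); it is not Hobbit's actual code. In the sketch, an IGNORE recipient starts out with a NULL recipient string; that is an assumption made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for Hobbit's recipient/repeat types. */
typedef enum { DEMO_M_MAIL, DEMO_M_SCRIPT, DEMO_M_IGNORE } demo_method_t;

typedef struct {
    demo_method_t method;
    const char *recipient;   /* assumed NULL for IGNORE until labelled */
} demo_recip_t;

typedef struct {
    char recipid[64];
    int nextalert;
} demo_repeat_t;

/* Like the patched find_repeatinfo(): IGNORE recipients get no repeat
 * record at all, so callers must be prepared for a NULL result. */
static demo_repeat_t *demo_find_repeatinfo(const demo_recip_t *recip,
                                           demo_repeat_t *slot)
{
    const char *method = "unknown";

    if (recip->method == DEMO_M_IGNORE)
        return NULL;

    switch (recip->method) {
    case DEMO_M_MAIL:   method = "mail";   break;
    case DEMO_M_SCRIPT: method = "script"; break;
    default:            break;
    }

    /* Building the id reads the recipient string, so it must not be NULL. */
    assert(recip->recipient != NULL);
    snprintf(slot->recipid, sizeof(slot->recipid), "%s:%s",
             method, recip->recipient);
    slot->nextalert = 0;
    return slot;
}

/* Mirrors the caller loop: skip recipients without a repeat record
 * instead of dereferencing NULL. Returns how many alerts are due. */
static int demo_count_due(const demo_recip_t *recips, size_t n, int now)
{
    int due = 0;

    for (size_t i = 0; i < n; i++) {
        demo_repeat_t slot;
        demo_repeat_t *rpt = demo_find_repeatinfo(&recips[i], &slot);

        if (!rpt)
            continue;            /* e.g. IGNORE recipients */
        if (rpt->nextalert > now)
            continue;            /* next alert not due yet */
        due++;
    }
    return due;
}

/* Mirrors the loadalerts.c change: give IGNORE recipients a printable
 * placeholder so code that prints the recipient never sees NULL. */
static void demo_fix_ignore_label(demo_recip_t *recip)
{
    if (recip->method == DEMO_M_IGNORE)
        recip->recipient = "-- ignored --";
}
```

The design choice mirrored here is to handle IGNORE inside the lookup itself and have the caller tolerate a NULL result, rather than special-casing IGNORE at every call site.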
We have not had any new crashes since we applied the patches :-) Thank you.
Dominique UNIL - University of Lausanne
participants (2)
- Dominique.Frise@unil.ch
- henrik@hswn.dk