'red' status not showing up in history
I have a test that's gone 'red' multiple times today. I know this because I've seen it on the website and alerts have been sent to me. Yet if I look in ~hobbit/data/hist, ~hobbit/data/histlogs, or use the history CGIs in the web GUI, it shows that it hasn't been 'red' today, rather that it's been green for >2 days. This isn't the case though, as it's gone 'red' 6 times today alone (2 are repeated e-mails, 2h delay).
from alert e-mails:
bspdm07.edc.cingular.net:sgsn_procs red [55828] red Thu Mar 6 09:43:03 2008
bspdm07.edc.cingular.net:sgsn_procs red [271768] red Thu Mar 6 07:47:06 2008
bspdm07.edc.cingular.net:sgsn_procs red [271768] red Thu Mar 6 05:51:35 2008
bspdm07.edc.cingular.net:sgsn_procs red [811858] red Thu Mar 6 05:11:22 2008
bspdm07.edc.cingular.net:sgsn_procs red [811858] red Thu Mar 6 03:15:51 2008
bspdm07.edc.cingular.net:sgsn_procs red [694717] red Thu Mar 6 02:10:28 2008
bspdm07.edc.cingular.net:sgsn_procs red [738302] red Thu Mar 6 01:45:16 2008
bspdm07.edc.cingular.net:sgsn_procs red [648888] red Thu Mar 6 01:20:08 2008
yet from the logs:
~hobbit/data/hist:# tail bspdm07,edc,cingular,net.sgsn_procs
Tue Feb 26 11:01:29 2008 red 1204052489 12385
Tue Feb 26 14:27:54 2008 green 1204064874 1506
Tue Feb 26 14:53:00 2008 red 1204066380 301
Tue Feb 26 14:58:01 2008 green 1204066681 549212
Mon Mar 3 23:31:33 2008 red 1204615893 1521
Mon Mar 3 23:56:54 2008 green 1204617414 592
Tue Mar 4 00:06:46 2008 red 1204618006 302
Tue Mar 4 00:11:48 2008 green 1204618308 302
Tue Mar 4 00:16:50 2008 red 1204618610 1206
Tue Mar 4 00:36:56 2008 green 1204619816
~hobbit/data/histlogs/bspdm07_edc_cingular_net/sgsn_procs:# ls -rt | tail
Tue_Feb_26_11:01:29_2008
Tue_Feb_26_14:27:54_2008
Tue_Feb_26_14:53:00_2008
Tue_Feb_26_14:58:01_2008
Mon_Mar_3_23:31:33_2008
Mon_Mar_3_23:56:54_2008
Tue_Mar_4_00:06:46_2008
Tue_Mar_4_00:11:48_2008
Tue_Mar_4_00:16:50_2008
Tue_Mar_4_00:36:56_2008
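With >10,000 tests affected, this comparison between alert times and the history file may be easier to do with a script than by eye. What follows is only a minimal sketch, not part of Hobbit. It assumes each hist line has the layout seen in the tail output (text timestamp, color, epoch start, optional duration) and that the alert timestamp falls inside the period the status was red. The names parse_hist, color_at and unrecorded_alerts are made up for illustration, and the sample data is copied from the output above.

```python
from datetime import datetime

# Timestamp layout used in both the hist file and the alert e-mails.
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"

# Last lines of the hist file, copied from the tail output in this message.
HIST_TAIL = """\
Mon Mar 3 23:31:33 2008 red 1204615893 1521
Mon Mar 3 23:56:54 2008 green 1204617414 592
Tue Mar 4 00:06:46 2008 red 1204618006 302
Tue Mar 4 00:11:48 2008 green 1204618308 302
Tue Mar 4 00:16:50 2008 red 1204618610 1206
Tue Mar 4 00:36:56 2008 green 1204619816
"""

# A few of the alert timestamps from the e-mails.
ALERT_TIMES = [
    "Thu Mar 6 01:20:08 2008",
    "Thu Mar 6 05:11:22 2008",
    "Thu Mar 6 09:43:03 2008",
]


def parse_hist(text):
    """Return (start, color) tuples in file order (assumed line layout)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        start = datetime.strptime(" ".join(fields[:5]), STAMP_FORMAT)
        entries.append((start, fields[5]))
    return entries


def color_at(entries, when):
    """Color the history claims at 'when', or None if before the first entry."""
    color = None
    for start, entry_color in entries:
        if start > when:
            break
        color = entry_color
    return color


def unrecorded_alerts(entries, alert_times, expected="red"):
    """Alert timestamps for which the history does not show 'expected'."""
    missing = []
    for text in alert_times:
        when = datetime.strptime(text, STAMP_FORMAT)
        if color_at(entries, when) != expected:
            missing.append(text)
    return missing


if __name__ == "__main__":
    entries = parse_hist(HIST_TAIL)
    for text in unrecorded_alerts(entries, ALERT_TIMES):
        print("alert with no matching red in history:", text)
```

Run against the sample data, every alert listed is reported as missing, since the last history entry is the green from Tue Mar 4 00:36:56.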
So where did my reds go? ...
Also, it's not just this test, there are >10,000 tests doing this. Other tests seem to reflect red status fine, others don't. I see no rhyme or reason to it. :-(
stephen
On Thu, Mar 06, 2008 at 10:56:16AM -0800, Menton, Stephen wrote:
I have a test that's gone 'red' multiple times today. I know this because I've seen it in the website and alerts have been sent to me. Yet if I look in ~hobbit/data/hist, ~hobbit/data/histlogs, or use the history CGIs in the web GUI, it shows that it hasn't been 'red' today, rather that it's been green for >2 days.
The only explanation I can give is that the hobbitd_history module might have been stopped or crashed when this red status happened. Could you check the history.log and hobbitlaunch.log files to see if there's any mention of this?
Doing a full restart of Hobbit will force a sync of the history logs with the current status recorded in Hobbit, but it obviously cannot record events that are long past.
Regards, Henrik
Well, the server was restarted back on the 4th... Current procs:
hobbit 12332 12230 0 Mar 04 ? 5:26 hobbitd_channel --channel=page --log=/var/log/hobbit/page.log hobbitd_alert --c
hobbit 12231 12230 2 Mar 04 ? 379:51 hobbitd --pidfile=/var/log/hobbit/hobbitd.pid --restart=/opt/home/hobbit/server
hobbit 12326 12319 0 Mar 04 ? 40:04 hobbitd_rrd --rrddir=/opt/home/hobbit/data/rrd
hobbit 12328 12321 0 Mar 04 ? 6:02 hobbitd_client
hobbit 12333 12330 2 Mar 04 ? 117:51 hobbitd_history
hobbit 12321 12230 0 Mar 04 ? 1:44 hobbitd_channel --channel=client --log=/var/log/hobbit/clientdata.log hobbitd_c
hobbit 12331 12230 0 Mar 04 ? 0:15 hobbitd_channel --channel=clichg --log=/var/log/hobbit/hostdata.log hobbitd_hos
hobbit 12334 12331 0 Mar 04 ? 0:25 hobbitd_hostdata
hobbit 12327 12320 0 Mar 04 ? 4:43 hobbitd_rrd --rrddir=/opt/home/hobbit/data/rrd
hobbit 12330 12230 0 Mar 04 ? 0:04 hobbitd_channel --channel=stachg --log=/var/log/hobbit/history.log hobbitd_hist
hobbit 12320 12230 0 Mar 04 ? 0:12 hobbitd_channel --channel=data --log=/var/log/hobbit/rrd-data.log hobbitd_rrd -
hobbit 12319 12230 0 Mar 04 ? 21:28 hobbitd_channel --channel=status --log=/var/log/hobbit/rrd-status.log hobbitd_r
hobbit 12335 12332 1 Mar 04 ? 153:22 hobbitd_alert --checkpoint-file=/opt/home/hobbit/server/tmp/alert.chk --checkpo
hobbitd_history seems to be running...
Again, the status of other tests is being reflected properly and history recorded properly, both for the same test on other hosts AND for different tests on the same host. It's like random histories are being ignored... x,x
history.log shows some errors... But they are later than confirmed times when some histories weren't being recorded (~2 days ago).
hobbitlaunch.log shows a termination of hobbitd 2 days ago... But this was the time of a restart so perhaps it's expected. Again, it has been writing valid histories for some items since then. I'll try forcibly shutting down the server and bringing it back up as well.
stephen
participants (2)
- henrik@hswn.dk
- SM9614@att.com