Hobbit: 4.2.0 with allinone patch. OS: Fedora Core 5, Dell Optiplex, dual core with 4G of memory
MAXMSG_STATUS=2048 # maximum size of a "status" message in kB, default: 256
MAXMSG_CLIENT=4096 # maximum size of a "client" message in kB, default: 512
MAXMSG_DATA=2048 # maximum size of a "data" message in kB, default: 256
[hobbit at hobbit2 etc]$ ipcs -lm
After shutting down the hobbit server:
[hobbit at hobbit2 ~]$ ipcs
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x0901727f 5668872    hobbit     600        131072     0

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x0901727f 7176201    hobbit     600        3

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
[hobbit at hobbit2 ~]$ ipcrm -s 7176201
[hobbit at hobbit2 ~]$ ipcrm -m 5668872
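The manual cleanup above could be scripted. The following is only a rough sketch, assuming `ipcs` prints one section per resource type with data rows of the form `key id owner ...`, as in the output shown; the function name `ipcrm_commands` is hypothetical and not part of Hobbit. It only builds the `ipcrm` command lines and does not check `nattch`, so it should only be used after the Hobbit server has been shut down.

```python
def ipcrm_commands(ipcs_output, owner="hobbit"):
    """Build ipcrm command lines for IPC objects owned by `owner`.

    Hypothetical helper: parses text in the layout printed by `ipcs`
    and returns a list of argument lists; it does not run anything.
    """
    # Map each ipcs section title to the matching ipcrm option.
    flags = {
        "Shared Memory Segments": "-m",
        "Semaphore Arrays": "-s",
        "Message Queues": "-q",
    }
    commands = []
    flag = None
    for line in ipcs_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("------"):
            # Section header, e.g. "------ Semaphore Arrays --------"
            flag = flags.get(stripped.strip("- "))
            continue
        fields = stripped.split()
        # Data rows start with a hex key, then the id, then the owner.
        if flag and len(fields) >= 3 and fields[0].startswith("0x") and fields[2] == owner:
            commands.append(["ipcrm", flag, fields[1]])
    return commands
```

Each returned list could then be passed to `subprocess.run` once it has been reviewed by hand.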
The snapshot does not suffer from the leftover shared memory segment and semaphore issue, but unfortunately it is unusable for us because it displays the description or comments as part of the host name on bb2.html.
Here is my second issue:
page.log:
2007-08-31 01:29:22 Stale alert for host08:ts-delay dropped
2007-08-31 01:29:22 Stale alert for hostf:cpu dropped
2007-08-31 01:29:22 Stale alert for host2:ocs dropped
2007-08-31 01:29:22 Stale alert for host06:procs dropped
2007-08-31 01:29:22 Stale alert for host08:tl1am-mir dropped
2007-08-31 01:29:22 Stale alert for host12:procs dropped
2007-08-31 01:29:22 Stale alert for host15:tl1am-osi dropped
2007-08-31 01:29:22 Stale alert for host16:tl1am-osi dropped
2007-08-31 01:29:22 Stale alert for host6:procs dropped
2007-08-31 01:29:22 Stale alert for host2:memory dropped
2007-08-31 01:29:22 Stale alert for host3:memory dropped
2007-08-31 01:29:22 Stale alert for ns:sins-out dropped
2007-08-31 01:29:22 Stale alert for hostxx:procsLSE dropped
2007-08-31 01:24:10 hobbitd_alert: Got message 49083, expected 49082
2007-08-31 01:24:17 hobbitd_alert: Got message 49085, expected 49084
2007-08-31 01:25:05 hobbitd_alert: Got message 49097, expected 49090
2007-08-31 01:25:22 hobbitd_alert: Got message 49104, expected 49103
2007-08-31 01:25:42 hobbitd_alert: Got message 49118, expected 49117
2007-08-31 01:26:17 hobbitd_alert: Got message 49129, expected 49126
2007-08-31 01:26:17 hobbitd_alert: Got message 49132, expected 49130
2007-08-31 01:27:04 Dropping (more) garbled data
. . .
2007-08-31 09:43:18 hobbitd_alert: Got message 6950, expected 6946
Done
2007-08-31 09:43:40 hobbitd_alert: Got message 6965, expected 6955
stty: : Invalid argument
stty: : Invalid argument
2007-08-31 10:38:21 hobbitd_alert: Got message 7693, expected 7692
stty: : Invalid argument
stty: : Invalid argument
stty: : Invalid argument
stty: : Invalid argument
stty: : Invalid argument
stty: : Invalid argument
. . .
2007-08-31 12:34:49 Stale alert for host3:ocs dropped
2007-08-31 12:34:50 Stale alert for host4:ocs dropped
2007-08-31 12:34:50 Stale alert for host5:ocs dropped
2007-08-31 12:34:50 Stale alert for host09:procs dropped
2007-08-31 12:34:50 Stale alert for host10:procs dropped
2007-08-31 12:34:50 Stale alert for host14:tl1am-osi dropped
2007-08-31 12:34:50 Stale alert for host15:tl1am-mir dropped
2007-08-31 12:34:50 Stale alert for host103:se dropped
2007-08-31 12:34:50 Stale alert for host17a:procs dropped
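To see which hosts account for most of the stale alerts, and roughly how many messages hobbitd_alert appears to have skipped, a log like the one above could be summarized with something like the sketch below. It assumes the two message formats shown in this excerpt, and it assumes that the difference between the "got" and "expected" numbers is the count of skipped messages, which I have not confirmed against the Hobbit source. The name `summarize_page_log` is hypothetical.

```python
import re
from collections import Counter

# Patterns taken from the page.log lines quoted in this post.
STALE_RE = re.compile(r"Stale alert for (\S+?):(\S+) dropped")
GAP_RE = re.compile(r"Got message (\d+), expected (\d+)")


def summarize_page_log(text):
    """Return (stale alerts per host, total sequence gap) for a log text.

    Hypothetical helper: the gap total assumes "got - expected" equals
    the number of messages the alert module did not receive.
    """
    stale_per_host = Counter(m.group(1) for m in STALE_RE.finditer(text))
    skipped = sum(int(m.group(1)) - int(m.group(2)) for m in GAP_RE.finditer(text))
    return stale_per_host, skipped
```

Calling it on the contents of page.log and printing `stale_per_host.most_common(10)` would show whether the stale alerts cluster on the hosts handled by the restart script.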
I think some of this may be caused by a server-side external script that SSHes to a remote host, runs a restart script, and emails management, developers, and support that processes have been restarted. Some of the SSH output is captured in the page.log file. Regardless, what is causing these stale alerts? I restarted Hobbit 11 hours ago to clear them, and have now had to restart it again. Unfortunately, these stale alerts are causing unnecessary pages.
Any hints on debugging this would be appreciated. I suspect the restart script, but it is needed to keep 19 hosts healthy until they are patched next week, so I cannot stop using it. In any case, the script is not related to the leftover shared memory and semaphores, because that was happening before the restart script was installed.
I would even try our second Hobbit server instance, but with the same configuration it suffers from 'Whoops ! bb failed to send message - timeout' every couple of hours. That would also be resolved if I could use the snapshot.
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 768000
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1
David Gore (v965-3670)
Network Management Systems (NMS)
IMPACT Transport Team Lead - SCSA, SCNA
Page: 1-800-PAG-eMCI pin 1406090
Vnet: 965-3676