Hello
Some time ago I wrote about devmon stopping when a monitored device is not responding. I have now found that it has nothing to do with unresponsive devices: devmon stops working at irregular intervals. I set devmon to verbose and looked at the devmon log. When it stops working, there are simply no more messages (see below). No error messages, nothing, neither in the devmon log nor in the syslog.
If I do a "ps -ef" I see all devmon processes running:
[root@s068a300 devmon]# ps -ef |grep devmon
hobbit   10211     1  0 Nov09 ?        00:10:07 devmon[master]
hobbit   10214 10211  0 Nov09 ?        00:00:22 devmon
hobbit   10215 10211  0 Nov09 ?        00:00:21 devmon
hobbit   10217 10211  0 Nov09 ?        00:00:22 devmon
hobbit   10218 10211  0 Nov09 ?        00:01:52 devmon
hobbit   10219 10211  0 Nov09 ?        00:00:21 devmon
hobbit   10220 10211  0 Nov09 ?        00:01:51 devmon
hobbit   10221 10211  0 Nov09 ?        00:01:52 devmon
hobbit   10222 10211  0 Nov09 ?        00:00:00 devmon
hobbit   10223 10211  0 Nov09 ?        00:00:00 devmon
root     20447  3611  0 14:47 pts/1    00:00:00 grep devmon
Any idea how I can find out why devmon stops working and what the processes are doing when they are stuck? If I send a SIGTERM to the devmon master process, it stops all the other processes, so it appears to respond to signals as it should.
BTW, does anyone have a devmon startup/shutdown script that works on SuSE EL?
Thorsten Erdmann
Attachment: here are the last few lines of the devmon log:
[09-11-10 at 10:52:21] Performing test logic
[09-11-10 at 10:52:21] Done with test logic
[09-11-10 at 10:52:21] Sending messages to display server
[09-11-10 at 10:52:21] Done sending messages
[09-11-10 at 10:52:21] Sleeping for 59 seconds.
[09-11-10 at 10:53:20] Starting snmp queries
[09-11-10 at 10:53:20] Getting device status from hobbit at localhost:1984
[09-11-10 at 10:53:20] Querying u068usv020a1 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10 at 10:53:20] Querying u068usv020a2 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10 at 10:53:20] Querying u068usv020b1 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10 at 10:53:20] Querying u068usv020b2 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10 at 10:53:20] Querying u068usv110111 for tests power,temperature
[09-11-10 at 10:53:20] Querying u068usvnw1111 for tests power,temperature
[09-11-10 at 10:53:20] Querying u068usvnw1112 for tests power,temperature
[09-11-10 at 10:53:20] Querying u068usvnw1211 for tests power,temperature
[09-11-10 at 10:53:21] Performing test logic
[09-11-10 at 10:53:21] Done with test logic
[09-11-10 at 10:53:21] Sending messages to display server
[09-11-10 at 10:53:21] Done sending messages
[09-11-10 at 10:53:21] Sleeping for 59 seconds.
[09-11-10 at 10:54:20] Starting snmp queries
[09-11-10 at 10:54:20] Getting device status from hobbit at localhost:1984
[09-11-10 at 10:54:20] Querying u068usv020a1 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10 at 10:54:21] Querying u068usv020a2 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10 at 10:54:21] Querying u068usv020b1 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10 at 10:54:21] Querying u068usv020b2 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10 at 10:54:21] Querying u068usv110111 for tests power,temperature
[09-11-10 at 10:54:21] Querying u068usvnw1111 for tests power,temperature
[09-11-10 at 10:54:21] Querying u068usvnw1112 for tests power,temperature
[09-11-10 at 10:54:21] Querying u068usvnw1211 for tests power,temperature
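Since the only symptom is that the log goes quiet, the stall could be caught as it happens by checking how old the newest log entry is. Below is a rough sketch of that idea, not something that ships with devmon. The function names and the five-minute threshold are made up for illustration, and reading the timestamp as YY-MM-DD is an assumption based on the excerpt above. The pattern accepts either " at " or "@" between date and time, in case the real log file uses "@".

```python
import re
from datetime import datetime, timedelta

# Timestamp layout taken from the log excerpt above. Reading it as
# YY-MM-DD is an assumption; " at " and "@" are both accepted.
STAMP = re.compile(r"\[(\d{2}-\d{2}-\d{2})(?: at |@)(\d{2}:\d{2}:\d{2})\]\s*([^\[]*)")


def last_entry(log_text):
    """Return (timestamp, message) of the final log entry, or None if none is found."""
    matches = STAMP.findall(log_text)
    if not matches:
        return None
    date, time, message = matches[-1]
    stamp = datetime.strptime(f"{date} {time}", "%y-%m-%d %H:%M:%S")
    return stamp, message.strip()


def is_stalled(log_text, now, max_silence=timedelta(minutes=5)):
    """True if the newest entry is older than max_silence (hypothetical threshold)."""
    entry = last_entry(log_text)
    if entry is None:
        return True  # no parsable entries at all is treated as a stall
    return now - entry[0] > max_silence
```

Run from cron against the tail of the log, `is_stalled(text, datetime.now())` would flag the hang within minutes, and `last_entry` shows which step was logged last. In the excerpt above that is the final "Querying" line, with no "Performing test logic" after it.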