False Process Down Alerts

chris.naude.0＠gmail.com

16 Jan 2010

3:59 a.m.

I've run into a strange problem with my Xymon server. I noticed today that I'm receiving random false alerts for processes being down. When I look at the process list output in the alert, it looks as if the data coming from the clients isn't correct. Here is an example. Has anyone seen anything like this?

9613 1944 root Jan 11 S 154 0.00 00:00:00 6128 cmclconfd -c
10389 1944 root Jan 11 S 154 0.00 00:00:00 6128 cmclconfd -c
9794 1 oracle 10:55:57 S 154 0.00 00:00:0 217600]oracleTEST (LOCAL=NO)
1592 1 oracle Jan 11 S 154 0.00 00:00:11 217136 ora_mman_TEST
12751 1944 root Jan 11 S 154 0.00 00:00:00 6128 cmclconfd -c
8965 1944 root Jan 11 S 154 0.00 00:00:00 6128 cmclconfd -c

11819 1 oracle Jan 12 S 154 0.00 00:00:07 217280 ora_j015_TEST
2711 1 roo ]ec 4 S 120 0.04 00:02:16 868 /usr/sbin/xntpd
3547 1 xymon Dec 4 S 168 0.00 00:00:43 268 /opt/xymon/client/bin/hobbitlaunch --config=/opt/xymon/client/etc/clientlaunch.cfg --log=/opt/xymon/client/logs/clientlaunch.log --pidfile=/opt/xymon/client/logs/clientlaunch.101.example.com.pid
3728 1 root Dec 4 R 152 0.00 00:00:37 4208 /usr/sbin/stm/uut/bin/tools/monitor/WbemWrapperMonitor

Xymon version: 4.3.0-0.beta2
Xymon server: CentOS 5.4 32 bit
Client: HP-UX 11.31 Itanium

-- Chris Naude


lars.ebeling＠leopg9.no-ip.org

16 Jan

2:56 p.m.

New subject: [hobbit] False Process Down Alerts

It looks like two instances of the client are writing to the file at the same time or almost ;)

Lars
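
Lars's theory of two client instances running at once can be checked against a process listing from the client host. The following is a minimal sketch, not something from the thread: it assumes the launcher shows up in ps output as hobbitlaunch (as it does in the listing above), and the function name count_launcher_processes is made up for illustration.

```python
def count_launcher_processes(ps_output, launcher="hobbitlaunch"):
    """Return how many ps rows mention the client launcher binary."""
    return sum(1 for row in ps_output.splitlines() if launcher in row)


# Two rows copied from the ps listing in the thread (launcher options shortened here).
sample = (
    "3547 1 xymon Dec 4 S 168 0.00 00:00:43 268 "
    "/opt/xymon/client/bin/hobbitlaunch --config=/opt/xymon/client/etc/clientlaunch.cfg\n"
    "3728 1 root Dec 4 R 152 0.00 00:00:37 4208 "
    "/usr/sbin/stm/uut/bin/tools/monitor/WbemWrapperMonitor\n"
)

if __name__ == "__main__":
    instances = count_launcher_processes(sample)
    print("launcher instances:", instances)
    if instances > 1:
        print("more than one client launcher appears to be running")
```

A count above one on a single host would support the duplicate-instance theory; a count of one, as in the sample, points elsewhere.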

chris.naude.0＠gmail.com

5:44 p.m.

New subject: [hobbit] False Process Down Alerts

That makes a lot of sense. I did have some issues with the startup scripts on HP-UX. I'll check it out later tonight. Hopefully I can get it fixed before it goes live tonight. Thanks!

-- Chris Naude

chris.naude.0＠gmail.com

17 Jan

11:11 p.m.

New subject: [hobbit] False Process Down Alerts

The problem has suddenly become much much worse. I verified with tcpdump that the data coming from the client is 100% correct. It seems something on the Xymon server side is not handling the client data correctly. Anyone have any other ideas?

[image: red] 89% /testdb3 (37771472% used) has reached the PANIC level (95%)

Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/vgtestdb1/lvol1 107844344 70901816 36942528 66% /testdb1
/dev/vgtestdb2/lvol1 35962064 25453128 10508936 71% /testdb2
/dev/vgtestdb4/lvol1 970909400 825006344 145903056 85% /testdb4
/dev/vgtestdb3/lv l1 ] 338788224 301016752 37771472 89% /testdb3
/dev/vgtestdb5/lvol1 179789048 150553912 29235136 84% /testdb5
/dev/vg00/lvol8 24580711 74501 24506210 1% /home
/dev/vg00/lvol4 10226680 6339283 3887397 62% /opt

-- Chris Naude
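
The damaged /testdb3 row in the disk report above is easy to miss by eye. As a rough sketch only (it is not how Xymon itself parses the data), the check below flags any data row that does not have the six whitespace-separated fields of the header, or whose Capacity field is not a percentage. The function name suspicious_df_rows is hypothetical, and the sample rows are copied from the report.

```python
import re


def suspicious_df_rows(df_output):
    """Return data rows that do not look like 'dev blocks used avail NN% mount'."""
    bad = []
    for line in df_output.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 6 or not re.fullmatch(r"\d+%", fields[4]):
            bad.append(line)
    return bad


# Header plus three rows from the report in the thread, one of them damaged.
df_sample = "\n".join([
    "Filesystem 1024-blocks Used Available Capacity Mounted on",
    "/dev/vgtestdb1/lvol1 107844344 70901816 36942528 66% /testdb1",
    "/dev/vgtestdb3/lv l1 ] 338788224 301016752 37771472 89% /testdb3",
    "/dev/vg00/lvol4 10226680 6339283 3887397 62% /opt",
])

if __name__ == "__main__":
    for row in suspicious_df_rows(df_sample):
        print("suspicious row:", row)
```

Running the same check on the data captured with tcpdump and on the data shown by the server would show on which side the stray characters appear.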

josh＠imaginenetworksllc.com

11:21 p.m.

New subject: [hobbit] False Process Down Alerts

Is there only one client sending data as this name? I don't think you answered Lars' email.

What does the alert read and what does the data say? Missing process? Too high of a load?

Josh Luthman
Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St
Suite 1337
Troy, OH 45373

"The secret to creativity is knowing how to hide your sources." --- Albert Einstein


chris.naude.0＠gmail.com

18 Jan

12:08 a.m.

New subject: [hobbit] False Process Down Alerts

I have 7 clients running. Each client has a different name. They are all sending data to the primary Xymon server. The alerts are reading missing processes, full file systems, and msgs errors. Here is another sample of an unusual error. You can see the process list has a funky break in it.

Sun Jan 17 15:40:18 MST 2010 - Processes NOT ok

[image: yellow] Expected string COMMAND not found in ps output header

PID PPID USER STIM] S PRI %CPU TIME VSZ COMMAND
0 0 root Dec 14 S 127 0.16 00:40:00 0 swapper
1 0 root Dec 14 R 152 0.09 00:01:21 2064 init
48 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
45 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
42 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
31 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
30 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
29 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
28 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
26 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
5 0 root Dec 14 R 152 0.00 00:00:02 0 signald
6 0 root Dec 14 R 152 0.00 00:00:03 0 kmemdaemon
17 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
16 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
15 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
14 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
13 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
12 0 root Dec 14 S 152 0.00 00:00:00 0 usbhubd
11 0 root Dec 14 R 152 0.00 00:01:11 0 escsid
10 0 root Dec 14 S -32 0.00 00:00:00 0 ttisr
9 0 root Dec 14 R 152 0.00 00:01:27 0 ksyncer_daemon

7 0]root Dec 14 R 152 0.00 00:]0:00 0 kai_daemon
50 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
47 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
44 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached
41 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached

-- Chris Naude
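
In the process listing above, the header reads STIM] where a ps header would normally read STIME. A small sketch of a header comparison follows. The expected column list is an assumption reconstructed from the listing, with STIME substituted for the damaged token, and header_mismatches is a hypothetical helper, not part of Xymon.

```python
from itertools import zip_longest

# Assumed column names, reconstructed from the header quoted in the thread.
EXPECTED_COLUMNS = ["PID", "PPID", "USER", "STIME", "S", "PRI",
                    "%CPU", "TIME", "VSZ", "COMMAND"]


def header_mismatches(header_line, expected=EXPECTED_COLUMNS):
    """Return (position, expected, actual) for every column that differs."""
    actual = header_line.split()
    return [
        (pos, want, have)
        for pos, (want, have) in enumerate(zip_longest(expected, actual))
        if want != have
    ]


if __name__ == "__main__":
    received = "PID PPID USER STIM] S PRI %CPU TIME VSZ COMMAND"
    for pos, want, have in header_mismatches(received):
        print(f"column {pos}: expected {want!r}, got {have!r}")
```

A missing or extra column shows up as a pair with None on one side, since zip_longest pads the shorter list.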

chris.naude.0＠gmail.com

7:20 p.m.

New subject: [hobbit] False Process Down Alerts

I've managed to stop the flood of false alerts. I removed all of my non-prod clients from the bb-hosts and shut off their client processes. The problem seems to be somehow related to the amount of data the Xymon server is trying to process.

-- Chris Naude

Doug.Williams＠rhd.com

7:41 p.m.

New subject: [hobbit] False Process Down Alerts

It seems to me your clients' data is being truncated. Try modifying these settings in your hobbitserver.cfg. You may want to set them to a size appropriate for your Xymon server. I have Xymon running on pretty beefy servers, so I set these incredibly high; they may even exceed what Xymon actually allows, but it is not hurting me. Restart the Hobbit server after changing hobbitserver.cfg.

MAXMSG_STATUS=30000000
MAXMSG_CLIENT=30000000
MAXMSG_DATA=30000000

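
After editing hobbitserver.cfg as Doug describes, it can help to read the MAXMSG_* values back and confirm what the file now contains. The sketch below only inspects text. It assumes the settings are plain KEY=VALUE lines, optionally double-quoted, and the name read_maxmsg_settings is made up here. It says nothing about units or about what the running server has loaded, so the restart Doug mentions is still needed.

```python
import re

# Matches lines such as MAXMSG_STATUS=30000000 or MAXMSG_STATUS="30000000".
_SETTING = re.compile(r'^\s*(MAXMSG_[A-Z]+)\s*=\s*"?(\d+)"?\s*$')


def read_maxmsg_settings(cfg_text):
    """Collect MAXMSG_* settings from config text into a dict of ints."""
    settings = {}
    for line in cfg_text.splitlines():
        match = _SETTING.match(line)
        if match:  # comment lines and other settings do not match
            settings[match.group(1)] = int(match.group(2))
    return settings


# The three values quoted in the thread, plus a comment line for contrast.
cfg_sample = "\n".join([
    "# message size limits",
    "MAXMSG_STATUS=30000000",
    'MAXMSG_CLIENT="30000000"',
    "MAXMSG_DATA=30000000",
])

if __name__ == "__main__":
    for name, value in sorted(read_maxmsg_settings(cfg_sample).items()):
        print(name, value)
```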

chris.naude.0＠gmail.com

19 Jan

12:46 a.m.

New subject: [hobbit] False Process Down Alerts

Until now I had never received any alerts about messages being truncated. After disabling the non-prod clients, I started receiving alerts about messages being truncated. I adjusted the values as Doug suggested and everything looks good now. Tomorrow I'll enable the non-prod servers again and see whether this was the original culprit. Thanks!

On Mon, Jan 18, 2010 at 12:41 PM, Williams, Doug (Consultant-RIC) < Doug.Williams at rhd.com> wrote:

...

Seems to me your clients data is being truncated. Try modifying this in your hobbitserver.cfg. You may want to set them appropriate size for your xymon server. I have xymon running on pretty beefy servers so I set these incredibly high, and even though they may exceed what xymon actually allows (but it is not hurting me). Restart hobbit server after making change to hobbitserver.cfg

MAXMSG_STATUS=30000000 MAXMSG_CLIENT=30000000 MAXMSG_DATA=30000000

-----Original Message----- From: Chris Naude [mailto:chris.naude.0 at gmail.com] Sent: Monday, January 18, 2010 2:21 PM To: hobbit at hswn.dk Subject: Re: [hobbit] False Process Down Alerts

I've managed to stop the flood of false alerts. I removed all of my non-prod clients from the bb-hosts and shut off their client processes. The problem seems to be somehow related to the amount of data the Xymon server is trying to process.

On Sun, Jan 17, 2010 at 5:08 PM, Chris Naude <chris.naude.0 at gmail.com> wrote:
   I have 7 clients running. Each client has a different name. They
are all sending data to the primary Xymon server. The alerts are reading missing processes, full file systems, and msgs errors. Here is another sample of an unusual error. You can see the process list has a funky break in it.
    Sun Jan 17 15:40:18 MST 2010 - Processes NOT ok

     yellow&lt;http://unixadmin.bestwestern.com/xymon/gifs/yellow.gif>
Expected string COMMAND not found in ps output header
     PID  PPID USER
     STIM] S PRI  %CPU     TIME     VSZ COMMAND
       0     0 root      Dec 14  S 127  0.16 00:40:00       0
swapper 1 0 root Dec 14 R 152 0.09 00:01:21 2064 init 48 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached 45 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached 42 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached 31 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached 30 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached 29 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached 28 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached 26 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached 5 0 root Dec 14 R 152 0.00 00:00:02 0 signald 6 0 root Dec 14 R 152 0.00 00:00:03 0 kmemdaemon 17 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached 16 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached 15 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached 14 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached 13 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached 12 0 root Dec 14 S 152 0.00 00:00:00 0 usbhubd 11 0 root Dec 14 R 152 0.00 00:01:11 0 escsid 10 0 root Dec 14 S -32 0.00 00:00:00 0 ttisr 9 0 root Dec 14 R 152 0.00 00:01:27 0 ksyncer_daemon
   7     0]root      Dec 14  R 152
    0.00 00:]0:00       0 kai_daemon
      50     0 root      Dec 14  S 152  0.00 00:00:00       0
net_str_cached 47 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached 44 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached 41 0 root Dec 14 S 152 0.00 00:00:00 0 net_str_cached

Tom.Stewart＠landsend.com

4:27 a.m.

New subject: [hobbit] False Process Down Alerts

I had this problem and then made the adjustment. Since then, I get a 5-minute hole in the load average and a couple of other trends, even though on the Solaris systems I have no problems with the multi-CPU and zone processes. Most of the time when the hole shows up, I get other missing 5-minute stats exactly one hour after the first one, and it then repeats two or three times. I have tried disabling the caching, but it made no difference. The 4.3.0-2 beta seems to be very broken and no one knows why. Right now, I am trying to determine whether I am better off with another product, since issues do not seem to be a priority for anyone.

Tom

From: Chris Naude [mailto:chris.naude.0 at gmail.com]
Sent: Monday, January 18, 2010 6:47 PM
To: hobbit at hswn.dk
Subject: Re: [hobbit] False Process Down Alerts

On Mon, Jan 18, 2010 at 12:41 PM, Williams, Doug (Consultant-RIC) <Doug.Williams at rhd.com> wrote:

MAXMSG_STATUS=30000000
MAXMSG_CLIENT=30000000
MAXMSG_DATA=30000000


odinn_asgaard＠yahoo.com

18 Jan 18 Jan

8:03 p.m.

New subject: [hobbit] False Process Down Alerts

My Xymon server monitors over 1500 clients with no issues. When I see false alerts, it has always been a configuration mistake on my part, where I have 2 servers in my bb-hosts file using the same name on different IPs.
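A quick way to check for that kind of mistake is to scan the hosts file for names that appear with more than one IP. The following is only a minimal sketch, assuming the usual "IP hostname # tags" entry layout; the function name `duplicate_hostnames` and the sample entries are made up for illustration and are not part of Xymon.

```python
from collections import defaultdict


def duplicate_hostnames(lines):
    """Return {hostname: [ip, ...]} for names listed with more than one IP."""
    ips_by_name = defaultdict(list)
    for line in lines:
        # Drop the "# tags" part, then split the rest on whitespace.
        entry = line.split("#", 1)[0].split()
        # Assumption: host entries start with an IP address. Blank lines,
        # comments and directives such as "page" or "group" are skipped.
        if len(entry) < 2 or not entry[0][0].isdigit():
            continue
        ip, name = entry[0], entry[1]
        if ip not in ips_by_name[name]:
            ips_by_name[name].append(ip)
    return {name: ips for name, ips in ips_by_name.items() if len(ips) > 1}


# Hypothetical sample entries, not taken from any real configuration.
sample = [
    "page unix Unix servers",
    "10.0.0.11 dbhost1 # ssh",
    "10.0.0.12 dbhost2 # ssh",
    "10.0.0.13 dbhost1 # ssh",
]
print(duplicate_hostnames(sample))  # {'dbhost1': ['10.0.0.11', '10.0.0.13']}
```

To run it against a real file, pass the open file object instead of `sample`; the path depends on the installation.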

Jim Sloan

Just remember, today is the day you thought tomorrow was going to be yesterday.

From: Chris Naude <chris.naude.0 at gmail.com>
To: hobbit at hswn.dk
Sent: Mon, January 18, 2010 2:20:43 PM
Subject: Re: [hobbit] False Process Down Alerts

On Sun, Jan 17, 2010 at 5:08 PM, Chris Naude <chris.naude.0 at gmail.com> wrote:


I have 7 clients running. Each client has a different name. They are all sending data to the primary Xymon server. The alerts are reading missing processes, full file systems, and msgs errors. Here is another sample of an unusual error. You can see the process list has a funky break in it.

Sun Jan 17 15:40:18 MST 2010 - Processes NOT ok
Expected string COMMAND not found in ps output header

  PID  PPID USER
STIM] S PRI  %CPU     TIME     VSZ COMMAND
    0     0 root      Dec 14  S 127  0.16 00:40:00       0 swapper
    1     0 root      Dec 14  R 152  0.09 00:01:21    2064 init
   48     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
   45     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
   42     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
   31     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
   30     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
   29     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
   28     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
   26     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
    5     0 root      Dec 14  R 152  0.00 00:00:02       0 signald
    6     0 root      Dec 14  R 152  0.00 00:00:03       0 kmemdaemon
   17     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
   16     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
   15     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
   14     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
   13     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
   12     0 root      Dec 14  S 152  0.00 00:00:00       0 usbhubd
   11     0 root      Dec 14  R 152  0.00 00:01:11       0 escsid
   10     0 root      Dec 14  S -32  0.00 00:00:00       0 ttisr
    9     0 root      Dec 14  R 152  0.00 00:01:27       0 ksyncer_daemon

    7     0]root      Dec 14  R 152  0.00 00:]0:00       0 kai_daemon
   50     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
   47     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
   44     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
   41     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached
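The listing above suggests why the "COMMAND not found in ps output header" message appears: the header has been split in two, so the word COMMAND is no longer on its first line. The sketch below only illustrates that idea. It is not Xymon's actual code, the function name `header_has_command` is invented, and the intact header spelling (STIME) is an assumption inferred from the damaged "STIM]".

```python
def header_has_command(ps_output):
    """Return True if the first non-empty line of the ps output names COMMAND."""
    lines = [line for line in ps_output.splitlines() if line.strip()]
    return bool(lines) and "COMMAND" in lines[0]


# Assumed intact header, followed by one row from the listing.
intact = (
    "  PID  PPID USER      STIME S PRI  %CPU     TIME     VSZ COMMAND\n"
    "    1     0 root      Dec 14  R 152  0.09 00:01:21    2064 init\n"
)

# Header as it appears in the alert: broken after USER, with a stray "]".
broken = (
    "  PID  PPID USER\n"
    "STIM] S PRI  %CPU     TIME     VSZ COMMAND\n"
    "    1     0 root      Dec 14  R 152  0.09 00:01:21    2064 init\n"
)

print(header_has_command(intact))  # True
print(header_has_command(broken))  # False
```

A single damaged byte near the header is therefore enough to fail a check of this kind, even though every process is still present in the data.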

On Sun, Jan 17, 2010 at 4:21 PM, Josh Luthman <josh at imaginenetworksllc.com> wrote:

Is there only one client sending data as this name? I don't think you answered Lars' email.

What does the alert read and what does the data say? Missing process? Too high of a load?

Josh Luthman

Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St
Suite 1337
Troy, OH 45373

"The secret to creativity is knowing how to hide your sources." --- Albert Einstein

On Sun, Jan 17, 2010 at 6:11 PM, Chris Naude <chris.naude.0 at gmail.com> wrote:

The problem has suddenly become much much worse. I verified with tcpdump that the data coming from the client is 100% correct. It seems something on the Xymon server side is not handling the client data correctly. Anyone have any other ideas?

89% /testdb3 (37771472% used) has reached the PANIC level (95%)

Filesystem           1024-blocks      Used Available Capacity Mounted on
/dev/vgtestdb1/lvol1   107844344  70901816  36942528      66% /testdb1
/dev/vgtestdb2/lvol1    35962064  25453128  10508936      71% /testdb2
/dev/vgtestdb4/lvol1   970909400 825006344 145903056      85% /testdb4
/dev/vgtestdb3/lv l1 ] 338788224 301016752 37771472 89% /testdb3
/dev/vgtestdb5/lvol1   179789048 150553912  29235136      84% /testdb5
/dev/vg00/lvol8         24580711     74501  24506210       1% /home
/dev/vg00/lvol4         10226680   6339283   3887397      62% /opt

On Sat, Jan 16, 2010 at 10:44 AM, Chris Naude <chris.naude.0 at gmail.com> wrote:

That makes a lot of sense. I did have some issues with the startup scripts on HP-UX. I'll check it out later tonight. Hopefully I can get it fixed before it goes live tonight. Thanks!

On Sat, Jan 16, 2010 at 7:56 AM, Lars Ebeling <lars.ebeling at leopg9.no-ip.org> wrote:

It looks like two instances of the client are writing to the file at the same time or almost ;)

Lars

----- Original Message -----
From: Chris Naude
To: hobbit at hswn.dk
Sent: Saturday, January 16, 2010 4:59 AM
Subject: [hobbit] False Process Down Alerts

I've run into a strange problem with my Xymon server. I noticed today that I'm receiving random false alerts for processes being down. When I look at the process list output in the alert, it looks as if the data coming from the clients isn't correct. Here is an example. Has anyone seen anything like this?

 9613  1944 root      Jan 11  S 154  0.00 00:00:00    6128 cmclconfd -c
10389  1944 root      Jan 11  S 154  0.00 00:00:00    6128 cmclconfd -c
 9794     1 oracle    10:55:57 S 154  0.00 00:00:0 217600]oracleTEST (LOCAL=NO)
 1592     1 oracle    Jan 11  S 154  0.00 00:00:11  217136 ora_mman_TEST
12751  1944 root      Jan 11  S 154  0.00 00:00:00    6128 cmclconfd -c
 8965  1944 root      Jan 11  S 154  0.00 00:00:00    6128 cmclconfd -c

11819     1 oracle    Jan 12  S 154  0.00 00:00:07  217280 ora_j015_TEST
 2711     1 roo ]ec 4 S 120  0.04 00:02:16     868 /usr/sbin/xntpd
 3547     1 xymon     Dec 4   S 168  0.00 00:00:43     268 /opt/xymon/client/bin/hobbitlaunch --config=/opt/xymon/client/etc/clientlaunch.cfg --log=/opt/xymon/client/logs/clientlaunch.log --pidfile=/opt/xymon/client/logs/clientlaunch.101.example.com.pid
 3728     1 root      Dec 4   R 152  0.00 00:00:37    4208 /usr/sbin/stm/uut/bin/tools/monitor/WbemWrapperMonitor

Xymon version: 4.3.0-0.beta2
Xymon server: CentOS 5.4 32 bit
Client: HP-UX 11.31 Itanium

-- Chris Naude


participants (6)

chris.naude.0＠gmail.com
Doug.Williams＠rhd.com
josh＠imaginenetworksllc.com
lars.ebeling＠leopg9.no-ip.org
odinn_asgaard＠yahoo.com
Tom.Stewart＠landsend.com