I have a Linux server that is alerting on high load, but the load average is lower than my threshold. My question: why is it going red?
analysis.cfg:
HOST=serverA
    LOAD 89.0 90.0
Here is the top output. I would expect the alert to be based on the load average of 21.75, which is far lower than the thresholds:
top - 07:58:55 up 17 days, 19:31, 20 users, load average: 21.75, 25.16, 25.32
...
But there is a single process using a ton of CPU on one of the multiple cores. Is this factoring into the alert? If so, why is it not documented?
   PID USER   PR NI    VIRT    RES   SHR S %CPU %MEM    TIME+ COMMAND
121978 user1+ 20  0 46.839g 0.040t 50388 S 2774 17.2 18504:34 python
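For reference: in top's default (Irix) mode the %CPU column is a percentage of one CPU, so 2774% means the process is keeping roughly 27.7 cores busy across its threads rather than one core. On Linux the load average counts runnable (and uninterruptible) threads, so a process like this can on its own account for a load in the mid-20s. A minimal Python sketch of that arithmetic, using the process row quoted above (the function names are illustrative only):

```python
# Sketch: interpret the %CPU column from the top output quoted above.
# Assumes top's default "Irix mode", where 100% means one fully busy CPU,
# so a multithreaded process can report far more than 100%.

HEADER = "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND"
TOP_ROW = ("121978 user1+ 20 0 46.839g 0.040t 50388 S "
           "2774 17.2 18504:34 python")


def cpu_percent(header: str, row: str) -> float:
    """Return the %CPU field of a top process row, located via the header."""
    columns = header.split()
    fields = row.split()
    return float(fields[columns.index("%CPU")])


def busy_cores(pct: float) -> float:
    """Convert an Irix-mode %CPU value into an equivalent number of busy cores."""
    return pct / 100.0


if __name__ == "__main__":
    pct = cpu_percent(HEADER, TOP_ROW)
    # Prints: %CPU = 2774.0, roughly 27.74 cores kept busy
    print(f"%CPU = {pct}, roughly {busy_cores(pct):.2f} cores kept busy")
```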
Thanks,
John

John Rothlisberger
IT Strategy, Infrastructure & Security - Technology Growth Platform
TGP for Business Process Outsourcing
Accenture
312.693.3136 office
This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited. Where allowed by local law, electronic communications with Accenture and its affiliates, including e-mail and instant messaging (including content), may be scanned by our systems for the purposes of information security and assessment of internal compliance with Accenture policy.
www.accenture.com
Any chance you have another entry that could be overriding that setting? Or maybe it's not matching the entry and falling back to the default?
=G=
On Thu, Mar 8, 2018 at 8:24 AM, Rothlisberger, John R. <john.r.rothlisberger at accenture.com> wrote:
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon
John,
You can test that with:
./bin/xymoncmd xymond_alert --test serverA load
and confirm that the alerts.cfg line you think is handling it really is the one being applied.
Also, the LOAD check looks at the 5-minute load average, not the 1-minute one. So in your example the value being checked is 25.16, which is still well below the levels you configured.
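Larry's point can be sketched in a few lines: take the 5-minute figure from the top header instead of the first number, then compare it with the LOAD warn and panic levels. This is a minimal Python illustration, not Xymon's actual code; the strict greater-than comparison and the green/yellow/red mapping are assumptions:

```python
# Sketch: pick the 5-minute value out of a top/uptime header line and compare
# it with LOAD warn/panic thresholds. The strict ">" comparison and the
# color names are assumptions for illustration, not Xymon's source code.
import re

TOP_LINE = ("top - 07:58:55 up 17 days, 19:31, 20 users, "
            "load average: 21.75, 25.16, 25.32")


def load_averages(line: str) -> tuple[float, float, float]:
    """Return the (1-min, 5-min, 15-min) load averages from a top/uptime line."""
    match = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", line)
    if match is None:
        raise ValueError("no load average found")
    one, five, fifteen = (float(g) for g in match.groups())
    return one, five, fifteen


def load_color(five_min: float, warn: float, panic: float) -> str:
    """Map a 5-minute load average onto a status color."""
    if five_min > panic:
        return "red"
    if five_min > warn:
        return "yellow"
    return "green"


if __name__ == "__main__":
    _, five, _ = load_averages(TOP_LINE)
    print(five, load_color(five, warn=89.0, panic=90.0))  # 25.16 green
```

With the configured 89.0/90.0 levels, 25.16 comes out green. Feed the same value through a hypothetical rule of 20.0/25.0 and it comes out red, which is why it matters which rule actually applies to serverA.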
Larry
From: Xymon [mailto:xymon-bounces at xymon.com] On Behalf Of Galen Johnson
Sent: Thursday, March 8, 2018 9:12 AM
To: Rothlisberger, John R.
Cc: xymon >> xymon at xymon.com
Subject: Re: [Xymon] Linux load question
CONFIDENTIALITY NOTICE: This electronic mail message is intended exclusively for recipient to which it is addressed. The contents of this message and any attachments may contain confidential and privileged information. Any unauthorized review, use, print, storage, copy, disclosure or distribution is strictly prohibited. If you have received this message in error, please advise the sender immediately by replying to the message's sender and delete all copies of this message and its attachments without disclosing the contents to anyone, or using the contents for any purpose.
Defaults are similar (not below 25 anyway).
So, is it because there are multiple CPUs? Is there some setting I could use?
Or is there a way to find the exact value or setting Xymon is using to turn this red? The CPU graph shows that total CPU usage doesn't really go above 30%.
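To check the exact numbers on serverA itself, the kernel's load averages (the same three figures top prints) can be read directly and set beside the CPU count. A minimal Python sketch, Unix-only because it uses os.getloadavg(); the per-CPU figure is only for comparison with the CPU graph, since whether Xymon's LOAD check divides by the number of CPUs is not established in this thread, and per_core is a hypothetical helper:

```python
# Sketch: print the raw load averages the kernel reports (the same figures
# top shows) next to a per-CPU view. The per-CPU number is for comparison
# only; it is an assumption that Xymon compares the raw, undivided value.
# Unix-only: os.getloadavg() is not available on Windows.
import os


def per_core(load: float, cores: int) -> float:
    """Divide a load average by the number of logical CPUs."""
    return load / cores


if __name__ == "__main__":
    one, five, fifteen = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() can return None
    print(f"load averages: {one:.2f} {five:.2f} {fifteen:.2f}")
    print(f"logical CPUs: {cores}")
    print(f"5-minute load per CPU: {per_core(five, cores):.2f}")
```

Load and CPU percentage measure different things: load counts runnable threads, while the graph shows utilization averaged over every CPU. A load in the mid-20s on a machine with many cores can therefore coexist with a total CPU figure around 30%.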
Thanks, John
From: Galen Johnson [mailto:solitaryr at gmail.com]
Sent: Thursday, March 8, 2018 9:12 AM
To: Rothlisberger, John R. <john.r.rothlisberger at accenture.com>
Cc: xymon >> xymon at xymon.com <xymon at xymon.com>
Subject: [External] Re: [Xymon] Linux load question
Participants (3): john.r.rothlisberger@accenture.com, larry@fni-stl.com, solitaryr@gmail.com