On the mainframes, we are used to lots of tasks waiting for various resources. Main memory, virtual memory, I/O, etc. are all important and can be tuned fairly well. When that tuning isn't quite right, throughput can be degraded, but the pain is not as bad as when the CPU gets overloaded. CPU is the one resource that is hard to change. I have run systems that needed to be less than 50% busy and others that the boss wanted at 100% and wished for 110% busy.
The CPU value can also be the first indication of a runaway user or a bad database query (this is our most common problem). Once we know from the CPU utilization that something is wrong, we can look for the cause, and maybe the other problems are there too.
Thomas Kern 301-903-2211
----- Original Message ----- From: vernon.everett at westernpower.com.au <vernon.everett at westernpower.com.au> To: hobbit at hswn.dk <hobbit at hswn.dk> Sent: Thu Sep 13 00:23:43 2007 Subject: Re: [hobbit] CPU utilisation alerts
It might be different in mainframe world, but in Unix world, you need to look at the run queue length, the IO stats and the CPU utilisation together to get an idea of what's happening. If your CPU is at 100% and your run queue is still small, it's probably just a hefty process chugging along, like a compile. If your run queue is huge and growing, and your CPU isn't yet at 100%, you need to look at your IO: disk, memory, swap, any resource that could be generating contention and IO wait. If there is major contention for these resources, you need to look at adding more, or utilising them differently: spread data across multiple disks or mirror the disk to increase read throughput, that sort of thing. If your run queue is huge and growing, and CPU is at 100% while IO is low, it's probably time to move to a new server, or find the developer and tell him to fix his bugs. :-)
So absolute CPU utilisation on its own isn't particularly meaningful, but if that's what the PHBs want, let's give it to them.
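The three cases above can be written down as a small decision function. This is only a rough sketch in Python: the function name, the argument names and every threshold (what counts as "100%", a "huge" run queue or "low" IO) are hypothetical values picked for illustration, not figures from this thread, and "growing" is approximated by comparing two consecutive run-queue samples.

```python
def diagnose(cpu_busy, run_queue, run_queue_before, io_wait,
             cpu_full=95, queue_huge=10, io_low=5):
    """Classify one sample using the run queue / CPU / IO rules of thumb.

    cpu_busy and io_wait are percentages; run_queue_before is the previous
    run-queue sample. All default thresholds are hypothetical.
    """
    saturated = cpu_busy >= cpu_full
    queue_small = run_queue < queue_huge
    # "Huge and growing": above the threshold and larger than last time.
    backlog = run_queue >= queue_huge and run_queue > run_queue_before

    if saturated and queue_small:
        return "hefty process: CPU is full but nothing is queuing"
    if backlog and not saturated:
        return "check IO: disk, memory, swap or another contended resource"
    if backlog and saturated and io_wait <= io_low:
        return "CPU-bound: bigger server, or get the code fixed"
    return "no clear pattern: look at all three figures together"
```

For example, `diagnose(100, 1, 1, 0)` lands in the "hefty process" case, while `diagnose(60, 20, 12, 30)` points at IO. The point is simply that the colour of an alert would depend on all three figures, not on CPU utilisation alone.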
Regards Vernon
"Kern, Thomas" <Thomas.Kern at hq.doe.gov> wrote on 13/09/2007 12:07:37 PM:
I would prefer that the cpu test use the data from the vmstat command instead of the load values. I am used to a mainframe system, where cpu utilization is more useful than queue length. All of my Linux systems are guests on a mainframe system, so their individual cpu utilizations are not as important as the values from my first-level system, and I am working on a client-side test for that.
Thomas Kern 301-903-2211
----- Original Message ----- From: vernon.everett at westernpower.com.au <vernon.everett at westernpower.com.au> To: hobbit at hswn.dk <hobbit at hswn.dk> Sent: Wed Sep 12 23:56:43 2007 Subject: Re: [hobbit] CPU utilisation alerts
Hi Thomas
Thanks for your quick response.
A client-side script would work, but I was thinking I cannot be the first person to need this, and that somebody else has already invented the wheel. (I hate reinventing stuff.) Alternatively, I was hoping that Henrik has some magic switch or config setting that will make it work.
Regards Vernon
"Kern, Thomas" <Thomas.Kern at hq.doe.gov> wrote on 13/09/2007 11:46:07 AM:
I don't know if you can alert off one of the values in one of the trends graphs. That might take some back-end modifications.
But you could write a simple client-side script that runs the same command that is parsed for the trends graphs (vmstat, I think), totals the cpu utilization values and sends a simple status message with the appropriate g/y/r color. Hobbit can then do the alerting.
Thomas Kern 301-903-2211
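A minimal sketch of such a client-side script, in Python, might look like the following. It assumes Linux-style vmstat output with `us` and `sy` columns. The test name `cpuu`, the 80/95 thresholds and all function names are hypothetical. It also assumes that a Hobbit status message starts with `status <host>.<test> <colour>`, which should be checked against the Hobbit documentation before use. Sending the message is left out; it would normally be handed to the Hobbit client's own sender program.

```python
import subprocess


def read_vmstat():
    """Run 'vmstat 1 2'; the second sample covers the last second only."""
    result = subprocess.run(["vmstat", "1", "2"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def cpu_busy_percent(vmstat_output):
    """Total the user and system CPU columns of the last vmstat sample."""
    rows = [line.split() for line in vmstat_output.splitlines() if line.strip()]
    # The column-name row is the one that contains both "us" and "sy".
    header = next(row for row in rows if "us" in row and "sy" in row)
    sample = rows[-1]
    return int(sample[header.index("us")]) + int(sample[header.index("sy")])


def colour_for(busy, warn=80, panic=95):
    """Map a utilisation figure to a colour (thresholds are hypothetical)."""
    if busy >= panic:
        return "red"
    if busy >= warn:
        return "yellow"
    return "green"


def status_message(host, busy, test="cpuu"):
    """Build the status text; the exact format is an assumption."""
    return f"status {host}.{test} {colour_for(busy)} CPU utilisation {busy}%"
```

On a client, `status_message("myhost", cpu_busy_percent(read_vmstat()))` would produce the text to send. Keeping the parsing separate from the vmstat call makes it easy to try the script against saved output first.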
----- Original Message ----- From: vernon.everett at westernpower.com.au <vernon.everett at westernpower.com.au> To: hobbit at hswn.dk <hobbit at hswn.dk> Sent: Wed Sep 12 23:36:07 2007 Subject: [hobbit] CPU utilisation alerts
Hi all
I'm baaaack :-) For those who might have missed me, I spent a few months contracting for a company that standardised on BMC Patrol. Wouldn't even look at Hobbit. BMC is a horrible package, expensive, not very extensible, with a huge client footprint and overhead, and is very prone to crashing. Sad product.
But no matter, I am now trying to satisfy my new company that Hobbit is the one monitor to rule them all, and my new colleagues have identified a "deficiency".
This has probably been asked and answered before, but here is what they want. I have been asked to generate a yellow/red status when absolute CPU utilisation reaches predetermined thresholds. Yes, I know, without looking at the run queue this figure is not very meaningful, but this is what they want.
The la1 graph in the trends column does an excellent job of graphing the CPU utilisation, but how do I configure an alert based on that figure?
Regards Vernon
======================================================================== Electricity Networks Corporation, trading as Western Power ABN: 18 540 492 861
TO THE ADDRESSEE - this email is for the intended addressee only and may contain information that is confidential. If you have received this email in error, please notify us immediately by return email or by telephone. Please also destroy this message and any electronic or hard copies of this message.
Any claim to confidentiality is not waived or lost by reason of mistaken transmission of this email.
Unencrypted email is not secure and may not be authentic. Western Power cannot guarantee the accuracy, reliability, completeness or confidentiality of this email and any attachments.
VIRUSES - Western Power scans all outgoing emails and attachments for viruses, however it is the recipient's responsibility to ensure this email is free of viruses.
So we both want a basic CPU utilisation alert. Cool. Hopefully somebody on the list has done this before. If not, it's time to do a bit of scripting. If I have to do it myself, I will post the results, if you are interested.
Regards Vernon
Hi Henrik
I have been thinking about this problem, and was wondering how difficult it would be to incorporate a CPUU (CPU Utilisation) test as a standard for the bb-hosts config. The graph already exists (la1) so the data is already being collected. What would be needed to make it a standard test?
Regards Vernon
On Thu, Sep 13, 2007 at 01:39:37PM +0800, vernon.everett at westernpower.com.au wrote:
I have been thinking about this problem, and was wondering how difficult it would be to incorporate a CPUU (CPU Utilisation) test as a standard for the bb-hosts config. The graph already exists (la1) so the data is already being collected. What would be needed to make it a standard test?
My long-term goal is that all of the data going into graphs can be used to modify statuses, so if vmstat begins to report a high cpu utilisation, this could trigger the "cpu" status to go red. Or if swap I/O goes up, it would trigger a change of the "memory" status.
But that would require some re-design of how statuses are handled inside Hobbit, so it's not an easy fix.
So for the short term we could have the client server-module look at vmstat data also when it generates the cpu status. I'll have a look at it.
Henrik
participants (3)
- henrik@hswn.dk
- Thomas.Kern@hq.doe.gov
- vernon.everett@westernpower.com.au