I have a constant RED condition for the procs test. I checked all of my other servers and they each have just one instance of cron. This one server also has exactly one instance running, yet it still comes up red. These servers all run the same OS and the same client version. Some are standalone servers and others are VMs, but this one machine just will not go green.
Any pointers on this?
-- John J. Boris, Sr.
John
On 9 August 2016 at 04:53, john boris <jborissr at gmail.com> wrote:
Is the "cron" message showing too many or too few cron processes?
Can you please show the message from the "procs" page that says "Processes NOT ok", including the list of monitored processes (with green/red dots) below it? If you are able, show the whole "procs" page. Also, show the relevant section from analysis.cfg.
What OS are you using?
One thing about the "procs" page is that you can sometimes have unexpected PROC string matches. For instance, if you have the line:
PROC cron
then this definition will match any of these lines from the "ps" listing:
PID PPID USER STARTED S PRI %CPU TIME %MEM RSZ VSZ CMD
918 1 root Feb 22 S 22 0.0 00:00:10 0.0 776 2772 /usr/sbin/cron
2482 1 root Feb 22 S 22 0.0 00:00:00 0.0 2142 11268 /usr/bin/crontab -e
5514 1 root Feb 22 S 22 0.0 00:00:00 0.0 16484 30346 vi /etc/crontab
and you end up with more "cron" processes reported than you actually have running.
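To see the over-count concretely, here is a rough Python sketch that treats a plain PROC string as a substring test against each CMD value from that listing. It only models the behaviour described here; it is not Xymon's actual matching code, and count_substring is a hypothetical helper.

```python
# CMD column values copied from the ps listing above.
cmds = ["/usr/sbin/cron", "/usr/bin/crontab -e", "vi /etc/crontab"]


def count_substring(pattern, commands):
    """Hypothetical model of a plain PROC match: count commands containing pattern anywhere."""
    return sum(pattern in cmd for cmd in commands)


# Only one cron daemon is running, yet all three lines contain "cron".
print(count_substring("cron", cmds))  # 3
```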
To counter this, you can use the full path of the binary in analysis.cfg, like so:
PROC /usr/sbin/cron
However, this can also match a process like "/usr/sbin/cronlog-parser" or "less /usr/sbin/cron" or even "cp /usr/sbin/cron /path/to/slow/nfsmount".
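The same substring model shows why the full path alone is not enough: every one of those command lines contains /usr/sbin/cron. Apart from /usr/sbin/cron itself, these are just the hypothetical examples from the sentence above, not real processes.

```python
# Hypothetical command lines from the examples above.
cmds = [
    "/usr/sbin/cron",
    "/usr/sbin/cronlog-parser",
    "less /usr/sbin/cron",
    "cp /usr/sbin/cron /path/to/slow/nfsmount",
]

# A plain-string PROC match behaves like a substring test, so all four count.
matches = [cmd for cmd in cmds if "/usr/sbin/cron" in cmd]
print(len(matches))  # 4
```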
To match only a process whose command is exactly the string you care about, change from a string match to a regex match (the leading % marks a regex, ^ anchors the start and $ the end) like so:
PROC %^/usr/sbin/cron$
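Here is that anchored pattern run through Python's re module, used only as a stand-in for the regex engine behind the % prefix. For simple ^ and $ anchors the two should agree, but treat this as an approximation rather than Xymon's exact behaviour.

```python
import re

# Same pattern as the PROC line above, without the leading % (which marks a regex).
pattern = re.compile(r"^/usr/sbin/cron$")

cmds = [
    "/usr/sbin/cron",
    "/usr/sbin/cronlog-parser",
    "less /usr/sbin/cron",
    "cp /usr/sbin/cron /path/to/slow/nfsmount",
    "/usr/bin/crontab -e",
]

# ^ pins the match to the start of the command, $ to the end.
matches = [cmd for cmd in cmds if pattern.search(cmd)]
print(matches)  # ['/usr/sbin/cron']
```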
If some of your process instances might have arguments, you can take that into account:
PROC %^/usr/sbin/snmp($|\s)
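A similar check for the argument-tolerant pattern, again with Python's re as a stand-in. The argument string and the snmpd name below are made up for illustration; the point is that ($|\s) accepts the bare binary or the binary followed by whitespace, but not a longer name that merely shares the prefix.

```python
import re

pattern = re.compile(r"^/usr/sbin/snmp($|\s)")

# Hypothetical command lines and whether the pattern should accept them.
cases = {
    "/usr/sbin/snmp": True,                    # bare binary: $ matches
    "/usr/sbin/snmp -c /etc/snmp.conf": True,  # arguments: \s matches the space
    "/usr/sbin/snmpd": False,                  # longer name sharing the prefix
    "less /usr/sbin/snmp": False,              # ^ fails: not at the start
}

results = {cmd: bool(pattern.search(cmd)) for cmd in cases}
for cmd, matched in results.items():
    print(f"{matched!s:5}  {cmd}")
```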
Another thing to note about PROC monitoring is that it simply matches strings in the "ps" output, with some assumptions about the column widths. If a column before CMD happens to be wider than expected, it can push other text into the CMD column. For instance:
PID PPID USER STARTED S PRI %CPU TIME %MEM RSZ VSZ CMD
20445 20442 root Mar 20 S 23 0.0 00:00:00 0.0 4112 48044 bin/abcd
30269 11056 roddy 11:38:27 S 23 0.0 00:00:00 0.0 3288 46732 sshd
30442 30269 freddy2496 14:41:17 S 23 0.0 00:00:00 0.0 2320 46864 sshd
30460 30442 wendy 11:38:27 S 23 0.0 00:00:00 0.0 2524 14192 -bash
If I'm trying to match "sshd" with an anchored regex such as %^sshd, then the above will match only once, because the second sshd line will have "sshd" matched against "4 sshd" (the "4" being the last digit of the VSZ column), all because the username is wider than the expected maximum width for the USER field.
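To see how an over-wide USER value spills into CMD, here is a toy Python model: rows are laid out with fixed column widths (USER assumed to be 8 characters, padded but never truncated), and a naive parser cuts the CMD field at the offset where the header says CMD begins. The widths and the ps_row helper are assumptions for illustration, not the real ps or Xymon layout. With a plain string match the shifted line still counts, while an anchored pattern like %^sshd misses it.

```python
import re


def ps_row(pid, ppid, user, started, pri, cpu, time, mem, rsz, vsz, cmd, state="S"):
    """Hypothetical fixed-width ps layout; USER is padded to 8 but never truncated."""
    return (f"{pid:>5} {ppid:>5} {user:<8} {started:>8} {state} {pri:>3} "
            f"{cpu:>4} {time:>8} {mem:>4} {rsz:>5} {vsz:>5} {cmd}")


header = ps_row("PID", "PPID", "USER", "STARTED", "PRI", "%CPU", "TIME",
                "%MEM", "RSZ", "VSZ", "CMD")
CMD_OFFSET = header.index("CMD")  # where a naive parser expects CMD to start

rows = [
    ps_row(20445, 20442, "root", "Mar 20", 23, "0.0", "00:00:00", "0.0", 4112, 48044, "bin/abcd"),
    ps_row(30269, 11056, "roddy", "11:38:27", 23, "0.0", "00:00:00", "0.0", 3288, 46732, "sshd"),
    ps_row(30442, 30269, "freddy2496", "14:41:17", 23, "0.0", "00:00:00", "0.0", 2320, 46864, "sshd"),
    ps_row(30460, 30442, "wendy", "11:38:27", 23, "0.0", "00:00:00", "0.0", 2524, 14192, "-bash"),
]

# Slice every row at the header's CMD offset, as a width-assuming parser would.
cmds = [row[CMD_OFFSET:] for row in rows]
print(cmds)  # ['bin/abcd', 'sshd', '4 sshd', '-bash']

substring_hits = sum("sshd" in cmd for cmd in cmds)                  # plain string match
anchored_hits = sum(bool(re.search(r"^sshd", cmd)) for cmd in cmds)  # like PROC %^sshd
print(substring_hits, anchored_hits)  # 2 1
```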
Cheers Jeremy
Jeremy, thanks for the reply. Because you asked the question, I looked a bit deeper. First, this is a SCO box, so things are different. I checked, and it seems that on this installation the bbc user did not have the correct authorization.
Thinking about it again, this server was freshly moved from a standalone machine to a VM, so I suspect the authorizations were not set properly during the restore. I went into the Account Manager and redid the authorizations for the bbc user, which allowed the bbc user to see the output of the ps command.
So all is well and lesson learned.
On Mon, Aug 8, 2016 at 10:16 PM, Jeremy Laidman <jlaidman at rebel-it.com.au> wrote:
-- John J. Boris, Sr.