Checking process longevity
Hi all
I'm trying to work out how I can get Hobbit to alert me when a process has existed for more than n seconds. This is important because we sometimes have NFS problems where stale mounts cause processes such as df to hang, and I'd like to pick these up sooner rather than later.
Any ideas on how to implement this?
Thanks
CC
NOTICE: This email and any attachments are confidential. They may contain legally privileged information or copyright material. You must not read, copy, use or disclose them without authorisation. If you are not an intended recipient, please contact us at once by return email and then delete both messages and all attachments.
On Mon, Feb 04, 2008 at 08:20:42AM +0900, Coe, Colin C. (Unix Engineer) wrote:
Wouldn't it be easier to just scan the logfile for NFS timeout errors?
Hobbit doesn't track the lifetime of a process, and I think this would be very bothersome to set up, because you'd have to exclude long-lived daemon processes.
Regards, Henrik
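Henrik's suggestion can be sketched as a scan of the system log for NFS "not responding" messages. This is a minimal Python illustration under stated assumptions, not a Hobbit feature: the message wording (Linux kernels typically log lines like `nfs: server fs1 not responding, still trying`, Solaris typically `NFS server fs1 not responding still trying`) and the function name are assumptions, so check the pattern against your own logs.

```python
import re
from typing import Iterable, Set

# Assumed message formats; verify against your own syslog output.
# Matches both "nfs: server X not responding" (Linux) and
# "NFS server X not responding" (Solaris), case-insensitively.
NFS_TIMEOUT = re.compile(r"nfs:? server (\S+) not responding", re.IGNORECASE)


def unresponsive_nfs_servers(lines: Iterable[str]) -> Set[str]:
    """Return the names of NFS servers reported as not responding.

    Hypothetical helper: it only collects timeout messages and does not
    notice later "server ... OK" recovery messages.
    """
    servers = set()
    for line in lines:
        match = NFS_TIMEOUT.search(line)
        if match:
            servers.add(match.group(1))
    return servers
```

For example, `with open("/var/log/messages", errors="replace") as log: print(unresponsive_nfs_servers(log))`. The log path is an assumption, and reading it usually requires root.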
-----Original Message----- From: Henrik Stoerner [mailto:henrik at hswn.dk] Sent: Monday, 4 February 2008 6:18 PM To: hobbit at hswn.dk Subject: Re: [hobbit] Checking process longevity
By default, on RHEL most of the files under /var/log are owned by, and readable only by, root. I'm still deciding whether or not to allow Hobbit to read the log files. I do think that there are other cases where monitoring how long a process has existed is useful.
I was thinking that this could be done by adding a new flag to 'PROC' in hobbit-clients.cfg. Something like:
PROC processname minimumcount maximumcount color [TRACK=id] [TEXT=text] [RUNTIME=seconds]
Example: alert if a 'df' process has existed for more than 60 seconds:
HOST=foo
    PROC df RUNTIME=60
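To make the proposed semantics concrete, here is a hypothetical Python sketch of how a `RUNTIME=seconds` option might be evaluated once each process's elapsed time is known. `RUNTIME` is only a proposal here, not an existing Hobbit option, and the `(command, elapsed_seconds)` input format and the function names are illustrative assumptions.

```python
from typing import Iterable, List, Optional, Tuple


def parse_runtime_rule(rule: str) -> Tuple[str, Optional[int]]:
    """Extract the process name and optional RUNTIME from a PROC line.

    Hypothetical: RUNTIME= is a proposed option, not existing Hobbit syntax.
    Positional counts, colour, TRACK= and TEXT= are ignored by this sketch.
    """
    tokens = rule.split()
    if len(tokens) < 2 or tokens[0] != "PROC":
        raise ValueError(f"not a PROC rule: {rule!r}")
    runtime = None
    for token in tokens[2:]:
        if token.startswith("RUNTIME="):
            runtime = int(token[len("RUNTIME="):])
    return tokens[1], runtime


def overdue(procs: Iterable[Tuple[str, int]], name: str,
            runtime: int) -> List[Tuple[str, int]]:
    """Return processes called `name` that have existed longer than `runtime` seconds."""
    return [(cmd, age) for cmd, age in procs if cmd == name and age > runtime]
```

A non-empty result from `overdue()` is where such a rule would raise the alert; rules without `RUNTIME=` would keep their existing count-based behaviour.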
I started hacking but my C fu is weak.
CC
On Tue, Feb 05, 2008 at 01:47:07PM +0900, Coe, Colin C. (Unix Engineer) wrote:
Sure. Only problem is: How do you determine how long a process has existed ?
Some systems report the start time of a process in a separate column (START on Linux, STIME on Solaris, ...). It's not very accurate, since for processes started more than 24 hours ago it shows only the date. I guess we could use that.
Regards, Henrik
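Henrik's caveat about START can be illustrated with a small Python sketch. The helper is hypothetical and assumes Linux procps-style output, where a process started within roughly the last day shows HH:MM and older ones show only a date or a year; in the date-only case no useful age can be derived.

```python
from datetime import datetime, timedelta
from typing import Optional


def approx_age_from_start(start: str, now: datetime) -> Optional[float]:
    """Approximate process age in seconds from a ps START value.

    Hypothetical helper. Only the "HH:MM" form is usable, and then only
    to within about a minute. Date-only forms such as "Mar18" or "2007"
    return None, because the time of day (and possibly the year) is unknown.
    """
    try:
        t = datetime.strptime(start, "%H:%M")
    except ValueError:
        return None  # date-only form: not precise enough
    started = now.replace(hour=t.hour, minute=t.minute, second=0, microsecond=0)
    if started > now:
        # An HH:MM later than the current time must refer to yesterday.
        started -= timedelta(days=1)
    return (now - started).total_seconds()
```

This is precise to about a minute at best, so it suits thresholds of several minutes better than short ones.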
Hi all,
What is the right way to use the CLASS tag in hobbit-alerts.cfg? I didn't find anything in the Hobbit man pages.
Regards.
Massimo Morsiani Information Technology Dept.
Gilbarco S.p.a. via de' Cattani, 220/G 50145 Firenze, Italy tel: +39-055-30941 fax: +39-055-318603 email: massimo.morsiani at gilbarco.com web: http://www.gilbarco.it
This message (including any attachments) contains confidential
and/or proprietary information intended only for the addressee.
Any unauthorized disclosure, copying, distribution or reliance on
the contents of this information is strictly prohibited and may
constitute a violation of law. If you are not the intended
recipient, please notify the sender immediately by responding to
this e-mail, and delete the message from your system. If you
have any questions about this e-mail please notify the sender
immediately.
-----Original Message----- From: Henrik Stoerner [mailto:henrik at hswn.dk] Sent: Tuesday, 5 February 2008 8:58 PM To: hobbit at hswn.dk Subject: Re: [hobbit] Checking process longevity
That sounds great. Typically, I'm looking for processes that have existed for more than 5 minutes.
Thanks
CC
On Tue, Feb 5, 2008 at 7:58 AM, Henrik Stoerner <henrik at hswn.dk> wrote:
If etime were added to the ps command, this could be implemented on both Solaris and Linux. STIME appears to report a time of day, a month and day, or a year, depending on the OS and how long ago the process started.
From the Solaris ps man page: "etime: In the POSIX locale, the elapsed time since the process was started, in the form: [[dd-]hh:]mm:ss"
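The etime format quoted above is simple to convert to seconds, which is what a RUNTIME-style threshold would be compared against. A minimal Python sketch (the function name is hypothetical):

```python
def etime_to_seconds(etime: str) -> int:
    """Convert a ps etime value in the form [[dd-]hh:]mm:ss to seconds."""
    days = 0
    clock = etime.strip()
    if "-" in clock:
        # Optional leading "dd-" day count.
        day_part, clock = clock.split("-", 1)
        days = int(day_part)
    fields = [int(f) for f in clock.split(":")]
    if len(fields) == 2:
        fields.insert(0, 0)  # no hours field: mm:ss only
    if len(fields) != 3:
        raise ValueError(f"unexpected etime value: {etime!r}")
    hours, minutes, seconds = fields
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

For example, `2-03:45:15` gives 186315 seconds and `28:27` gives 1707.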
Example output for different times:

STIME   ELAPSED
Mar18   2-03:45:15
Mar19   1-06:56:27
14:45   02:08:04
15:33   01:20:22
16:25   28:27
16:53   00:29
16:53   00:00
STIME    ELAPSED        OS
May_04   686-03:00:28   SunOS 5.7
Mar_08   377-21:30:03   SunOS 5.9
Jun_12   282-01:43:14   SunOS 5.10
Mar15    5-02:51:34     Red Hat Enterprise Linux AS release 3, Linux 2.4.21-47.ELsmp
2007     203-00:13:39   Red Hat Enterprise Linux AS release 4, Linux 2.6.9-34.ELsmp
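Putting the etime idea together with a threshold, here is a minimal Python sketch that reads `ps` output and reports processes with a given name that are older than a limit. It is not part of Hobbit; the function names, the 300-second default (the 5 minutes mentioned in this thread), and how the result becomes a Hobbit status are assumptions. It relies on the POSIX `-o` format specifiers `pid`, `etime` and `comm`, with empty headers (`pid=`) so that no header line is printed. Since `comm` differs between systems (Linux truncates it, Solaris may print a path), the sketch compares the basename.

```python
import os
import subprocess
from typing import List, Tuple


def etime_to_seconds(etime: str) -> int:
    """Convert a ps etime value ([[dd-]hh:]mm:ss) to seconds."""
    days, clock = 0, etime
    if "-" in clock:
        day_part, clock = clock.split("-", 1)
        days = int(day_part)
    fields = [int(f) for f in clock.split(":")]
    if len(fields) == 2:
        fields.insert(0, 0)  # mm:ss only, no hours field
    hours, minutes, seconds = fields
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def long_running(ps_output: str, name: str, threshold: int) -> List[Tuple[int, int]]:
    """Return (pid, age_seconds) for processes named `name` older than `threshold`.

    Expects lines of "pid etime comm", as printed by
    `ps -e -o pid= -o etime= -o comm=`.
    """
    found = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        pid, etime, comm = parts
        if os.path.basename(comm.strip()) != name:
            continue
        age = etime_to_seconds(etime)
        if age > threshold:
            found.append((int(pid), age))
    return found


def check_processes(name: str = "df", threshold: int = 300) -> List[Tuple[int, int]]:
    """Hypothetical wrapper: run ps and return overdue processes."""
    # Separate -o options with empty headers make POSIX ps omit the header line.
    result = subprocess.run(["ps", "-e", "-o", "pid=", "-o", "etime=", "-o", "comm="],
                            capture_output=True, text=True, check=True)
    return long_running(result.stdout, name, threshold)
```

Calling `check_processes("df", 300)` from a cron job or wrapper script would list df processes that have existed for more than five minutes. Unlike START/STIME, etime gives the same precision regardless of how old the process is.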
I found etime today because I had to prove that processes on a Solaris system were started on Feb_20 of 2007 and not Feb_20 of 2008.
Hope this is helpful.
John
participants (4): Colin.Coe@woodside.com.au, henrik@hswn.dk, jg2727@gmail.com, massimo.morsiani@gilbarco.com