[hobbit] Highlights of the 4.3.0 version
On Tuesday 24 July 2007 22:55:02 Hubbard, Greg L wrote:
Wonder if there is any way to tell a client what its status is so it can be autonomous? What I mean is this: suppose there was a way for the Hobbit client to tell the server that service X was now in state Y, and a client-side module could then activate response Z on its own?
I don't like band-aids like this.
"restart because it's down" prevents the real impact of problems being seen, and provides less motivation for fixing things properly. Instead, you sit with frequent short outages (which may avoid the attention of managers, production managers) which have end-user impact.
I like even less using a monitoring system to do this ...
Regards, Buchan
In my experience, I have to agree. Hobbit is for monitoring, so that the information that X is down gets to people who can properly diagnose what is going on, not for taking generic actions. If generic actions were required for X to function properly, they should be a feature of that software.
Hobbit CAN do some scripting based on alerts, but even that might be a bit more than a systems administrator wants to hinder himself with.
Tod Hansmann Network Engineer
-----Original Message----- From: Buchan Milne [mailto:bgmilne at staff.telkomsa.net] Sent: Friday, August 03, 2007 12:31 AM To: hobbit at hswn.dk Cc: Hubbard, Greg L Subject: Re: [hobbit] Highlights of the 4.3.0 version
On Tuesday 24 July 2007 22:55:02 Hubbard, Greg L wrote:
Wonder if there is any way to tell a client what its status is so it can be autonomous? What I mean is this: suppose there was a way for the Hobbit client to tell the server that service X was now in state Y, and a client-side module could then activate response Z on its own?
I don't like band-aids like this.
"restart because it's down" prevents the real impact of problems being seen, and provides less motivation for fixing things properly. Instead, you sit with frequent short outages (which may avoid the attention of managers, production managers) which have end-user impact.
I like even less using a monitoring system to do this ...
Regards, Buchan
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
Don't forget... this is the model that Tivoli, HP OpenView, and many other commercial monitoring solutions provide and sell as a feature.
From my experience as a sys admin, I've always found automatically restarting a service when it goes down to be "a bad thing"(TM).
In many solutions, a restart overwrites logs that would be integral to the real resolution and prevention.
=G=
-----Original Message----- From: Tod Hansmann [mailto:thansmann at directpointe.com] Sent: Friday, August 03, 2007 10:40 AM To: hobbit at hswn.dk Subject: RE: [hobbit] Highlights of the 4.3.0 version
In my experience, I have to agree. Hobbit is for monitoring, so that the information that X is down gets to people who can properly diagnose what is going on, not for taking generic actions. If generic actions were required for X to function properly, they should be a feature of that software.
Hobbit CAN do some scripting based on alerts, but even that might be a bit more than a systems administrator wants to hinder himself with.
Tod Hansmann Network Engineer
When a monitoring system detects something wrong, the only action I want the monitor to perform is to get the admin (or the admin's boss) moving to diagnose and fix the problem.
And I am the admin that I am most concerned with. I don't understand most of the errors well enough to automate a recovery process.
/Thomas Kern /301-903-2211
-----Original Message----- From: Galen Johnson [mailto:Galen.Johnson at sas.com] Sent: Friday, August 03, 2007 11:18 AM To: hobbit at hswn.dk Subject: RE: [hobbit] Highlights of the 4.3.0 version
Don't forget... this is the model that Tivoli, HP OpenView, and many other commercial monitoring solutions provide and sell as a feature. From my experience as a sys admin, I've always found automatically restarting a service when it goes down to be "a bad thing"(TM).
In many solutions, a restart overwrites logs that would be integral to the real resolution and prevention.
=G=
Well, I use Netcool which has the opposite philosophy -- there is a "process automation" system that watches processes and restarts them if they fail, while also logging restarts. You can configure a "restart" parameter to be anything from 0 (forever) to any number of times. I like to set a reasonable number so persistent errors eventually kill the process, but occasional errors do not. Log files are not overwritten, but are appended and rotated.
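The bounded-restart idea described above can be sketched in a few lines. This is only an illustration of the policy (restart on failure, log every restart, give up after a configured count, with 0 meaning "forever"), not how Netcool implements it; the function name `supervise` and its parameters are made up for the example.

```python
import logging
import subprocess
import time

log = logging.getLogger("restart-sketch")


def supervise(command, max_restarts=3, delay_seconds=0.0):
    """Run `command`, restarting it after a failure at most `max_restarts` times.

    Returns the number of restarts performed. A max_restarts of 0 means
    "restart forever", mirroring the convention described in the message.
    (Hypothetical helper: name and parameters are assumptions.)
    """
    restarts = 0
    while True:
        exit_code = subprocess.call(command)
        if exit_code == 0:
            log.info("process exited cleanly after %d restart(s)", restarts)
            return restarts
        if max_restarts != 0 and restarts >= max_restarts:
            # Persistent failure: stop restarting so a person has to look.
            log.error("giving up after %d restart(s); last exit code %d",
                      restarts, exit_code)
            return restarts
        restarts += 1
        log.warning("exit code %d; performing restart %d", exit_code, restarts)
        time.sleep(delay_seconds)
```

Every restart is logged, so the short outages still leave a trail even when nobody was paged.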
But whatever. My view seems to be in the minority -- guess the rest of you don't mind 24x7x365 babysitting.
GLH
On Friday 03 August 2007 11:38, Hubbard, Greg L wrote:
Well, I use Netcool which has the opposite philosophy -- there is a "process automation" system that watches processes and restarts them if they fail, while also logging restarts. You can configure a "restart" parameter to be anything from 0 (forever) to any number of times. I like to set a reasonable number so persistent errors eventually kill the process, but occasional errors do not. Log files are not overwritten, but are appended and rotated.
But whatever. My view seems to be in the minority -- guess the rest of you don't mind 24x7x365 babysitting.
GLH
To restart a process, some form of intelligence has to be added to the restart script, especially when recovering from a failure mode. Scripts can only have so much intelligence, so a restart script could be dangerous unless it is dealing with a simple situation.
Now after saying all this, I do have to admit I do have scripts that query the status of the monitoring server and on reds perform a restart. There should be nothing stopping you from implementing the same. It is just a very fine line when deciding when/how to implement process restarts.
More often than not, it is much better for a person to react to an alert than a script. But for recurring failure modes, these scripts do help, and I don't get called at 3 am.
So if you really need to implement restart scripts, just use the bb tool's query feature.
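A minimal sketch of that approach, under some assumptions: the `bb` client is invoked as `bb SERVER "query HOST.TEST"`, and the reply is the first line of the status report with the color as its first word. The function names and the `bb_path` parameter are invented for the example; check your own bb man page before relying on the exact syntax.

```python
import subprocess


def color_from_query(response):
    """Return the first word of a query response (the status color), or None."""
    words = response.split()
    return words[0].lower() if words else None


def should_restart(response):
    """Restart only on an explicit red status.

    Anything else (green, yellow, an empty reply, or blue, which is assumed
    to mean the test is disabled, e.g. for planned downtime) means hands off.
    """
    return color_from_query(response) == "red"


def query_status(bb_path, server, host, test):
    """Ask the server for the current status line of host.test.

    Assumption: the bb tool accepts  bb SERVER "query HOST.TEST".
    """
    result = subprocess.run(
        [bb_path, server, f"query {host}.{test}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Keeping the decision (`should_restart`) separate from the query makes it easy to test the policy without a running server.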
~Steve
I am definitely in the "monitor only" camp. As appealing as "self-healing" may seem, I've seen attempts go horribly wrong too many times. For example, Oracle being shut down for an upgrade and then restarted in the middle of that upgrade. Not good.
I also agree that "self-healing" lends itself to band-aids that avoid root-cause determination. I don't think this requires "baby-sitting," but a commitment to fixing things once. I have also had the displeasure of making permanent band-aids, but I cannot condone it.
All of those "operational" aspects aside, I've convinced myself that, from a security point of view, corrective action from monitoring is bad: a clear violation of the separation of duties. You don't want your auditors "cleaning up" the numbers as they go over your books.
You know what's better than your webserver being automatically restarted when it crashes? Your webserver not crashing.
I completely support the absence of corrective actions from monitor triggers. The question I have yet to answer satisfactorily is, "Should the monitoring system perform additional data collection after specific errors?" For example, running a particular "find" command when disk usage increases to try to identify which files are causing the partition to fill.
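As an illustration of that kind of read-only data collection, here is a small sketch that lists the largest files under a directory. It only gathers information and changes nothing; the function name `largest_files` is hypothetical, and unlike "find -xdev" this version does not stop at filesystem boundaries.

```python
import os


def largest_files(root, limit=10):
    """Return up to `limit` (size_in_bytes, path) pairs under `root`, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: report the link itself rather than following symlinks
                size = os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            found.append((size, path))
    found.sort(reverse=True)
    return found[:limit]
```

The output could be attached to the alert text so whoever responds starts with a list of suspects.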
Scott Walters -PacketPusher
On Fri, Aug 03, 2007 at 01:15:27PM -0400, Scott Walters wrote:
I am definitely in the "monitor only" camp.
Me too. For those who feel differently, Hobbit does provide the necessary hooks so you can trigger actions from some status going red; either through alert scripts, or from the bb "query" command which others have mentioned. In fact, I implemented the "query" feature because I needed it to set up such an automated recovery for one of our customers at work.
All of those "operational" aspects aside, I've convinced myself that, from a security point of view, corrective action from monitoring is bad: a clear violation of the separation of duties. You don't want your auditors "cleaning up" the numbers as they go over your books.
Good point.
The question I have yet to answer satisfactorily is, "Should the monitoring system perform additional data collection after specific errors?" For example, running a particular "find" command when disk usage increases to try to identify which files are causing the partition to fill.
It can be very useful at times, especially when you have to do a "root cause analysis" to explain why some service was down at 2 AM - and the problem was fixed by a 2nd-level technician who just rebooted the box. That's why I added the feature that Hobbit saves the latest client-data report when a status goes yellow or red. It has helped me track down the cause of quite a few service outages.
Regards, Henrik
Sometimes the real world runs interference for Utopia. While in Utopia you want to analyse, find the root cause, and fix everything before proceeding, you can't always do that. When an outage of one hour costs your company tens of thousands of dollars, you can't justify withholding a simple band-aid (so long as you don't then ignore the long term fix).
Most everything I do in Hobbit is a custom script. Restarting crashed processes is one of the least of my worries, although in some rare cases I do just that (short term), with appropriate logging and email to the app development team. The corporate expense of having the app down is too great to let Utopian ideas prevail.
Most of the automated Hobbit stuff I do is not restarting dead apps (luckily, that is very infrequent around here). It's more mundane. One example is disk space. A full filesystem would shut many things down. Apps should not fill a filesystem, but sometimes they do. So my custom Hobbit scripts first scream and scream about low disk space, even analysing things down to specific subdirectories and fast growing files and doing trend analysis. But if their call is not answered, they start freeing up space from a "private reserve" I have set aside to deal with emergencies. So if we experience a sudden unexpected blowup in a filesystem at 3am, Hobbit keeps things running in production until the appropriate people can look into and diagnose the problem. This may not be Utopian behavior, but it sure is practical at 3am in the morning!
But my vote would be for Hobbit out-of-the-box to NOT attempt automated repair actions. That should be left to the Hobbit administrator. We can write custom monitor scripts or custom alert scripts to add this functionality if it's appropriate for our environments. It's trivial to integrate your own scripting into Hobbit.
I sure wish I worked in Utopia though. The job would be a helluva lot less stressful! :-)
On 8/3/07, Haertig, David F (Dave) <haertig at avaya.com> wrote:
Most everything I do in Hobbit is a custom script. Restarting crashed processes is one of the least of my worries, although in some rare cases I do just that (short term), with appropriate logging and email to the app development team. The corporate expense of having the app down is too great to let Utopian ideas prevail.
Agreed, though sometimes it's worth the effort for an extra few minutes of downtime to do *some* analysis.
Most of the automated Hobbit stuff I do is not restarting dead apps (luckily, that is very infrequent around here). It's more mundane. One example is disk space. A full filesystem would shut many things down. Apps should not fill a filesystem, but sometimes they do. So my custom Hobbit scripts first scream and scream about low disk space, even analysing things down to specific subdirectories and fast growing files and doing trend analysis. But if their call is not answered, they start freeing up space from a "private reserve" I have set aside to deal with emergencies. So if we experience a sudden unexpected blowup in a filesystem at 3am, Hobbit keeps things running in production until the appropriate people can look into and diagnose the problem. This may not be Utopian behavior, but it sure is practical at 3am in the morning!
What sort of trend analysis do your scripts perform? We have a few boxes that are notorious for filling up their disk space, and I haven't yet come up with an idea of how to neatly track exactly what it is that keeps filling up the disk.
But my vote would be for Hobbit out-of-the-box to NOT attempt automated repair actions. That should be left to the Hobbit administrator. We can write custom monitor scripts or custom alert scripts to add this functionality if it's appropriate for our environments. It's trivial to integrate your own scripting into Hobbit.
Due to the demands of some of the other admins, I have implemented a script that does some rudimentary restarting, and even looks at the status of the specific Hobbit alert in question, so that it doesn't try to restart something, if the alert has been disabled (such as for a planned downtime).
It wasn't all that hard to write, and I also would prefer Hobbit NOT have auto-restart logic out of the box.
I sure wish I worked in Utopia though. The job would be a helluva lot less stressful! :-)
Working in the real world isn't as bad, compared to working in the real world where management _thinks_ you actually work in Utopia, and yet still can't spare an extra second of downtime for real-time root cause analysis. ;-)
I try to identify filesystem "space hogs" via custom scripts I wrote a long time ago when using BB. 99% of my custom stuff is done in Perl.
I use 'du -k' to get the size of all directories in the filesystem. I then cut those results down to only the first and second level directories (but you could go as deep as you want). I store the size of each subdirectory in a small "database". I did this ages ago and my code uses Perl's "Storable" module to store the accumulated data into a file (called my "database"). These days I'd just use Hobbit's easily accessed RRD files. I then use Perl's Statistics::Descriptive::least_squares_fit() to calculate the slope and linear correlation coefficient of the "best fit line". This allows me to see how fast each subdirectory is growing/shrinking, and how linear that growth/reduction is. I trigger yellow/red conditions based on rate of growth and predicted fill time at current growth rate, in addition to the standard "95% full = red" test.
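The trend calculation described above can be sketched without any Perl modules. This is not the author's script, just an ordinary least-squares fit written out in plain Python; the function names, and the choice of hours and kilobytes as units, are assumptions made for the example.

```python
import math


def least_squares_fit(times, sizes):
    """Return (slope, intercept, r) for the best-fit line size = slope * t + intercept.

    `r` is the linear correlation coefficient: close to 1 means steady growth.
    """
    n = len(times)
    if n < 2 or n != len(sizes):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_s = sum(sizes) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    syy = sum((s - mean_s) ** 2 for s in sizes)
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, sizes))
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    slope = sxy / sxx
    intercept = mean_s - slope * mean_t
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r


def hours_until_full(times, sizes, capacity):
    """Predict hours until `capacity` is reached; None if usage is not growing.

    Assumes `times` are in hours and `sizes`/`capacity` share one unit
    (for example kilobytes, as reported by 'du -k').
    """
    slope, intercept, _r = least_squares_fit(times, sizes)
    if slope <= 0:
        return None
    current = slope * times[-1] + intercept
    return max(0.0, (capacity - current) / slope)
```

A yellow/red decision could then compare the predicted hours against thresholds of your choosing, and use `r` to ignore directories whose growth is too erratic for the prediction to mean much.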
The above makes it fairly easy to identify which subdirectory is your problem, which is often good enough to identify the file/process that is killing you. When it's not, I have a separate test that tries to identify problem files a different way. BB/Hobbit uses 'top' to identify cpu-hogging processes. Many times, files hogging space are directly tied to processes hogging cpu (runaway process = runaway file in many cases). 'top' identifies the process(es), then "lsof -p <pid>" is used to identify the files that the suspect process has open. Finding a cpu-hogger that has a filespace-hogger open is usually the holy grail you seek.
As a "repair" action for Hobbit, I squirreled away 2 GB of disk space in 100 MB chunks for critical filesystems. "dd if=/dev/zero of=/filesystem/DiskSpaceReserve/reserve01 bs=1024 count=102400", then "cp reserve01 reserve02", etc. to build up the reserve. A separate Hobbit "notification script" is used to simply delete files from this reserve under dire circumstances, after normal email/pager notifications have failed to trigger action by developers/production support people.
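The reserve trick can also be sketched in Python, assuming the same layout of numbered files in one reserve directory. The function names and parameters are invented for illustration; the dd/cp commands above remain the author's actual method.

```python
from pathlib import Path


def build_reserve(reserve_dir, chunks, chunk_bytes):
    """Create `chunks` zero-filled files of `chunk_bytes` each; return their paths."""
    directory = Path(reserve_dir)
    directory.mkdir(parents=True, exist_ok=True)
    block = b"\0" * min(chunk_bytes, 1024 * 1024)
    created = []
    for index in range(1, chunks + 1):
        path = directory / f"reserve{index:02d}"
        remaining = chunk_bytes
        with open(path, "wb") as handle:
            # Write real zero bytes (like dd from /dev/zero) so that, on most
            # filesystems, the space is actually allocated and not a sparse hole.
            while remaining > 0:
                piece = block[:remaining]
                handle.write(piece)
                remaining -= len(piece)
        created.append(path)
    return created


def release_chunk(reserve_dir):
    """Delete one reserve file and return its path, or None if none are left."""
    remaining = sorted(Path(reserve_dir).glob("reserve*"))
    if not remaining:
        return None
    victim = remaining[-1]
    victim.unlink()
    return victim
```

For the sizes in the message this would be called with chunks=20 and chunk_bytes=100 * 1024 * 1024. Releasing one chunk at a time, and reporting each release, keeps the emergency visible instead of quietly hiding it.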
My BB/Hobbit custom scripts tend to get quite involved. Probably too much so, but they're fun for me to write!
From: Gary Baluha [mailto:gumby3203 at gmail.com] Sent: Monday, August 06, 2007 7:29 AM To: hobbit at hswn.dk Subject: Re: [hobbit] Highlights of the 4.3.0 version
< ... snip ... >
One example is disk space. A full filesystem would shut many things down. Apps should not fill a filesystem, but sometimes they do. So my custom Hobbit scripts first scream and scream about low disk space, even analysing things down to specific subdirectories and fast growing files and doing trend analysis. But if their call is not answered, they start freeing up space from a "private reserve" I have set aside to deal with emergencies. So if we experience a sudden unexpected blowup in a filesystem at 3am, Hobbit keeps things running in production until the appropriate people can look into and diagnose the problem. This may not be Utopian behavior, but it sure is practical at 3am in the morning!
What sort of trend analysis do your scripts perform? We have a few boxes that are notorious for filling up their disk space, and I haven't yet come up with an idea of how to neatly track exactly what it is that keeps filling up the disk.
< ... snip ...>
On Monday 06 August 2007 21:25:46 Haertig, David F (Dave) wrote:
I try to identify filesystem "space hogs" via custom scripts I wrote a long time ago when using BB. 99% of my custom stuff is done in Perl.
I use 'du -k' to get the size of all directories in the filesystem. I then cut those results down to only the first and second level directories (but you could go as deep as you want). I store the size of each subdirectory in a small "database". I did this ages ago and my code uses Perl's "Storable" module to store the accumulated data into a file (called my "database"). These days I'd just use Hobbit's easily accessed RRD files. I then use Perl's Statistics::Descriptive::least_squares_fit() to calculate the slope and linear correlation coefficient of the "best fit line".
This would be really useful to do on directories monitored with the dir option in client-local.cfg plus DIR option in hobbit-clients, e.g. to be able to specify alerts at specified "time before disk is full".
This allows me to see how fast each subdirectory is growing/shrinking, and how linear that growth/reduction is. I trigger yellow/red conditions based on rate of growth and predicted fill time at current growth rate, in addition to the standard "95% full = red" test.
The above makes it fairly easy to identify which subdirectory is your problem, which is often good enough to identify the file/process that is killing you. When it's not, I have a separate test that tries to identify problem files a different way. BB/Hobbit uses 'top' to identify cpu-hogging processes. Many times, files hogging space are directly tied to processes hogging cpu (runaway process = runaway file in many cases). 'top' identifies the process(es), then "lsof -p <pid>" is used to identify the files that the suspect process has open. Finding a cpu-hogger that has a filespace-hogger open is usually the holy grail you seek.
The "CPU usage by process" graph is the utopian one ...
As a "repair" action for Hobbit, I squirreled away 2 GB of disk space in 100 MB chunks for critical filesystems. "dd if=/dev/zero of=/filesystem/DiskSpaceReserve/reserve01 bs=1024 count=102400", then "cp reserve01 reserve02", etc. to build up the reserve.
lvextend may be another useful command here ...
Regards, Buchan
Hi All,
Is there a way to filter out hosts/sites based on pagename or just a regular expression? Trying to do an availability report for all sites except one and wondering if there is a way.
Thanks, Jason.
Hi list, I am trying to compile hobbit on Solaris 9, make is failing:
bash-2.05# pwd
/apps/hobbit/bbgen-3.5/build
bash-2.05# cd ..
bash-2.05# make
CC="gcc" CFLAGS="-g -O2 -Wall -Wno-unused -D_REENTRANT -DHAVE_RPCENT -DMAXMSG=8192 -DBBDPORTNUMBER=1984 -I. -I`pwd`/include" SSLFLAGS="-DBBGEN_SSL" SSLINCDIR="-I/usr/local/ssl/include" SSLLIBS="-L/usr/local/ssl/lib -lcrypto -lssl" NETLIBS="-lresolv -lsocket -lnsl" make -C lib all
Usage : make [ -f makefile ][ -K statefile ]... [ -d ][ -dd ][ -D ][ -DD ]
[ -e ][ -i ][ -k ][ -n ][ -p ][ -P ][ -q ][ -r ][ -s ][ -S ][ -t ]
[ -u ][ -w ][ -V ][ target... ][ macro=value... ][ "macro +=value"... ]
make: Fatal error: Unknown option `-C'
*** Error code 1
make: Fatal error: Command failed for target `lib-build'
The build is using the -C option, which this make does not accept. Which option can I use instead, and is changing /bbgen-3.5/build/Makefile.rules sufficient? Please let me know.
Thanks in advance
Make sure the make command is really GNU make (gmake), or invoke gmake explicitly.
Steve
You need to use gmake and not Solaris make.
Or you can always use the packages from http://www.blastwave.org/ .
-- -m
Mike, I spent a lot of time on that site. I am trying to download CSWhobbit, but when I click on it to download, it shows a bunch of dependencies and I can't download any of those. I am not sure what I am doing wrong. Could you please let me know how to download from there? Thanks in advance
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
Try sunfreeware.com...
or try a mirror-site ;-)
http://ftp.uni-erlangen.de/pub/mirrors/blastwave.org/unstable/i386/5.9/hobbi...
and additionally these packages:
CSWchkconfig CSWcommon CSWexpat CSWggettext CSWhobbitc CSWiconv CSWlibpopt SMCpcre
Cheers,
Flemming
treibsAND Willy-Brandt-Allee 9 23554 Lübeck
www.treibsand.net www.walli-bleibt.de www.myspace.com/treibsand_luebeck info at treibsand.net
Robert wrote:
Mike, I spent a lot of time on that site. I am trying to download CSWhobbit, but when I click on it to download, it shows a bunch of dependencies and I can't download any of those. I am not sure what I am doing wrong. Could you please let me know how to download from there? Thanks in advance
Blastwave's web pages are just informative. To use blastwave you must have pkg-get installed.
HOWTO Use Blastwave http://www.blastwave.org/howto_S8.html
Once you have pkg-get installed, you can install Hobbit like this:
pkg-get -i hobbit hobbit_client
Hobbit then lives in /opt/csw/libexec/hobbit .
-- -mike
Install "gmake" on your Solaris machine and try it. You can get a precompiled binary package from www.sunfreeware.com
Hi Robert,
Are you using gmake or the standard Solaris make?
Talk about hijacking a thread, and it's not even close to anything about Filter reports. Folks, please create a new email with a new subject for your "New" posts. This will help threads stay on subject.
Thanks Trent
I thought it'd gotten a little off topic too :)
Original question I asked:
Hi All,
Is there a way to filter out hosts/sites based on pagename or just a regular expression? Trying to do an availability report for all sites except one and wondering if there is a way.
Thanks, Jason.
On Thu, Aug 09, 2007 at 12:33:36PM +0100, Jones, Jason (Altrincham) wrote:
Is there a way to filter out hosts/sites based on pagename or just a regular expression? Trying to do an availability report for all sites except one and wondering if there is a way.
The only way I can see is to run the report using a bb-hosts file without the host you want excluded.
Henrik
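A filtered copy of bb-hosts, as suggested above, could be produced along these lines. This is a rough sketch: exclude_host is a made-up name, and it assumes host entries start with an IP address followed by the hostname, so page/group directives and comments pass through untouched. How the report is pointed at the filtered copy depends on your setup and is not shown.

```shell
#!/bin/sh
# exclude_host HOSTNAME  (hypothetical helper)
# Reads a bb-hosts file on stdin and writes it to stdout, dropping any
# host entry (a line starting with a digit) whose second field is HOSTNAME.
exclude_host() {
    awk -v host="$1" '!($1 ~ /^[0-9]/ && $2 == host)'
}
```

For example: exclude_host www.example.com < bb-hosts > bb-hosts.report (both the hostname and the output filename here are placeholders).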
On Friday 03 August 2007 19:15:27 Scott Walters wrote:
I am definitely in the "monitor only" camp. As appealing as "self-healing" may seem, I've seen attempts go horribly wrong too many times. For example, shutting down Oracle for upgrades and then having it restarted in the middle of the upgrade. Not good.
How about the easy example of a web server not responding? Do you restart it? In the case I am thinking of, no, since the reason it is not responding is that the database server it (and another 4 web servers) is waiting for is having problems. Restarting the web server would drop the >1000 existing (working) sessions, causing a full-blown outage, and migrate the problem to the other 4 web servers that sit behind the same load balancer.
I also agree that "self-healing" lends itself to band-aids that avoid root-cause determination.
Or *prevent* the root-cause determination. For example, I had a problem on an LDAP server that appeared once every 2 or 3 weeks. I started it under a debugger, and when we next experienced the problem, some online debugging (after taking it out of the pool) with a developer found and fixed the bug within one hour (and allowed me to understand the cause so I could work around it). A restart here would have meant waiting some more and another few outages.
I don't think this requires "baby-sitting," but a commitment to fixing things once. I have also had the displeasure of making permanent band-aids, but I cannot condone it.
We do have some applications that require supervision ... but for them we use daemon-tools or supervise-scripts (a re-implementation of daemon-tools), as these are *much* better at supervision than a monitoring system. If you really need a baby-sitter, the monitoring system isn't the best one ...
All of those "operational" aspects aside, I've convinced myself that, from a security point of view, corrective action from monitoring is bad: a clear violation of the separation of duties. You don't want your auditors "cleaning up" the numbers as they go over your books.
You know what's better than your webserver being automatically restarted when it crashes? Your webserver not crashing.
I completely support the absence of corrective actions from monitor triggers. The question I have yet to answer satisfactorily is, "Should the monitoring system perform additional data collection after specific errors?" For example, running a particular "find" command when disk usage increases to try to identify which files are causing the partition to fill.
Or attach a debugger to the hung process and get a backtrace?
Regards, Buchan
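As an illustration of the "find" style of data collection discussed above, something like the following could be run when disk usage climbs. It is a sketch only: big_recent_files is a hypothetical name, and the size and age thresholds are arbitrary assumptions. It sticks to options available in POSIX find, so sizes are given in 512-byte blocks.

```shell
#!/bin/sh
# big_recent_files DIR BLOCKS  (hypothetical helper)
# Lists regular files under DIR, without crossing into other filesystems,
# that are larger than BLOCKS 512-byte blocks and were modified within
# the last 24 hours.
big_recent_files() {
    find "$1" -xdev -type f -size +"$2" -mtime -1 -print
}
```

For example, "big_recent_files /var 204800" would list files over roughly 100 MB on the /var filesystem that changed in the last day.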
participants (19)
- bgmilne@staff.telkomsa.net
- flemming@treibsand.net
- Galen.Johnson@sas.com
- greg.hubbard@eds.com
- gumby3203@gmail.com
- haertig@avaya.com
- henrik@hswn.dk
- hobbit@razorsedge.org
- JasonAS_Jones@mentor.com
- kolbjorn.barmen@uninett.no
- pkc_mls@yahoo.fr
- rgoud@yahoo.com
- s_aiello@comcast.net
- scott@PacketPushers.com
- sholmes42@mac.com
- thansmann@directpointe.com
- Thomas.Kern@hq.doe.gov
- Tom.Moore@sas.com
- trent.melcher@sitel.com