On Fri, Aug 03, 2007 at 01:15:27PM -0400, Scott Walters wrote:
I am definitely in the "monitor only" camp.
Me too. For those who feel differently, Hobbit does provide the necessary hooks so you can trigger actions when a status goes red, either through alert scripts or through the bb "query" command which others have mentioned. In fact, I implemented the "query" feature because I needed it to set up such an automated recovery for one of our customers at work.
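As a rough illustration of the "query" approach, here is a minimal Python sketch, not a tested recipe. It assumes the bb client is on the PATH, that the Hobbit server answers on 127.0.0.1, and that the query response begins with the status color. The function names and the recovery command are made up for the example.

```python
import subprocess

BB_CMD = "bb"            # assumption: the bb client is on PATH
BB_SERVER = "127.0.0.1"  # assumption: address of the Hobbit server


def parse_color(response: str) -> str:
    """Return the status color, taken as the first word of a query response."""
    words = response.split()
    return words[0].lower() if words else "unknown"


def query_color(host: str, test: str) -> str:
    """Ask the server for the current status of host.test via the bb query command."""
    result = subprocess.run(
        [BB_CMD, BB_SERVER, f"query {host}.{test}"],
        capture_output=True, text=True, timeout=30, check=False,
    )
    return parse_color(result.stdout)


def recover_if_red(host: str, test: str, recovery_cmd: list[str]) -> bool:
    """Run recovery_cmd only when the status is red; report whether it ran."""
    if query_color(host, test) != "red":
        return False
    # recovery_cmd is whatever site-specific action you trust to run unattended.
    subprocess.run(recovery_cmd, check=False)
    return True
```

A cron job could then call something like recover_if_red("webserver1", "http", ["/usr/local/bin/restart-web"]), where both the host name and the restart script are hypothetical.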
All of those "operational" aspects aside, I've convinced myself that, from a security point of view, corrective action from monitoring is bad: a clear violation of the separation of duties. You don't want your auditors "cleaning up" the numbers as they go over your books.
Good point.
The question I have yet to answer satisfactorily is, "Should the monitoring system perform additional data collection after specific errors?" For example, running a particular "find" command when disk usage increases, to try to identify which files are causing the partition to fill.
It can be very useful at times, especially when you have to do a "root cause analysis" to explain why some service was down at 2 AM, and the problem was fixed by a second-level technician who just rebooted the box. That's why I added the feature that Hobbit saves the latest client-data report when a status goes yellow or red. It has helped me track down the cause of quite a few service outages.
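For the disk example above, the extra collection step could look roughly like the following Python sketch. It is not part of Hobbit; the function names, the 90% threshold and the default of ten files are assumptions chosen for illustration. Unlike "find -xdev", os.walk will descend into other filesystems mounted below the starting directory.

```python
import heapq
import os
import shutil


def percent_used(path: str) -> float:
    """Percentage of the filesystem holding 'path' that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def largest_files(root: str, count: int = 10) -> list[tuple[int, str]]:
    """Return up to 'count' (size, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.lstat(full).st_size, full))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return heapq.nlargest(count, sizes)


def collect_if_full(root: str, threshold: float = 90.0,
                    count: int = 10) -> list[tuple[int, str]]:
    """Collect the largest files only when usage has reached the threshold."""
    if percent_used(root) < threshold:
        return []
    return largest_files(root, count)
```

The result of collect_if_full("/var") could be written to a file or mailed along with the alert, so the evidence is still there after somebody has cleaned up or rebooted.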
Regards, Henrik