On 7/21/07, Henrik Stoerner <henrik at hswn.dk> wrote:
> In another thread, someone asked about what new features are planned for version 4.3.0. I've summarized them below;
Thanks for the summary; these features look great. I'd like to request more RRDs and reports about the monitoring system itself and the servers/services it monitors. For example:
I think the following could be "gauge" metrics:
- Number of devices monitored
- Number of services monitored
- Number of host.service in green state
- Number of host.service in yellow state
- Number of host.service in red state
- Number of host.service in XXX state
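As a rough sketch of what these gauges might look like, assume the current statuses are available as a mapping from (host, service) to a color. The snapshot data and the function name `gauges` are made up for illustration and are not part of hobbit; "services monitored" is taken here to mean distinct service names, which is only one possible reading.

```python
from collections import Counter

# Hypothetical snapshot of current statuses, keyed by (host, service).
# The data is invented for illustration; a real install would read it
# from the monitoring daemon.
snapshot = {
    ("web01", "http"): "green",
    ("web01", "cpu"): "yellow",
    ("db01", "disk"): "red",
    ("db01", "cpu"): "green",
}

def gauges(statuses):
    """Return point-in-time gauge values for one status snapshot."""
    return {
        "devices": len({host for host, _ in statuses}),
        "services": len({service for _, service in statuses}),
        "host_services": len(statuses),
        "by_color": dict(Counter(statuses.values())),
    }

print(gauges(snapshot))
```

Each value is a gauge because it describes the state right now and can go up or down between samples, which is what an RRD GAUGE data source is meant to store.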
I am thinking the following could be implemented as counters within hobbit (counting since boot):
- Number of state changes
- Number of state changes per server
- Number of state changes per service
- Number of notifications sent
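A minimal sketch of such counters, assuming a state change means the color of a host.service differs from the last color seen for it. The class `StateCounters` and its method names are hypothetical and do not exist in hobbit.

```python
from collections import Counter

class StateCounters:
    """Hypothetical counters that only increase until the daemon restarts."""

    def __init__(self):
        self.last_color = {}                 # (host, service) -> last color seen
        self.changes_total = 0
        self.changes_by_host = Counter()
        self.changes_by_service = Counter()
        self.notifications_sent = 0

    def record_status(self, host, service, color):
        """Count a state change when the color differs from the previous one."""
        key = (host, service)
        previous = self.last_color.get(key)
        self.last_color[key] = color
        if previous is not None and previous != color:
            self.changes_total += 1
            self.changes_by_host[host] += 1
            self.changes_by_service[service] += 1

    def record_notification(self):
        """Count one notification sent."""
        self.notifications_sent += 1

counters = StateCounters()
counters.record_status("web01", "http", "green")  # first report, not a change
counters.record_status("web01", "http", "red")    # green -> red
counters.record_status("web01", "http", "red")    # unchanged
counters.record_status("web01", "http", "green")  # red -> green
counters.record_notification()
print(counters.changes_total, counters.changes_by_host["web01"])
```

Because the totals never decrease, sampling them periodically and taking differences gives state changes per interval, which is how an RRD COUNTER data source is normally graphed.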
I think the above metrics could help create reports over time periods for review, to help get to "management by facts" vs. "management by feeling." Most admins who pay attention to their install will "know", but it's different when you can "prove" it. Plus, when improvements are made, it's nice to see them.
I am also thinking we could try applying some Six Sigma terminology and methodology to hobbit, which may have value. Six Sigma focuses on statistics and defects. The name refers to production quality so high that you see only 3.4 defects per million opportunities. Granted, we are not "producing" a physical item, but I am thinking that a defect could be defined as a purple/yellow/red state. With the counters I suggested above, we could apply various statistical measures (control charts, Pareto charts, etc.) and see what makes sense or has value for monitoring.
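To make the Six Sigma idea a bit more concrete, here is a rough sketch that treats every status check as an opportunity and every purple/yellow/red result as a defect, then computes defects per million opportunities (DPMO) and standard 3-sigma p-chart limits. The daily numbers are invented for illustration, and the sketch assumes the same number of checks in every period.

```python
import math

DEFECT_COLORS = {"purple", "yellow", "red"}

def is_defect(color):
    """A status counts as a defect under the assumption stated above."""
    return color in DEFECT_COLORS

def dpmo(defects, opportunities):
    """Defects per million opportunities."""
    return defects / opportunities * 1_000_000

def p_chart_limits(defects_per_period, checks_per_period):
    """Return (lower, center, upper) 3-sigma limits for the defect fraction."""
    total_checks = checks_per_period * len(defects_per_period)
    p_bar = sum(defects_per_period) / total_checks
    sigma = math.sqrt(p_bar * (1 - p_bar) / checks_per_period)
    return max(0.0, p_bar - 3 * sigma), p_bar, min(1.0, p_bar + 3 * sigma)

# Invented data: defects observed per day, out of 1000 checks per day.
CHECKS_PER_DAY = 1000
daily_defects = [12, 9, 15, 11, 40, 10, 13]

lower, center, upper = p_chart_limits(daily_defects, CHECKS_PER_DAY)
out_of_control = [
    d for d in daily_defects
    if not lower <= d / CHECKS_PER_DAY <= upper
]
print("DPMO:", round(dpmo(sum(daily_defects), CHECKS_PER_DAY * len(daily_defects))))
print("days outside control limits:", out_of_control)
```

Days that fall outside the limits are the ones worth investigating; days inside them are ordinary variation under this model.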
The goal is to improve consistency and reduce variance.
If you like, I could draft up some of the graphs and reports I'd like to see, since my description above might be hard to visualize. I definitely think hobbit could benefit from internal counters, similar to how an OS keeps track of context switches and the like.
Scott