On Thu, Jul 06, 2006 at 10:52:42AM -0400, Sean Hennessey wrote:
Say I have for example
HOST=pdcd20-ic1 PROC sm_server 1 1 RED
HOST=pdcd20-ic2 PROC sm_server 2 2 RED
pdcd20-ic1 sends its update and it has 5 sm_server processes. Can you tell me the flow that happens?
Does it take the hostname as sent (pdcd20-ic1) and then look that host up in the clients.cfg file to see that it should have only 1 sm_server, or does it look at all the PROC directives first and then look at the host?
Mostly it's the second way you describe, but it's actually a two-step process.
hobbitd_client parses the hobbit-clients.cfg file and stores it in an internal data structure - really just a linked list with a parsed copy of the rules. I.e. there are the strings or regular expressions to match hostnames, test names etc. against, the thresholds and so on. At this point, there is no relation to any particular host; it's just a list of the rules in hobbit-clients.cfg.
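As a rough illustration of that first step, here is a minimal Python sketch of a host-independent rule list. It is not the actual hobbitd_client code (which is C and uses a linked list), and it assumes a simplified one-line rule format "HOST=name PROC pattern min max [COLOR]" rather than the full hobbit-clients.cfg grammar. The names ProcRule and parse_rules are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class ProcRule:
    host: str        # host selector exactly as written in the config
    pattern: str     # process name, or a regex when prefixed with "%"
    min_count: int   # lowest acceptable number of matching processes
    max_count: int   # highest acceptable number of matching processes
    color: str = "red"


def parse_rules(text):
    """Parse simplified 'HOST=name PROC pattern min max [COLOR]' lines."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[0].startswith("HOST=") or fields[1] != "PROC":
            continue  # skip anything this simplified parser does not understand
        color = fields[5].lower() if len(fields) > 5 else "red"
        rules.append(
            ProcRule(fields[0][5:], fields[2], int(fields[3]), int(fields[4]), color)
        )
    return rules


# Nothing here is tied to an incoming client message yet; it is only the rules.
RULES = parse_rules(
    "HOST=pdcd20-ic1 PROC sm_server 1 1 RED\n"
    "HOST=pdcd20-ic2 PROC sm_server 2 2 RED\n"
)
```

The point of the sketch is only that parsing happens once, up front, and produces a flat list with no per-host state.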
When a client message arrives, the rule-list is scanned for PROC rules that match the particulars for this host, and each line in the "ps" output is tested against the process-name that the rule is about. While doing this, it merely counts how many matches are found in the "ps" output for each of the relevant PROC rules. So one line from the ps output can add a count to several PROC rules, not just one.
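The counting pass could be sketched like this in Python. This is an illustration, not Hobbit's implementation: the helper names are invented, and the matching is simplified to "substring match, or regex search when the pattern starts with %".

```python
import re


def line_matches(pattern, ps_line):
    """Simplified matching: '%' prefix means regex, otherwise substring."""
    if pattern.startswith("%"):
        return re.search(pattern[1:], ps_line) is not None
    return pattern in ps_line


def count_matches(patterns, ps_lines):
    """Return one counter per pattern; a ps line may bump several counters."""
    counts = [0] * len(patterns)
    for line in ps_lines:
        for i, pattern in enumerate(patterns):
            if line_matches(pattern, line):
                counts[i] += 1
    return counts


ps_lines = ["/usr/bin/sm_server -d", "/usr/bin/sm_server -d", "/usr/sbin/sshd"]
counts = count_matches(["sm_server", "%[st]m_server", "sshd"], ps_lines)
```

Each of the two sm_server lines is counted by both the first and the second pattern, which is the "one line can add a count to several PROC rules" behaviour described above.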
Finally, all of the relevant PROC rules are examined, and the actual process count is compared to the thresholds listed in the rule. Based on this, the "procs" status color is decided, and the "procs" status text is generated with info about the rules that failed or succeeded.
So in your example, it would first count the number of "sm_server" processes, and both of the PROC rules would get a count of 5 active processes matching the rule. So when the "procs" status is generated, you should actually see two rules listed as "red" - one saying that hobbit found 5 sm_server processes while expecting just 1, and the other saying it found 5 sm_server processes while expecting only 2.
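To make the example concrete, here is a small Python sketch of the final comparison step. It assumes, as the explanation above does, that both rules apply to the reporting host, and it uses plain substring matching. The function name evaluate and the tuple layout are invented for illustration and are not Hobbit's own.

```python
ORDER = ["green", "yellow", "red"]  # severity order, least to most severe


def evaluate(rules, ps_lines):
    """rules: list of (process_name, min_count, max_count, color_on_failure)."""
    results = []
    for name, lo, hi, color in rules:
        found = sum(1 for line in ps_lines if name in line)
        rule_color = "green" if lo <= found <= hi else color
        results.append((name, found, lo, hi, rule_color))
    # The overall status takes the most severe color of any rule.
    status = max((r[-1] for r in results), key=ORDER.index, default="green")
    return status, results


rules = [("sm_server", 1, 1, "red"), ("sm_server", 2, 2, "red")]
ps_lines = ["sm_server"] * 5 + ["sshd"]
status, results = evaluate(rules, ps_lines)
```

Both rules end up with a count of 5 and both are reported as red, one against an expected count of 1 and the other against an expected count of 2.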
So the problem with your proposal is that by the time the final step happens - when hobbit goes through all of the PROC rules to see which ones have the right count of active processes - by then any association with a specific process name has been lost. So there is no way that Hobbit knows that one rule comes before another, and relates to the same process. In your example configuration, they both happen to look for an "sm_server" process; we *could* choose to make this a special case for rules that look for the exact same processname. But what if you wrote the rules as
HOST=bla PROC %[st]m_server 1 1
HOST=bla PROC sm_server 2 2
and there is 1 sm_server process active? The two patterns are different, so how can hobbit tell that it should use the first one and ignore the second one? And what should it do if there is a tm_server process running? Use only the first rule and ignore the second?
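A quick Python sketch shows why these two rules cannot simply be ranked. It is an illustration under simplified assumptions (a "%" prefix means regex search, anything else is a substring match), and the helper names are hypothetical.

```python
import re

# The two rules from the example: (pattern, min_count, max_count)
RULES = [("%[st]m_server", 1, 1), ("sm_server", 2, 2)]


def count(pattern, ps_lines):
    """Count ps lines matching a pattern ('%' prefix = regex, else substring)."""
    if pattern.startswith("%"):
        rx = re.compile(pattern[1:])
        return sum(1 for line in ps_lines if rx.search(line))
    return sum(1 for line in ps_lines if pattern in line)


def check(ps_lines):
    """Return (pattern, count, within_thresholds) for every rule."""
    report = []
    for pattern, lo, hi in RULES:
        n = count(pattern, ps_lines)
        report.append((pattern, n, lo <= n <= hi))
    return report


only_sm = check(["sm_server"])               # one sm_server running
with_tm = check(["sm_server", "tm_server"])  # plus a tm_server
```

With a single sm_server the first rule passes and the second fails; add a tm_server and both fail, because the regex now counts two processes. Nothing in the counts themselves says which rule was "meant" to win.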
Great product by the way.
After all :-))
Thanks, Henrik