The "template calculations" are where you compare the SNMP data you've collected against the various limits you've set up for each host, I suppose?
To me, it sounds as if this part of the code ought to be mostly CPU-bound. You're not loading a lot of data from disk. Have you tried looking at some vmstat data while this runs?
Yep, the template calculations involve taking the SNMP data and putting it through various transform functions (converting numeric values to string values, converting bits to bytes, doing regexp substitutions, etc) and then comparing them to the threshold data.
The lion's share of the time is spent on the interface tests (if_stat, if_load, if_err, etc), as some switches have SNMP repeater OIDs with 50+ leaves, and each leaf gets run through all of these subroutines. This will be true for any type of test (whether it be for layer2 devices or not) that you want to run against large repeater-type OIDs.
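To make the per-leaf cost concrete, here is a rough Python sketch of that kind of pipeline. It is not devmon's actual code: the function names, the leaf fields (`name`, `oper_status`, `in_bits_per_sec`), the regexp, and the red/yellow/green colors are all made up for illustration. The only thing taken from outside the thread is the standard ifOperStatus numbering (1 = up, 2 = down, 3 = testing).

```python
import re

# Hypothetical transform helpers; the names are illustrative only.

def bits_to_bytes(value):
    """Convert a bit count (or bits/sec rate) to bytes."""
    return value / 8

def oper_status_name(value):
    """Map a numeric ifOperStatus value to a string."""
    names = {1: "up", 2: "down", 3: "testing"}
    return names.get(value, "unknown")

def shorten_if_name(name):
    """Regexp substitution: abbreviate a long interface name prefix."""
    return re.sub(r"^GigabitEthernet", "Gi", name)

def check_threshold(value, warn, crit):
    """Compare a transformed value against warning/critical limits."""
    if value >= crit:
        return "red"
    if value >= warn:
        return "yellow"
    return "green"

def evaluate_leaves(leaves, warn, crit):
    """Run every leaf of a repeater-type OID through all transforms.

    The cost grows with (number of leaves) x (number of transforms),
    which is why a 50+ port switch dominates the run time.
    """
    results = {}
    for index, leaf in leaves.items():
        load_bytes = bits_to_bytes(leaf["in_bits_per_sec"])
        results[index] = {
            "name": shorten_if_name(leaf["name"]),
            "status": oper_status_name(leaf["oper_status"]),
            "load_bytes": load_bytes,
            "color": check_threshold(load_bytes, warn, crit),
        }
    return results
```

Calling `evaluate_leaves` on a dict keyed by interface index, with thresholds given in bytes, returns one result record per leaf. None of this touches the disk or the network, so it is pure computation.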
If - as I suspect - it is CPU-bound, then splitting up the task on multiple processes running on a single node won't give you any improvement. If you split this onto multiple nodes, it's a different story, of course.
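Besides watching vmstat, one rough way to check this from inside a program is to compare the CPU time a step consumes against the wall-clock time it takes. The Python sketch below illustrates the idea only; `cpu_fraction` and `busy_work` are hypothetical names, and `busy_work` is just a stand-in for a calculation-heavy step. A ratio near 1.0 suggests the step is CPU-bound, while a ratio near 0 suggests it is mostly waiting (on the network, disk, or a sleep).

```python
import time

def cpu_fraction(func, *args):
    """Return CPU time divided by wall-clock time for one call of func.

    Only the CPU time of the current process is counted, so time spent
    sleeping or blocked on I/O lowers the ratio.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0

def busy_work(n):
    """Stand-in for a calculation-heavy step: pure arithmetic, no I/O."""
    total = 0
    for i in range(n):
        total += i * i
    return total
```

For example, `cpu_fraction(busy_work, 2_000_000)` should come out close to 1.0 on an otherwise idle machine, while `cpu_fraction(time.sleep, 0.2)` should be close to 0.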
Just some input from a guy who's spent too much time watching how BB spent its time running all of the shell scripts :-)
Arggh. I just realized that you are totally correct. Indeed, when the SNMP data collection process ends and the template calculations begin, the CPU usage of the master devmon process jumps to 99%. Sigh. And here I was looking to take the "easy" way out :)
-Eric