Xymon performance scaling problems for Solaris zones?
Hi, all...
We've been running Xymon (4.3.3 -- yes, old, old, old...) very successfully for the last several years in our environment, but have run into a performance snag recently.
We run Xymon on all our Solaris zones -- up to 10 zones plus the global zone at a time. Recently we've started configuring new Oracle LDOMs (64 x SPARC T5-8 processors per LDOM) with more than 20 Solaris zones per LDOM, and we're seeing that after about 10 zones, Xymon starts to hog CPU in a non-linear way, making the zones unusable.
Specifically, it's not any of the "Xymon" daemons but all the concurrent "vmstat" and "iostat" processes that are racking up the CPU cycles.
We understand we have several options available, including instantiating processor sets and pools, ulimits, etc., but we wanted to know:
a) Has anyone else had to navigate this problem with such a large number of concurrent *stat commands?
b) Is it possible / advisable to monitor zones via Xymon using only a single instance of Xymon running in the global zone, or possibly a representative local zone?
The LDOMs all run Solaris 11, FWIW.
TIA for any tips!
david
~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~
David Mills
Systems Administrator
Northrop Grumman
(512) 873-6665
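To see where the *stat load comes from, the global zone can count the vmstat and iostat processes in every zone at once. The following is a minimal Python sketch, not part of Xymon; it assumes Python 3.7 or later in the global zone and that Solaris `ps -e -o zone,comm` prints a header line followed by the zone name and command name (argv[0]) of each process. The function name `count_stat_procs` is hypothetical.

```python
import os
import subprocess
import sys
from collections import Counter

# Collector commands named in the thread; argv[0] may be a bare name or a full path.
STAT_COMMANDS = {"vmstat", "iostat"}


def count_stat_procs(ps_output: str) -> Counter:
    """Count vmstat/iostat processes per zone from `ps -e -o zone,comm` text."""
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 1)  # zone name, then the command
        if len(fields) != 2:
            continue
        zone, comm = fields
        # Compare only the base name, so /usr/bin/vmstat and vmstat both match.
        if os.path.basename(comm.strip()) in STAT_COMMANDS:
            counts[zone] += 1
    return counts


def main() -> None:
    # From the global zone, ps sees the processes of every zone.
    out = subprocess.run(["ps", "-e", "-o", "zone,comm"],
                         capture_output=True, text=True, check=True).stdout
    for zone, n in sorted(count_stat_procs(out).items()):
        print(f"{zone}: {n}")


# Only run the live check on Solaris (sys.platform is "sunos5" there).
if __name__ == "__main__" and sys.platform.startswith("sunos"):
    main()
```

If each zone shows only one or two such processes, the load is simply the sum of one set of collectors per zone; much higher counts would point to collector runs overlapping.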
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon
Don't know if this is going to help much, but we have a bunch of T4s running up to 22 LDOMs each, every LDOM with Xymon (4.3 series) installed, and the primary domain shows a load average peaking at 0.2 over the last 48 hours. We also have T4s running zones, but the largest number of zones configured on one is 5; again, even on a non-production global zone (where, theoretically, there is no application load at this time) the load average sits around 1.0.
Andy
David,
Because Solaris zones share the same filesystems, we usually run all disk performance tests only in the global zone. It is not useful (and it creates a lot of additional I/O traffic) to test the same disk group from every zone and from the global zone, because the results will of course be the same; that duplication is where your observed overhead comes from.
Disk usage is also checked only in the global zone, to avoid duplicate alerts. (You can of course disable all zone filesystem usage checks in the global zone and enable the checks in each local zone individually, but that is a lot more work and does not take effect automatically when you create new zones.)
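Checking filesystems only in the global zone still leaves the question of which zone a full filesystem belongs to. One way, sketched below in Python under stated assumptions, is to match each mount point seen in the global zone against the zonepaths of the zones (as reported in the zonepath field of `zoneadm list -cp`) and take the longest match. The helper names `zone_for_mount` and `group_by_zone` and the sample paths are hypothetical, and collecting the (mount point, percent used) pairs from `df` is left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def zone_for_mount(mount: str, zonepaths: Dict[str, str]) -> str:
    """Return the zone whose zonepath contains `mount`, else 'global'."""
    best, best_len = "global", 0
    for zone, path in zonepaths.items():
        p = path.rstrip("/")  # the global zone's "/" becomes "" and is skipped
        # Require a path-component boundary so /zones/web010 is not web01's.
        if p and (mount == p or mount.startswith(p + "/")) and len(p) > best_len:
            best, best_len = zone, len(p)
    return best


def group_by_zone(usage: List[Tuple[str, int]],
                  zonepaths: Dict[str, str]) -> Dict[str, List[Tuple[str, int]]]:
    """Group (mount point, percent used) pairs by owning zone."""
    grouped: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for mount, pct in usage:
        grouped[zone_for_mount(mount, zonepaths)].append((mount, pct))
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical sample data standing in for zoneadm and df output.
    paths = {"global": "/", "web01": "/zones/web01", "db01": "/zones/db01"}
    usage = [("/", 40), ("/zones/web01/root", 71),
             ("/zones/db01/root/u01", 93), ("/zones/web010", 5)]
    for zone, mounts in sorted(group_by_zone(usage, paths).items()):
        print(zone, mounts)
```

Each zone's group can then be reported once, from the global zone, so a full filesystem raises a single alert against the zone that owns it.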
Likewise, we run all health checks and the like only in the global zone...
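For question b), one way to let a single client in the global zone report on every local zone is to send one status per zone from there. The Python sketch below is not a built-in Xymon feature: it assumes each zone name is also that zone's Xymon host name, that the script runs with the Xymon client environment loaded (for example via `xymoncmd`, so `$XYMON` and `$XYMSRV` are set), and it uses a hypothetical test name `zone` with a simple green-if-running policy. It parses `zoneadm list -cp`, whose colon-separated lines begin with zoneid:zonename:state:zonepath.

```python
import os
import subprocess
import sys
from typing import List, Tuple

TEST_NAME = "zone"  # hypothetical Xymon test (column) name


def parse_zoneadm(output: str) -> List[Tuple[str, str]]:
    """Return (zonename, state) pairs from `zoneadm list -cp` output."""
    zones = []
    for line in output.splitlines():
        fields = line.strip().split(":")
        if len(fields) < 3:
            continue
        # Fields start with zoneid:zonename:state:zonepath:...
        zones.append((fields[1], fields[2]))
    return zones


def status_message(zonename: str, state: str) -> str:
    """Build a Xymon status message for one zone."""
    color = "green" if state == "running" else "red"  # assumed policy
    host = zonename.replace(".", ",")  # Xymon writes dots in host names as commas
    return f"status {host}.{TEST_NAME} {color} zone {zonename} is {state}"


def main() -> None:
    out = subprocess.run(["zoneadm", "list", "-cp"],
                         capture_output=True, text=True, check=True).stdout
    for name, state in parse_zoneadm(out):
        if name == "global":
            continue  # the global zone reports through its own client
        # Same form as a client-side script: $XYMON $XYMSRV "status ..."
        subprocess.run([os.environ["XYMON"], os.environ["XYMSRV"],
                        status_message(name, state)], check=True)


# Only run the live check on Solaris (sys.platform is "sunos5" there).
if __name__ == "__main__" and sys.platform.startswith("sunos"):
    main()
```

Other global-zone checks could be added to the same loop, which would keep per-zone clients, and their vmstat/iostat runs, out of the local zones.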
Norbert
From: "Mills, David (IS)" <David.Mills at ngc.com>
To: "xymon at xymon.com" <xymon at xymon.com>
Date: 06/24/2014 07:37 PM
Subject: [Xymon] Xymon performance scaling problems for Solaris zones?
Sent by: "Xymon" <xymon-bounces at xymon.com>
participants (3)
- abs@shadymint.com
- David.Mills@ngc.com
- norbert.kriegenburg@de.ibm.com