xymon iostat causing problems on loaded box, how to mitigate?
We're seeing problems with xymon iostat hanging on heavily loaded, massively large database servers. These boxes are on Oracle RAC clusters with EMC storage devices attached. iostat -x shows 16572 lines. When the box starts to have problems for *other* reasons, all of the iostat commands start hanging or taking forever, further bogging the system down. And making xymon look really bad ...
I'm wondering - what does xymon do with all this information?
How much of it can I turn off without confusing the server?
(Part of the ps output from 12:13, showing some iostat processes that have been running for over half an hour.)
ps -ef|grep iostat
xymon 102 98 2 11:55:47 ? 7:18 iostat -dxsrP 300 2
xymon 16934 16931 0 12:10:00 ? 0:04 iostat -c 300 2
xymon 16911 1 0 11:51:02 ? 0:00 sh -c iostat -dxsrP 300 2 1>/export/home/xymon/client/tmp/hobbit_iostatdisk.db6
xymon 566 1 0 12:05:38 ? 0:00 sh -c iostat -dxsrP 300 2 1>/export/home/xymon/client/tmp/hobbit_iostatdisk.db6
xymon 16931 1 0 12:10:00 ? 0:00 sh -c iostat -c 300 2 1>/export/home/xymon/client/tmp/hobbit_iostatcpu.db6.bo3.
xymon 16932 1 0 12:10:00 ? 0:00 sh -c iostat -dxsrP 300 2 1>/export/home/xymon/client/tmp/hobbit_iostatdisk.db6
xymon 15575 15566 2 12:00:47 ? 5:53 iostat -dxsrP 300 2
xymon 15566 1 0 12:00:47 ? 0:00 sh -c iostat -dxsrP 300 2 1>/export/home/xymon/client/tmp/hobbit_iostatdisk.db6
xymon 1349 1344 2 11:45:55 ? 9:34 iostat -dxsrP 300 2
xymon 17276 17271 2 11:40:55 ? 10:55 iostat -dxsrP 300 2
xymon 16935 16932 2 12:10:00 ? 2:57 iostat -dxsrP 300 2
xymon 17271 1 0 11:40:55 ? 0:00 sh -c iostat -dxsrP 300 2 1>/export/home/xymon/client/tmp/hobbit_iostatdisk.db6
xymon 1344 1 0 11:45:55 ? 0:00 sh -c iostat -dxsrP 300 2 1>/export/home/xymon/client/tmp/hobbit_iostatdisk.db6
xymon 572 566 2 12:05:38 ? 5:13 iostat -dxsrP 300 2
xymon 16914 16911 2 11:51:02 ? 8:24 iostat -dxsrP 300 2
xymon 98 1 0 11:55:47 ? 0:00 sh -c iostat -dxsrP 300 2 1>/export/home/xymon/client/tmp/hobbit_iostatdisk.db6
<etc>
On 12-09-2011 20:04, Elizabeth Schwartz wrote:
> We're seeing problems with xymon iostat hanging on heavily loaded, massively large database servers. These boxes are on Oracle RAC clusters with EMC storage devices attached. iostat -x shows 16572 lines. When the box starts to have problems for *other* reasons, all of the iostat commands start hanging or taking forever, further bogging the system down. And making xymon look really bad ...
> I'm wondering - what does xymon do with all this information?
> How much of it can I turn off without confusing the server?
xymon doesn't use the iostat information, it is just picked up together with the other client data. So if it is causing you problems, by all means do remove that from the client-side script.
I have similar problems on some Windows boxes, where the various Windows clients' scanning of the event log causes a high load on the server - I have no qualms about dropping the event log monitoring in that case. A monitoring system that kills the server it is monitoring is not a good thing, IMHO.
Regards, Henrik
Great, thanks! I took out the -dP flags before I got your answer, and will take out the whole thing if it still bogs down when the server is in pain.
In general people here are very happy with xymon! It is a huge step forward from Big Brother which we were using before.
Thanks again for all your help,
Betsy
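
For anyone who wants to keep the iostat data rather than drop it, one possible middle ground is to stop new iostat runs from piling up behind stuck ones, as they do in the ps listing above. The following is only a minimal sketch and not part of the Xymon client: the names MAX_IOSTAT, list_procs, count_running and guarded_iostat are made up for illustration, and it assumes a ps that accepts `-e -o comm=`.

```shell
#!/bin/sh
# Sketch only: MAX_IOSTAT and all helper names below are hypothetical,
# not something the Xymon client ships with.

# Refuse to start another iostat once this many are already running.
MAX_IOSTAT=${MAX_IOSTAT:-2}

# Print one command name per line for every process on the box.
# Assumption: this ps supports "-e -o comm=".
list_procs() {
    ps -e -o comm=
}

# Count processes whose command name (leading path stripped) equals $1.
count_running() {
    list_procs | sed 's|.*/||' | grep -c "^$1\$"
}

# Run the given command only if fewer than MAX_IOSTAT iostat
# processes exist; otherwise skip it and return non-zero.
guarded_iostat() {
    running=$(count_running iostat)
    if [ "$running" -ge "$MAX_IOSTAT" ]; then
        echo "skipping iostat: $running already running" >&2
        return 1
    fi
    "$@"
}

# Example call, mirroring the CPU collection seen in the ps output:
#   guarded_iostat iostat -c 300 2
```

With a guard like this, a box that is already in trouble would carry at most MAX_IOSTAT extra iostat processes, at the cost of gaps in the collected data while the guard is skipping runs.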
participants (2)
- betsy.schwartz@gmail.com
- henrik@hswn.dk