11 Oct 2011, 5:56 a.m.
On Thu, Oct 6, 2011 at 7:37 AM, Henrik Størner <henrik at hswn.dk> wrote:
- Have you looked at the vmstat1 graphs for these systems? How is the "I/O wait" on them? Some types of I/O on Linux systems can cause quite a slowdown; deleting large files on ext2 or ext3 systems could be quite time-consuming and cause the whole system to really stall. Also, doing things that touch a lot of files - a large find, or grep'ing through a large number of files, especially if you don't mount filesystems with the "noatime" option - can cause a lot of I/O that slows down filesystem operations.
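The "I/O wait" figure on the vmstat1 graphs can also be checked by hand. The sketch below assumes a Linux /proc/stat whose aggregate "cpu" line lists user, nice, system, idle, iowait, irq, softirq and steal jiffies in that order; it samples the line twice and reports the share of CPU time spent waiting on I/O. The function names are hypothetical, not part of Xymon.

```python
import time


def read_cpu_times(stat_text):
    """Return the aggregate 'cpu' counters (user..steal) from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user nice system idle iowait irq softirq steal; guest time is
            # already counted in user/nice, so it is left out of the total.
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


def sample(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return iowait %."""
    with open(path) as f:
        before = read_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_cpu_times(f.read())
    return iowait_percent(before, after)


if __name__ == "__main__":
    print(f"iowait: {sample():.1f}%")
```

A sustained high value while the slowdown is happening points at disk contention rather than at the monitoring itself.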
I've seen this happen during log rotation and compression, as shown by "sar -d" (I recommend installing sar, part of the sysstat package, if you haven't already). The I/O contention during removal of a big file after compression is enough to make any filesystem operation block for a loooong time. For me, this causes enough back-pressure that syslog-ng starts dropping UDP packets while it waits for the logfile to become writable.
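If sar isn't installed yet, a rough per-disk "busy" figure can be taken straight from the kernel. This is a minimal sketch assuming the Linux /proc/diskstats layout, where the 10th counter after the device name is the number of milliseconds the device spent doing I/O; dividing its change by the elapsed time gives a utilization percentage, roughly the way sar -d and iostat derive %util. The helper names are hypothetical.

```python
import time


def read_io_ticks(diskstats_text):
    """Map device name -> milliseconds spent doing I/O (10th field after the name)."""
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 14:
            ticks[fields[2]] = int(fields[12])
    return ticks


def utilization(before, after, interval_ms):
    """Percent of the interval each device was busy."""
    return {
        dev: 100.0 * (after[dev] - before[dev]) / interval_ms
        for dev in after
        if dev in before
    }


def sample(interval=5.0, path="/proc/diskstats"):
    """Read /proc/diskstats twice and return per-device busy percentages."""
    with open(path) as f:
        before = read_io_ticks(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        after = read_io_ticks(f.read())
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return utilization(before, after, elapsed_ms)


if __name__ == "__main__":
    for dev, pct in sorted(sample().items()):
        if pct > 0:
            print(f"{dev:10s} {pct:6.1f}% busy")
```

Running it while logrotate compresses and removes a large file should show the affected device close to 100% busy if that is where the stall comes from.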
To prove that it's not a Xymon problem, why not create a cron task that does the same thing but logs how long it takes? If the log shows delays, then it's got nothing to do with Xymon. Perhaps create /etc/cron.d/touchtest with the following:
    * * * * * root time touch /path/to/file >> /tmp/touchlog 2>&1
    50 23 * * * root cp /dev/null /tmp/touchlog
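A variant of the same test that writes a timestamp next to each measurement may make the log easier to correlate with the sar output. This is a hypothetical helper script, not part of Xymon or cron: it touches the file given as its first argument (creating it if needed), times the call, and appends one line to the log file given as its second argument.

```python
import sys
import time
from datetime import datetime
from pathlib import Path


def timed_touch(path):
    """Touch `path` (create it or update its mtime); return elapsed seconds."""
    start = time.monotonic()
    Path(path).touch()
    return time.monotonic() - start


def log_entry(when, path, seconds):
    """Format one log line: ISO timestamp, elapsed seconds, path."""
    return f"{when.isoformat(timespec='seconds')} {seconds:.3f}s {path}\n"


def main(argv):
    # Both arguments are required; e.g. /path/to/file and /tmp/touchlog
    # as in the cron entry above.
    if len(argv) != 3:
        sys.exit(f"usage: {argv[0]} FILE-TO-TOUCH LOGFILE")
    target, logfile = argv[1], argv[2]
    seconds = timed_touch(target)
    with open(logfile, "a") as log:
        log.write(log_entry(datetime.now(), target, seconds))


if __name__ == "__main__":
    main(sys.argv)
```

It would be called from the cron entry in place of "time touch", for example with a hypothetical install path such as /usr/local/bin/touchtime.py followed by /path/to/file /tmp/touchlog; the nightly "cp /dev/null" line still resets the log.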
Cheers,
Jeremy