Hi all,
I'm getting a strange false alert on one of our Xymon systems.
We got an alert for disk and the webpage output looks like this,
Fri Nov 25 13:18:56 2016 - Filesystems NOT ok
red 99 (0 units free) has reached the PANIC level (524288 units)
red GB (18446744073709551615 units free) has reached the PANIC level (524288 units)
red N/A (18446744073709551615 units free) has reached the PANIC level (524288 units)
Filesystem 1K-blocks Used Avail Capacity Total Size Free Space Type Mount Point
B 681565624 605072264 76493360 88% 649.99 GB 72.95 GB FIXED N/A
C 52420060 33935280 18484780 64% 49
99 GB] 17.63 GB FIXED N/A
R 368049116 149745148 218303968 40% 351.00 GB 208.19 GB FIXED N/A
S 613409860 511927280 101482580 83% 584.99 GB 96.7
GB F]XED N/A
T 157284348 113570648 43713700 72% 150.00 GB 41.69 GB FIXE
N/A
Notice that some of the lines seem to have spurious line feeds, a square bracket has appeared, and some letters are missing.
When I clicked on the link for the client data, this is what the disk section looked like:
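As an illustration of how the broken lines could produce exactly these three alerts, here is a minimal Python sketch. It is not taken from the Xymon source. It assumes, hypothetically, that the check takes the first field of each line as the filesystem name, reads the fourth field as the free space with C-like atol() semantics, falls back to -1 when that field is missing, and prints the value as an unsigned 64-bit number (18446744073709551615 is 2^64 - 1, which is what -1 looks like when printed that way). The names `to_int_prefix`, `parse_disk_line`, and `check` are made up for this example.

```python
# Hypothetical sketch, NOT the actual Xymon parsing code.
U64_MASK = 2**64 - 1  # 18446744073709551615


def to_int_prefix(text):
    """Loosely mimic C atol(): parse leading ASCII digits, 0 if there are none."""
    digits = ""
    for ch in text:
        if ch in "0123456789":
            digits += ch
        else:
            break
    return int(digits) if digits else 0


def parse_disk_line(line, avail_col=3):
    """Return (name, free_units); free_units is -1 if the Avail column is missing."""
    fields = line.split()
    name = fields[0]
    free = to_int_prefix(fields[avail_col]) if len(fields) > avail_col else -1
    return name, free


def check(lines, panic_units=524288):
    """Compare as signed, but display as unsigned 64-bit (assumed behaviour)."""
    alerts = []
    for line in lines:
        if not line.strip():
            continue
        name, free = parse_disk_line(line)
        if free < panic_units:
            alerts.append(f"red {name} ({free & U64_MASK} units free)")
    return alerts


# The garbled rows from the status page above, one list entry per displayed line.
garbled = [
    "C 52420060 33935280 18484780 64% 49",
    "99 GB] 17.63 GB FIXED N/A",
    "S 613409860 511927280 101482580 83% 584.99 GB 96.7",
    "GB F]XED N/A",
    "T 157284348 113570648 43713700 72% 150.00 GB 41.69 GB FIXE",
    "N/A",
]

for alert in check(garbled):
    print(alert)
# red 99 (0 units free)
# red GB (18446744073709551615 units free)
# red N/A (18446744073709551615 units free)
```

Under these assumptions the first half of each broken row still parses normally, and only the continuation fragments ("99 ...", "GB ...", "N/A") turn into bogus filesystems.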
[disk]
Filesystem 1K-blocks Used Avail Capacity Total Size Free Space Type Mount Point
B 681565624 605072264 76493360 88% 649.99 GB 72.95 GB FIXED N/A
C 52420060 33935280 18484780 64% 49.99 GB 17.63 GB FIXED N/A
R 368049116 149745148 218303968 40% 351.00 GB 208.19 GB FIXED N/A
S 613409860 511927280 101482580 83% 584.99 GB 96.78 GB FIXED N/A
T 157284348 113570648 43713700 72% 150.00 GB 41.69 GB FIXED N/A
As you can see, there doesn't appear to be anything wrong with this.
The only difference that I am aware of is that the system where we are not seeing this runs Xymon 4.3.4 on CentOS 5.6, while the one where we are seeing the issue runs Xymon 4.3.4 on CentOS 6.3.
Has anyone ever seen this kind of behaviour?
Thanks, Neil.
Hi Neil, hi List,
On Friday, 25 November 2016, 13:36:29, Neil Simmonds wrote:
Hi all,
I'm getting a strange false alert on one of our Xymon systems. We got an alert for disk and the webpage output looks like this,
Fri Nov 25 13:18:56 2016 - Filesystems NOT ok
red 99 (0 units free) has reached the PANIC level (524288 units)
red GB (18446744073709551615 units free) has reached the PANIC level (524288 units)
red N/A (18446744073709551615 units free) has reached the PANIC level (524288 units)
Filesystem 1K-blocks Used Avail Capacity Total Size Free Space Type Mount Point
[...]
C 52420060 33935280 18484780 64% 49
99 GB] 17.63 GB FIXED N/A
[...]
Notice that some of the lines seem to have spurious line feeds, a square bracket has appeared, and some letters are missing.
When I clicked on the link for the client data, this is what the disk section looked like.
[...]
As you can see, there doesn't appear to be anything wrong with this.
Yes. I'm not completely sure whether the corruption would always show up there already, but I captured the client message channel and analyzed it with a script, and the messages I got were all OK.
The only difference that I am aware of is that the system where we are not seeing this runs Xymon 4.3.4 on CentOS 5.6, while the one where we are seeing the issue runs Xymon 4.3.4 on CentOS 6.3.
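A check of that kind could look roughly like the following Python sketch. It is only an assumption of what such a script might do, not the script actually used: it pulls the `[disk]` section out of a captured client message and flags rows whose field count differs from the most common one, or that contain a stray `]`. The function names `extract_section` and `suspicious_rows` are hypothetical.

```python
from collections import Counter


def extract_section(client_message, name="disk"):
    """Return the lines between the '[name]' marker and the next '[...]' marker."""
    lines = []
    inside = False
    for line in client_message.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Any '[section]' line switches sections; we only collect ours.
            inside = (stripped == f"[{name}]")
            continue
        if inside:
            lines.append(line)
    return lines


def suspicious_rows(section_lines):
    """Flag data rows with an unusual field count or a stray ']' character."""
    rows = [line for line in section_lines[1:] if line.strip()]  # skip the header
    if not rows:
        return []
    counts = Counter(len(row.split()) for row in rows)
    expected = counts.most_common(1)[0][0]
    # Note: a legitimate mount point containing ']' would also be flagged here.
    return [row for row in rows if len(row.split()) != expected or "]" in row]
```

Calling `suspicious_rows(extract_section(message_text))` on a clean message should return an empty list, while fragments such as "GB F]XED N/A" would be reported.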
[...]
Has anyone ever seen this kind of behaviour?
Yes, I had the same issue some weeks ago on a really old 4.3.0.0-beta2. It turned out to be caused by an initialization issue when truncating client messages, so the trigger was a large client message from a client that had reported earlier. My workaround was to allow larger client messages, but I'm not sure this doesn't have a security impact, since the behavior still looks like incorrectly initialized pointers or leftover data in hobbitd_worker.c / xymond_worker.c when truncating messages. Mainly the fragment you quote as "99 GB] " made me worry about this. Where does this bracket come from? I had it, too; see the examples below. And it definitely wasn't at this position in the client message passed to the hobbitd_client / xymond_client worker.
After lots of debugging I saw the "Got over-size message, truncating at" message, which led me to the cause.
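To look for the same hint, one could scan the server log for that text. The following is a minimal Python sketch; the marker string is the part of the message quoted above, while the function names and the idea of passing the log path as a parameter are assumptions for illustration (which log file contains the message depends on the installation).

```python
# Only the part of the message quoted in this thread; the rest is not assumed.
MARKER = "Got over-size message, truncating at"


def find_truncations(log_lines):
    """Return (line_number, text) for every line containing the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if MARKER in line
    ]


def scan_log(log_path):
    """Convenience wrapper: open the given log file and scan it."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        return find_truncations(handle)
```

Comparing the timestamps of the matches with the times of the false red statuses may show whether an oversize message arrived shortly before each of them.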
But I haven't had the time to really hunt it down so far. :-( Possibly I'm also not familiar enough with the Xymon code for this. ;-)
I often also had a bracket and sometimes a line break, but sometimes neither, within the header line of the df output. It randomly affected different machines, and the square brackets were also found within the ports status reported by the hobbitd_client / xymond_client worker, but they didn't result in red statuses there because our analysis rules for the ports are mostly less strict.
**** False Positive Message ****
manda4.hrz.tu-darmstadt.de:disk red [443790]
red Sat Oct 15 04]20:35 CEST 2016 - Filesystems NOT ok
&red 15594972 15% / (2651148% used) has reached the PANIC level (95%)
&red 609648 1% /run (444% used) has reached the PANIC level (95%)
&red 2% /tmp (1787588% used) has reached the PANIC level (95%)
&red 13324360 3% /home (360668% used) has reached the PANIC level (95%)
&red 44667620 6% /srv (2574396% used) has reached the PANIC level (95%)
&red 39834852 13% /var (5784076% used) has reached the PANIC level (95%)
&red 4472720 4% /var/lib/mysql (179952% used) has reached the PANIC level (95%)
&red 10% /var/lib/hobbit (116445760% used) has reached the PANIC level (95%)
Filesystem 1024-bloc
s ]
Use] Available Capacity Mounted on
/dev/sda1 19222656 2651148 15594972 15% /
udev 3041408 4 3041404 1% /dev
tmpfs 610092 444 609648 1% /run
none 5120 0 5120 0% /run/lock
none 3050460 0 3050460 0% /run/shm
/dev/sda7
19210]6 35864 1787588 2% /tmp
/dev/sda8 14417392 360668 13324360 3% /home
/dev/sda9 49770220 2574396 44667620 6% /srv
/dev/sda6 48060296 5784076 39834852 13% /var
/dev/sda10 4914816 179952 4472720 4% /var/lib/mysql
/dev/sda11 1
531996] 11656580 116445760 10% /var/lib/hobbit
**** False Positive Message ****
maven01-vb.hrz.tu-darmstadt.de:disk red [774507]
red Sat Oct 15 09:46:22 CEST 2016 - Filesystems NOT ok
&red 1% /run (406100% used) has reached the PANIC level (95%)
Filesystem 1024-blocks Used
Available Capacity Mounted on
udev 10240 0
10240 0% /dev
t
pfs ] 406356 256
406100 1% /run
/dev/disk/by-uuid/298ee340-256f-4430-bba1-a14a475728c1 19222656 4254772
13991348 24% /
tmpfs 5120 0
5120
0% /r]n/lock
tmpfs 1398620 0
1398620 0% /run/shm
/dev/sda1 350275 19677
311910 6% /boot
/dev/sda5 8484528 220312
7833216 3% /home
/dev/sdb1 31391836 6749152
23069824 23% /mnt/vol0
Since allowing larger client messages, the issues are gone. The oversize message came from the machine that reported one or two client messages earlier. As far as I could reproduce it, the client message from the machine in between was completely ignored if the cause was two messages before.
Kind regards. Lars
--
man-da.de GmbH, AS8365            Phone: +49 6151 16-71027
Mornewegstraße 30                 Fax: +49 6151 16-71198
D-64293 Darmstadt                 e-mail: lk at man-da.de
Geschäftsführer Marcus Stögbauer  AG Darmstadt, HRB 94 84