I've been setting up Xymon since I heard about it a couple of days ago. So far, it's fantastic. Unfortunately, after a UPS malfunction that caused a whole room to restart, I have an inexplicable alarm. Take a look at this red alarm:
Memory     Used          Total   Percentage
Physical   4294964640M   4084M   4294967231%
Swap       0M            4096M   0%
This is from a FreeBSD 7.x box that has 8GB of physical RAM and 4GB of swap. According to "top", the physical memory is still about 6.75GB free.
I've restarted the daemon on the host in question as well as on Xymon's GUI/web server. I've even tried a bin/bb 127.0.0.1 "drop HOSTNAME memory" just to see if it would help.
I'm not even sure where to start on this. It was working well for a day or so before going bad like this. My best guess right now is some kind of "wrap-around" problem with an integer or something like that causing bad data in memory.
Suggestions?
Thanks in advance, Jaime
P.S. - The alarm is at the following URL, if that helps: http://cns.cairodurham.org/hobbit-cgi/bb-hostsvc.sh?HOST=cerberus.cairodurha...
-- Network Administrator Cairo-Durham Central School District http://cns.cairodurham.org
On Sat, December 19, 2009 23:47, Jaime Kikpole wrote:
I have not touched BSD in about two years, so just a stab with a rusty fork. My gut feel is same as yours, integer wrap or memory mismap sort of thing.
How is this host maintained WRT kernel and ports? Any chance the kernel or related components have been updated since the previous reboot and/or since the Xymon build, and that this reboot loaded the changes? If you're using freebsd-update, could you be at the first reboot after a kernel update now, i.e. still needing to run another "freebsd-update install" to complete it? I'd look at that, and also at possibly rebuilding and reinstalling Xymon.
On Sunday, December 20, 2009, Xymon User in Richmond <hobbit at epperson.homelinux.net> wrote:
Any chance the kernel or related components have been updated since the last previous reboot and/or since the Xymon build, and that this reboot loaded the changes?
Not a bad question, but no. There have been no changes in the kernel or OS for a little while now. In fact I am hoping to have a chance to do an update in about two weeks.
The system did reboot unexpectedly, though. I wouldn't have expected that to have an effect. What do you think?
I tried a "controlled" restart just now via shutdown -r now. After giving the system a few minutes to talk to itself, it is still reporting strangely high numbers.
Would it make sense to pkg_delete the Xymon daemon on the observed host and reinstall it? Or could that make things worse?
Thanks, Jaime
On Sun, Dec 20, 2009 at 9:46 AM, Jaime Kikpole <jkikpole at cairodurham.org>wrote:
If you click through the "client data available" link, you'll see this near the top:
[meminfo]
Total:4084
Free:6920
[swapinfo]
Device 1K-blocks Used Avail Capacity
/dev/da0s1b 4194304 0 4194304 0%
That "Total:4084" is supposed to be the total physical memory in the system, if I'm reading the freebsd-meminfo.c source correctly. If you have the xymon source, that's under the "client" directory.
If that system is supposed to have 8G of memory, I think the kernel may not be seeing half of it...
Ralph Mitchell
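The mismatch Ralph points out can also be checked mechanically: free memory can never exceed total memory, so a Free value above Total means one of the two figures is wrong. Below is a minimal Python sketch, assuming the client data has the section layout quoted above and that the values are in megabytes (as the "M" suffix in the alarm suggests). The helper name parse_meminfo is hypothetical and is not part of Xymon.

```python
# Sketch only: parse the [meminfo] section of the client data quoted in the
# thread and flag values that cannot both be right.
# parse_meminfo is a hypothetical helper, not a Xymon function.
CLIENT_DATA = """\
[meminfo]
Total:4084
Free:6920
[swapinfo]
Device 1K-blocks Used Avail Capacity
/dev/da0s1b 4194304 0 4194304 0%
"""


def parse_meminfo(text):
    """Return the key/value pairs found in the [meminfo] section as integers."""
    values = {}
    in_section = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # A new section header starts; only [meminfo] is of interest.
            in_section = (line == "[meminfo]")
            continue
        if in_section and ":" in line:
            key, _, value = line.partition(":")
            values[key.strip()] = int(value)
    return values


mem = parse_meminfo(CLIENT_DATA)
# Free memory above total memory is impossible, so treat it as inconsistent.
consistent = 0 <= mem["Free"] <= mem["Total"]
print(mem, "consistent" if consistent else "inconsistent: Free exceeds Total")
```

With the data from this host the check fails, which fits the idea that Total reflects only part of the installed RAM.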
On Sun, Dec 20, 2009 at 10:27 AM, Ralph Mitchell <ralphmitchell at gmail.com> wrote:
That "Total:4084" is supposed to be the total physical memory in the system, if I'm reading the freebsd-meminfo.c source correctly. If you have the xymon source, that's under the "client" directory.
Thanks. I'm going to check on that now.
If that system is supposed to have 8G of memory, I think the kernel may not be seeing half of it...
That makes some sense. It doesn't explain why it thinks that it's using about 4 billion percent of capacity, though. That would be about 40,000,000 times capacity. Is there someplace that the unit of measurement is set?
Jaime
On Sun, December 20, 2009 13:43, Jaime Kikpole wrote:
Can't get my head around the math on a Sunday afternoon, but if it's trying to do calculations using what it can see is used and what it's been told is the total available, which may be less, the results would be some kind of hosed.
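For what it's worth, the math does seem to work out as a wrap-around. Here is a rough Python sketch of one way the alarm's figures could arise. It assumes that used memory is computed as Total minus Free, that the percentage uses integer division truncating toward zero (as in C), and that both results are then displayed as unsigned 32-bit values. None of that is confirmed from the Xymon source in this thread. The Free value of 6740 MB is also an assumption: it is close to the 6.75GB that "top" showed and was chosen because it reproduces the alarm exactly.

```python
MASK32 = 0xFFFFFFFF  # assumption: results are displayed as unsigned 32-bit values


def as_unsigned32(value):
    """Reinterpret a (possibly negative) integer as an unsigned 32-bit value."""
    return value & MASK32


def report(total_mb, free_mb):
    """Return (used, percent) as they would look after a 32-bit wrap-around."""
    used_mb = total_mb - free_mb              # negative whenever free > total
    percent = int(100 * used_mb / total_mb)   # truncate toward zero, like C integer division
    return as_unsigned32(used_mb), as_unsigned32(percent)


# Total:4084 comes from the client data; Free:6740 is an assumed value.
alarm = report(4084, 6740)
print(alarm)   # (4294964640, 4294967231), the figures shown in the red alarm

# The same calculation with the Free:6920 seen later in the client data.
later = report(4084, 6920)
print(later)   # (4294964460, 4294967227)
```

In other words, the huge numbers are what -2656M used and -65% look like when printed as unsigned values, so the real problem is a Total that is smaller than Free, not the unit of measurement.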
On Sun, Dec 20, 2009 at 10:27 AM, Ralph Mitchell <ralphmitchell at gmail.com> wrote:
If that system is supposed to have 8G of memory, I think the kernel may not be seeing half of it...
I just checked at the command line:
cerberus# dmesg | grep memory
real memory = 9126805504 (8704 MB)
avail memory = 8403456000 (8014 MB)
cerberus# sysctl -a | grep -e '^hw\.' | grep 'mem: '
hw.physmem: 4282679296
hw.usermem: 4162433024
hw.realmem: 536870912
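A quick bit of arithmetic on these figures, sketched in Python, shows a pattern worth noting: hw.realmem is exactly the dmesg "real memory" value reduced modulo 2^32, and hw.physmem sits just below 2^32 and converts to the same 4084 MB that Xymon reports as Total. This suggests, but does not prove, that the sysctl values are passing through a 32-bit quantity somewhere. The thread does not establish the actual cause.

```python
MIB = 1024 * 1024

dmesg_real = 9126805504   # "real memory" reported by dmesg
hw_physmem = 4282679296   # hw.physmem from sysctl
hw_realmem = 536870912    # hw.realmem from sysctl

print(dmesg_real // MIB)                 # 8704, matching the "(8704 MB)" in dmesg
print(hw_physmem // MIB)                 # 4084, matching Total:4084 in the client data
print(dmesg_real % 2**32 == hw_realmem)  # True: hw.realmem is the low 32 bits of real memory
print(hw_physmem < 2**32)                # True: hw.physmem fits in 32 bits
```

If that reading is right, the place to look is how the kernel (or the tool reading it) exposes memory sizes on this build, rather than Xymon's display code.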
You may be on to something.
Thanks. I'll check this out in the kernel options, etc.
Jaime