Patch not done yet? was RE: rrd-data.log
Hi Henrik: Below are snippets from rrd that are still causing the "Duplicate Error" on my end, even after applying the patch. In the cases where netstat and ifstat data are shown together, I had to include both because those chunks of data came out right at the time the duplicate error appeared. It is too hard to time/see which of those 2 chunks of data causes the problem. In other cases, only ifstat data caused the problem. Where I have large chunks of whitespace, that separates "instances" of the duplicate error occurring.

===========

@@data#348|1154533834.591188|192.168.224.202||wolf13.hmgc.mcw.edu|netstat
data wolf13,hmgc,mcw,edu.netstat linux
Ip: 56731 total packets received 0 forwarded 0 incoming packets discarded 56067 incoming packets delivered 65344 requests sent out
Icmp: 648 ICMP messages received 0 input ICMP message failed. ICMP input histogram: destination unreachable: 89 echo requests: 559 648 ICMP messages sent 0 ICMP messages failed ICMP output histogram: destination unreachable: 89 echo replies: 559
Tcp: 4630 active connections openings 1 passive connection openings 1 failed connection attempts 0 connection resets received 0 connections established 46150 segments received 54675 segments send out 85 segments retransmited 0 bad segments received. 1 resets sent
Udp: 9933 packets received 0 packets to unknown port received. 0 packet receive errors 10021 packets sent
TcpExt: ArpFilter: 0 4616 TCP sockets finished time wait in fast timer 908 delayed acks sent 557 packets directly queued to recvmsg prequeue. 507580 packets directly received from backlog 554 packets directly received from prequeue 19157 packets header predicted 426 packets header predicted and directly queued to user TCPPureAcks: 7526 TCPHPAcks: 9608 TCPRenoRecovery: 0 TCPSackRecovery: 0 TCPSACKReneging: 0 TCPFACKReorder: 0 TCPSACKReorder: 0 TCPRenoReorder: 0 TCPTSReorder: 0 TCPFullUndo: 0 TCPPartialUndo: 0 TCPDSACKUndo: 0 TCPLossUndo: 35 TCPLoss: 0 TCPLostRetransmit: 0 TCPRenoFailures: 0 TCPSackFailures: 0 TCPLossFailures: 0 TCPFastRetrans: 0 TCPForwardRetrans: 0 TCPSlowStartRetrans: 0 TCPTimeouts: 85 TCPRenoRecoveryFail: 0 TCPSackRecoveryFail: 0 TCPSchedulerFailed: 0 TCPRcvCollapsed: 0 TCPDSACKOldSent: 0 TCPDSACKOfoSent: 0 TCPDSACKRecv: 0 TCPDSACKOfoRecv: 0 TCPAbortOnSyn: 0 TCPAbortOnData: 0 TCPAbortOnClose: 0 TCPAbortOnMemory: 0 TCPAbortOnTimeout: 0 TCPAbortOnLinger: 0 TCPAbortFailed: 0 TCPMemoryPressures: 0
@@
@@data#349|1154533834.591337|192.168.224.202||wolf13.hmgc.mcw.edu|ifstat
data wolf13,hmgc,mcw,edu.ifstat linux
eth1 Link encap:Ethernet HWaddr 00:30:6E:F3:0B:46 inet addr:192.168.96.113 Bcast:192.168.96.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:72022 errors:0 dropped:0 overruns:0 frame:0 TX packets:71139 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:21476056 (20.4 Mb) TX bytes:17076949 (16.2 Mb) Interrupt:56
lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:189 errors:0 dropped:0 overruns:0 frame:0 TX packets:189 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:16228 (15.8 Kb) TX bytes:16228 (15.8 Kb)
@@


@@data#507|1154533965.074399|192.168.224.202||bc1s2.phys.mcw.edu|ifstat
data bc1s2,phys,mcw,edu.ifstat linux
eth1 Link encap:Ethernet HWaddr 00:0D:60:1E:0E:DD inet addr:192.168.224.111 Bcast:192.168.224.255 Mask:255.255.255.0 inet6 addr: fe80::20d:60ff:fe1e:edd/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:43166350 errors:0 dropped:0 overruns:0 frame:0 TX packets:6166704 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:100 RX bytes:14398847028 (13731.8 Mb) TX bytes:2587305933 (2467.4 Mb) Interrupt:45 Memory:c0010000-c0020000
lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:104 errors:0 dropped:0 overruns:0 frame:0 TX packets:104 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:9752 (9.5 Kb) TX bytes:9752 (9.5 Kb)
@@


@@data#740|1154534133.480180|192.168.224.202||dunn.hmgc.mcw.edu|netstat
data dunn,hmgc,mcw,edu.netstat linux
Ip: 9154191 total packets received 0 forwarded 0 incoming packets discarded 8170997 incoming packets delivered 15226980 requests sent out
Icmp: 8961 ICMP messages received 9 input ICMP message failed. ICMP input histogram: destination unreachable: 16 echo requests: 8945 13860 ICMP messages sent 0 ICMP messages failed ICMP output histogram: destination unreachable: 4915 echo replies: 8945
Tcp: 33598 active connections openings 15245 passive connection openings 4 failed connection attempts 106 connection resets received 10 connections established 7841472 segments received 15058779 segments send out 412 segments retransmited 0 bad segments received. 2935 resets sent
Udp: 315663 packets received 4901 packets to unknown port received. 0 packet receive errors 154341 packets sent
TcpExt: 7 resets received for embryonic SYN_RECV sockets ArpFilter: 0 37618 TCP sockets finished time wait in fast timer 2 packets rejects in established connections because of timestamp 12809 delayed acks sent 2 delayed acks further delayed because of locked socket Quick ack mode was activated 1930 times 144921 packets directly queued to recvmsg prequeue. 7524 packets directly received from backlog 40929271 packets directly received from prequeue 47328 packets header predicted 135097 packets header predicted and directly queued to user TCPPureAcks: 5602771 TCPHPAcks: 2027416 TCPRenoRecovery: 47 TCPSackRecovery: 19 TCPSACKReneging: 0 TCPFACKReorder: 0 TCPSACKReorder: 0 TCPRenoReorder: 0 TCPTSReorder: 0 TCPFullUndo: 0 TCPPartialUndo: 0 TCPDSACKUndo: 0 TCPLossUndo: 39 TCPLoss: 55 TCPLostRetransmit: 0 TCPRenoFailures: 2 TCPSackFailures: 1 TCPLossFailures: 1 TCPFastRetrans: 166 TCPForwardRetrans: 8 TCPSlowStartRetrans: 35 TCPTimeouts: 151 TCPRenoRecoveryFail: 9 TCPSackRecoveryFail: 1 TCPSchedulerFailed: 0 TCPRcvCollapsed: 0 TCPDSACKOldSent: 1 TCPDSACKOfoSent: 0 TCPDSACKRecv: 0 TCPDSACKOfoRecv: 0 TCPAbortOnSyn: 0 TCPAbortOnData: 948 TCPAbortOnClose: 14 TCPAbortOnMemory: 0 TCPAbortOnTimeout: 4 TCPAbortOnLinger: 0 TCPAbortFailed: 0 TCPMemoryPressures: 0
@@
@@data#741|1154534133.480925|192.168.224.202||dunn.hmgc.mcw.edu|ifstat
data dunn,hmgc,mcw,edu.ifstat linux
eth0 Link encap:Ethernet HWaddr 00:09:3D:13:DC:AB inet addr:192.168.224.105 Bcast:192.168.224.255 Mask:255.255.255.0 inet6 addr: fe80::209:3dff:fe13:dcab/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:39290006 errors:0 dropped:0 overruns:0 frame:0 TX packets:15275236 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:2864046246 (2.6 GiB) TX bytes:21735816115 (20.2 GiB) Interrupt:185
eth0:0 Link encap:Ethernet HWaddr 00:09:3D:13:DC:AB inet addr:192.168.224.107 Bcast:192.168.224.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 Interrupt:185
eth0:1 Link encap:Ethernet HWaddr 00:09:3D:13:DC:AB inet addr:192.168.224.160 Bcast:192.168.224.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 Interrupt:185
lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:36728 errors:0 dropped:0 overruns:0 frame:0 TX packets:36728 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:5951488 (5.6 MiB) TX bytes:5951488 (5.6 MiB)
@@
Same for me, I still get the log after the patch.
Olivier
On Wed, Aug 02, 2006 at 11:07:06AM -0500, Brodie, Kent wrote:
> Hi Henrik: Below are snippets from rrd that are still causing the "Duplicate Error" on my end, even after applying the patch. In the cases where there's netstat and ifstat data shown together, I had to include both because those chunks of data came out right at the time the duplicate error appeared. Too hard to time/see which of those 2 chunks of data cause the problem. In other cases, only ifstat data caused the problem.
It's the ifstat data; or more specifically, it is the interface aliases ("eth0:1") that messed up how the data was being parsed. So a somewhat larger patch was required. Back out the previous patch I sent you, and apply this one instead.
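For readers who want to see what tolerating alias stanzas can look like, here is a minimal sketch. It is not the patch referred to above and not Hobbit's parser; the function name `parse_ifstat`, the two regular expressions, and the line layout of `SAMPLE` are assumptions made for illustration (the values are taken from the dunn ifstat record earlier in the thread). It collects RX/TX byte counters per interface and simply produces no entry for stanzas such as "eth0:0" that carry no counters of their own.

```python
import re

# Hypothetical helper, not Hobbit code. Assumes Linux ifconfig-style text in
# which each stanza starts at column 0 with the interface name.
IFACE_RE = re.compile(r"^(\S+)\s+Link encap:")
BYTES_RE = re.compile(r"RX bytes:(\d+).*?TX bytes:(\d+)")


def parse_ifstat(text):
    """Return {interface: (rx_bytes, tx_bytes)} for stanzas that have counters."""
    counters = {}
    current = None
    for line in text.splitlines():
        start = IFACE_RE.match(line)
        if start:
            # New stanza: remember its name, including aliases like "eth0:0".
            current = start.group(1)
            continue
        found = BYTES_RE.search(line)
        if found and current is not None:
            counters[current] = (int(found.group(1)), int(found.group(2)))
    return counters


# Assumed layout; the archive flattened the original line breaks.
SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 00:09:3D:13:DC:AB
          RX bytes:2864046246 (2.6 GiB)  TX bytes:21735816115 (20.2 GiB)
eth0:0    Link encap:Ethernet  HWaddr 00:09:3D:13:DC:AB
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
lo        Link encap:Local Loopback
          RX bytes:5951488 (5.6 MiB)  TX bytes:5951488 (5.6 MiB)
"""

if __name__ == "__main__":
    for name, (rx, tx) in parse_ifstat(SAMPLE).items():
        print(name, rx, tx)
```

Because the alias stanza resets `current`, a counter line can never be attributed to the wrong interface, and an alias without counters never yields a data set.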
Or grab the current snapshot if you cannot get it applied without problems.
Regards, Henrik
I got this in the "memory" column for a Solaris 8 host this morning, which caused it to go red (even though I have the threshold set to 101).
Thu Aug 3 09:11:17 BST 2006 - Memory CRITICAL
      Memory    Used         Total    Percentage
red   Physical  4294955003M  131072M  4294967287%
green Swap      40973M       144024M  28%
That physical memory calculation is obviously incorrect!
This only happened once, then it went back to normal. The "hostdata" that was saved at the time of the alert had the following memory data:
[memory] 0 0 0 211046168 146806184 744 6249 0 0 0 0 0 0 0 0 0 2692 436955 11454 8 6 86
That looks fine to me. I can't see how it could have taken those values and got that bizarre total for memory.
This is how it normally looks for this host:
Thu Aug 3 10:01:48 BST 2006 - Memory OK
      Memory    Used    Total    Percentage
green Physical  59483M  131072M  45%
green Swap      40960M  144026M  28%
and this is a sample of "normal" memory data from the host:
[memory] 0 0 0 105535000 73280352 152 910 0 126 126 0 0 0 0 0 0 3339 811798 11953 4 7 89
(I'm running 4.2-RC-20060712)
On Thu, Aug 03, 2006 at 11:37:02AM +0100, Colin Spargo wrote:
> I got this in the "memory" column for a Solaris 8 host this morning, which caused it to go red (even though I have the threshold set to 101).
>
> Thu Aug 3 09:11:17 BST 2006 - Memory CRITICAL
>       Memory    Used         Total    Percentage
> red   Physical  4294955003M  131072M  4294967287%
> green Swap      40973M       144024M  28%
>
> That physical memory calculation is obviously incorrect!
Yep.
> This only happened once, then it went back to normal. The "hostdata" that was saved at the time of the alert had the following memory data:
>
> [memory] 0 0 0 211046168 146806184 744 6249 0 0 0 0 0 0 0 0 0 2692 436955 11454 8 6 86
>
> That looks fine to me. I can't see how it could have taken those values and got that bizarre total for memory.
Could you send me the [prtconf], [swap] and [memory] sections from that hostdata file?
Regards, Henrik
On Thu, Aug 03, 2006 at 11:37:02AM +0100, Colin Spargo wrote:
> I got this in the "memory" column for a Solaris 8 host this morning, which caused it to go red (even though I have the threshold set to 101).
>
> Thu Aug 3 09:11:17 BST 2006 - Memory CRITICAL
>       Memory    Used         Total    Percentage
> red   Physical  4294955003M  131072M  4294967287%
> green Swap      40973M       144024M  28%
>
> That physical memory calculation is obviously incorrect!
Yep, but the data it got were weird. Colin sent me some additional data from the client message. The interesting bits are here:
The Solaris prtconf command is used to determine the amount of RAM in the box. Here it says:
[prtconf]
System Configuration: Sun Microsystems sun4u
Memory size: 131072 Megabytes
So this box has 131072 MB. (128 GB - a lot, I might add. Is this really true?)
The command "vmstat 1 2|tail -1" is used to grab the current memory usage:
[memory] 0 0 0 211046168 146806184 744 6249 0 0 0 0 0 0 0 0 0 2692 436955 11454 8 6 86
Column 5 is the "free memory" column in KB, here: 146806184 KB. Divide by 1024 to get MB, and it gives 143365 MB free.
Now ... how can a box with 131072 MB RAM end up with 143365 MB free? That's about 12 GB more than what is physically installed in the box.
Hobbit then gets a negative value for the amount of memory used (131072 - 143365 = -12293 MB), and because that value is then used in a calculation with some unsigned variables, it wraps around and comes out as this hilarious figure for the amount of memory used.
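The arithmetic can be reproduced with a short sketch. It assumes the results end up in 32-bit unsigned variables and that division truncates toward zero as in C; both are assumptions that happen to match the reported figures, not something confirmed from the Hobbit source. The names `report_physical` and `c_div` are made up for this example.

```python
MASK32 = 0xFFFFFFFF  # assumption: values are stored in 32-bit unsigned variables


def c_div(a, b):
    """Integer division that truncates toward zero, the way C does."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def report_physical(total_mb, free_kb):
    """Return (used, percentage) as they would appear after unsigned wraparound."""
    free_mb = free_kb // 1024              # 146806184 KB -> 143365 MB
    used_mb = total_mb - free_mb           # negative when free exceeds total
    pct = c_div(100 * used_mb, total_mb)   # also negative in that case
    return used_mb & MASK32, pct & MASK32  # reinterpret as unsigned 32-bit


if __name__ == "__main__":
    # Values from the [prtconf] and [memory] sections quoted above.
    print(report_physical(131072, 146806184))  # (4294955003, 4294967287)
```

Both numbers match the "Physical" line of the CRITICAL report, which is consistent with the wraparound explanation.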
Now, I'll admit that Hobbit should probably do a sanity check on the data so it doesn't trigger alerts in these circumstances. But the core problem is that your box is reporting some weird data.
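A sanity check of the kind mentioned here could look like the following sketch. It is only an illustration of the idea, not what Hobbit does or later did; the function name `physical_memory_usage` and the choice to return `None` for implausible input are assumptions.

```python
def physical_memory_usage(total_mb, free_mb):
    """Return (used_mb, percent_used), or None if the input is implausible.

    Hypothetical helper: rejects reports where free memory is negative or
    larger than the installed total, so no status is derived from them.
    """
    if total_mb <= 0 or free_mb < 0 or free_mb > total_mb:
        return None
    used_mb = total_mb - free_mb
    return used_mb, (100 * used_mb) // total_mb


if __name__ == "__main__":
    # The odd sample: 143365 MB free on a 131072 MB box is rejected.
    print(physical_memory_usage(131072, 146806184 // 1024))  # None
    # The "normal" sample quoted earlier in the thread.
    print(physical_memory_usage(131072, 73280352 // 1024))   # (59510, 45)
```

A caller would skip the update, or report the data as unusable, when `None` comes back instead of turning the status red.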
Regards, Henrik
It sounds to me like vmstat is reporting the total memory as "real + disk", which I believe is what the 4th column shows (I don't have a Solaris server where I'm currently at to confirm). So, while Hobbit keys on the physical (real) memory to determine the total, it's keying on a computed value to do the differential. I've seen this under HPUX and Linux on occasion for the memory tests for BB. I've usually just gone in and tweaked the scripts when that would happen.
=G=
participants (5)
- brodie@mcw.edu
- cspargo2@csc.com
- gjohnson@trantor.org
- henrik@hswn.dk
- olivier.beau@telecomitalia.fr