Could not fork checkpoint child:Cannot allocate memory
After upgrading Xymon from version 4.3.12 to 4.3.17, the xymond daemon's memory grows without any limit. After 2 or 3 days these messages appear in the logs:
2014-07-17 16:27:46 Setup complete
2014-07-18 04:16:49 Flapping detected for web.int:http - 10 changes in 868 seconds
2014-07-18 04:16:49 Flapping detected for web.int:tomcat - 10 changes in 868 seconds
2014-07-18 04:18:23 Flapping detected for web.int:http - 10 changes in 892 seconds
2014-07-18 04:18:23 Flapping detected for web.int:tomcat - 10 changes in 892 seconds
2014-07-18 18:25:44 Flapping detected for web.int:http - 10 changes in 808 seconds
2014-07-18 18:25:44 Flapping detected for web.int:tomcat - 10 changes in 808 seconds
2014-07-18 23:40:53 Could not fork checkpoint child:Cannot allocate memory
2014-07-18 23:50:54 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 00:00:55 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 00:10:56 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 00:20:57 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 00:30:58 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 00:40:59 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 00:51:00 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 01:01:01 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 01:11:02 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 01:21:03 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 01:31:04 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 01:41:05 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 01:51:06 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 02:01:07 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 02:11:08 Could not fork checkpoint child:Cannot allocate memory
Does anybody know how to avoid this problem? If I don't restart Xymon, it crashes.
On Mon, July 21, 2014 3:24 am, Raul GN wrote:
[...]
Hmm. If it's growing truly without limit there's something unusual going on; I'd take the memory allocation error later on at face value.
Can you provide any additional details? Do you have an unusual workload or ulimits on the xymon user? Or a large number of host inserts/removals? What OS are you running?
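One way to gather the kind of detail asked for here is to sample the daemon's memory figures from /proc at intervals and see how fast they climb. The following is only a minimal sketch, assuming a Linux host; the helper names and the sample values are hypothetical and do not come from this thread.

```python
import re

def parse_proc_status(text):
    """Return the kB-valued Vm* fields found in /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^(Vm\w+):\s+(\d+)\s+kB$", line.strip())
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

def read_memory(pid):
    """Read the current Vm* fields for a running process (Linux only)."""
    with open("/proc/%d/status" % pid) as fh:
        return parse_proc_status(fh.read())

# Made-up sample text, used only to show the parsing; not real xymond output.
SAMPLE = (
    "Name:\txymond\n"
    "VmPeak:\t  204800 kB\n"
    "VmSize:\t  198000 kB\n"
    "VmRSS:\t  150000 kB\n"
    "Threads:\t1\n"
)

if __name__ == "__main__":
    print(parse_proc_status(SAMPLE))
```

Calling read_memory() with the PID stored in the xymond pidfile every few minutes, and logging VmSize and VmRSS, would show whether growth is steady or tied to particular events. A large VmSize matters here because fork() can fail with "Cannot allocate memory" when the kernel is unwilling to commit enough memory for a copy of a large parent process.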
Regards,
-jc
Yes, I can provide all the information you need. Xymon was upgraded on 15 July, and this is the memory graph:
[image: Inline image 1]
Xymon is installed on Debian 6.0.9.
This is our ulimit configuration. We didn't change it, so these should be the default values:
root: # ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63975
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63975
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
xymon: $ ulimit -a
time(seconds)         unlimited
file(blocks)          unlimited
data(kbytes)          unlimited
stack(kbytes)         8192
coredump(blocks)      0
memory(kbytes)        unlimited
locked memory(kbytes) 64
process               63975
nofiles               1024
vmemory(kbytes)       unlimited
locks                 unlimited
We monitor 1200 hosts and don't add or remove many of them, maybe 20 hosts a week:
[image: Inline image 2]
The xymond test shows:
Statistics for Xymon daemon
Version: 4.3.17
Up since 21-Jul-2014 11:37:37 (0 days, 22:00:00)
Incoming messages : 4335475
- status : 2251363
- combo : 24050
- extcombo : 104869
- page : 160
- summary : 0
- data : 1354532
- client : 92341
- notes : 0
- enable : 0
- disable : 0
- ack : 2
- config : 5968
- query : 4471
- xymondboard : 146651
- xymondlog : 341645
- drop : 3
- rename : 0
- dummy : 328
- ping : 0
- notify : 0
- schedule : 298
- download : 0
- Bogus/Timeouts : 8794
Incoming messages/sec : 52 (average last 300 seconds)
status channel messages: 2244782 (1 readers)
stachg channel messages: 10913 (1 readers)
page channel messages: 49720 (1 readers)
data channel messages: 1349077 (1 readers)
notes channel messages: 0 (0 readers)
enadis channel messages: 0 (0 readers)
client channel messages: 90465 (1 readers)
clichg channel messages: 360 (1 readers)
user channel messages: 0 (0 readers)
backfeed messages : 0
Ghost reports:
10.6.71.66 reported host
10.6.42.103 reported host 10.6.42.10
10.6.42.103 reported host 10.6.42.11
10.6.42.103 reported host 10.6.42.12
10.6.42.103 reported host 10.6.42.13
10.6.42.103 reported host 10.6.42.14
10.6.42.103 reported host 10.6.42.15
10.6.42.103 reported host 10.6.42.4
10.6.42.103 reported host 10.6.42.5
10.1.0.194 reported host ePagess_app_3
10.1.0.86 reported host IIS6_6_1
10.1.0.87 reported host IIS6_6_2
10.1.0.194 reported host main0404
Multi-source statuses:
admin01:conn reported by 10.6.42.103 and 10.0.0.29
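As a quick sanity check on these statistics, the long-run message rate can be compared with the 300-second average the daemon reports. The sketch below uses only the totals and the 22-hour uptime quoted above; the function and variable names are illustrative.

```python
# Figures copied from the xymond statistics above.
UPTIME_SECONDS = 22 * 3600          # "0 days, 22:00:00"
INCOMING_TOTAL = 4335475            # "Incoming messages"
BY_TYPE = {
    "status": 2251363,
    "data": 1354532,
    "xymondlog": 341645,
    "xymondboard": 146651,
    "extcombo": 104869,
    "client": 92341,
}

def average_rate(total, seconds):
    """Average messages per second over the whole uptime."""
    return total / float(seconds)

def share(count, total):
    """Percentage of all incoming messages made up by one message type."""
    return 100.0 * count / total

rate = average_rate(INCOMING_TOTAL, UPTIME_SECONDS)

if __name__ == "__main__":
    print("average rate: %.1f msg/s" % rate)
    for name in sorted(BY_TYPE, key=BY_TYPE.get, reverse=True):
        print("%-12s %5.1f%%" % (name, share(BY_TYPE[name], INCOMING_TOTAL)))
```

The whole-uptime average works out to roughly 54.7 messages per second, close to the reported 52 per second over the last 300 seconds. That suggests a steady load rather than a recent burst, though it does not by itself explain the memory growth.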
And the xymond section in tasks.cfg:
[xymond]
ENVFILE /usr/lib/xymon/server/etc/xymonserver.cfg
CMD xymond --pidfile=$XYMONSERVERLOGS/xymond.pid
--restart=$XYMONTMP/xymond.chk
--checkpoint-file=$XYMONTMP/xymond.chk --checkpoint-interval=600
--log=$XYMONSERVERLOGS/xymond.log
--admin-senders=127.0.0.1,$XYMONSERVERIP
--store-clientlogs=!msgs
--maint-senders=127.0.0.1,$XYMONSERVERIP
--www-senders=127.0.0.1,10.0.0.0/24,10.6.42.103
--flap-count=10
--flap-seconds=900
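The --checkpoint-interval=600 option in this section lines up with the spacing of the "Could not fork checkpoint child" messages in the log posted at the start of the thread. A small sketch of that check, using four of those log lines verbatim (the helper name failure_gaps is hypothetical):

```python
from datetime import datetime

# Four consecutive entries copied from the xymond log in the first message.
LOG_LINES = [
    "2014-07-18 23:40:53 Could not fork checkpoint child:Cannot allocate memory",
    "2014-07-18 23:50:54 Could not fork checkpoint child:Cannot allocate memory",
    "2014-07-19 00:00:55 Could not fork checkpoint child:Cannot allocate memory",
    "2014-07-19 00:10:56 Could not fork checkpoint child:Cannot allocate memory",
]

def failure_gaps(lines, needle="Could not fork checkpoint child"):
    """Return the seconds between consecutive matching log entries."""
    stamps = [
        datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        for line in lines
        if needle in line
    ]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]

gaps = failure_gaps(LOG_LINES)

if __name__ == "__main__":
    print(gaps)
```

Each gap is 601 seconds, which is consistent with every scheduled checkpoint attempt failing once the daemon has grown too large to fork, rather than with occasional, isolated failures.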
We also changed some MAX variables in xymonserver.cfg:
MAXLINE="32768"
MAXMSG_DATA="5242880"
MAXMSG_CLIENT="5242880"
MAXMSG_STATUS="5242880"
Thank you for your help.
On Mon, Jul 21, 2014 at 7:12 PM, J.C. Cleaver <cleaver at terabithia.org> wrote:
[...]
participants (2)
- cleaver@terabithia.org
- ragonlan@gmail.com