Hi,
Logfetch is sending old data, causing false alerts.
The log file looks roughly like this:

    Error 2017-06-14 11:36:58.613343 39915 2184308576 Compare server: ……
    Error 2017-06-14 11:36:58.613481 39913 1581872992 Command server: ……
    ……
    (Note: the above repeats about 780K times.)
    Info 2017-06-14 13:07:41.113163 1193 1036199776 Compare server exited normally, pid = 45494 [sp_desvr]
    …..
    Error 2017-06-15 02:42:22.820068 1761 2399766368 Command server:…..
    ……
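To make the entry layout explicit, here is a minimal Python sketch that splits one line into level, timestamp, the two numeric fields, and the message. The layout is inferred only from the excerpt above, and the field names `num1`/`num2` are hypothetical, since I don't know what those two numbers represent.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed layout, inferred from the excerpt:
# <Level> <YYYY-MM-DD> <HH:MM:SS.ffffff> <number> <number> <message>
ENTRY_RE = re.compile(
    r"^(?P<level>\w+)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<num1>\d+)\s+(?P<num2>\d+)\s+"
    r"(?P<msg>.*)$"
)


class Entry(NamedTuple):
    level: str
    timestamp: datetime
    num1: int  # hypothetical name: meaning of this field is unknown
    num2: int  # hypothetical name: meaning of this field is unknown
    message: str


def parse_entry(line: str) -> Optional[Entry]:
    """Return an Entry for a line matching the assumed layout, else None."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    return Entry(
        level=m.group("level"),
        timestamp=datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f"),
        num1=int(m.group("num1")),
        num2=int(m.group("num2")),
        message=m.group("msg"),
    )
```

With the timestamp parsed, old entries (such as the 6/14 ones) are easy to tell apart from new ones; a filter on entry age is only an idea for a workaround, not something I know logfetch to support.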
On 6/19 and 6/20, msgs alerts were generated containing old data from 6/14, 6/15, etc. Below is a snippet of the alert on 6/19:

    Mon Jun 19 17:48:57 PDT 2017 - Log files NOT ok
    [red] Critical entries in /u01/shareplex/var/log/event_log<https://monitor01.lhr9.service-now.com/xymon-cgi/svcstatus.sh?CLIENT=ora164106.sjc4.service-now.com&SECTION=msgs:/u01/shareplex/var/log/event_log>
    [red] Error 2017-06-14 12:07:24.545252 9795 1581102944 Command server: ReconcileLog: failed to construct object-cache: Illegal state: Item 372354 already in the object id registry (connecting from ora164106.sjc4.service-now.com) [module osp]
    [red] Error 2017-06-14 12:07:24.545499 9795 1581102944 Command server: ReconcileLog: failed to construct object-cache: Illegal state: Item 372356 already in the object id registry (connecting from ora164106.sjc4.service-now.com) [module osp]
Meanwhile, xymonclient.log shows:

    2017-06-19 17:49:01.428381 logfetch: File /u01/shareplex/var/log/event_log shrank from >=173538314 to 48414720 bytes in size. Probably rotated; clearing position state
    2017-06-19 17:49:01.428462 logfetch: /u01/shareplex/var/log/event_log delta 48414720 bytes exceeds max buffer size 10485760; skipping some data
    2017-06-19 17:51:05.086815 logfetch: /u01/shareplex/var/log/event_log delta 173538314 bytes exceeds max buffer size 10485760; skipping some data
    2017-06-19 17:53:09.134469 logfetch: /u01/shareplex/var/log/event_log delta 173538314 bytes exceeds max buffer size 10485760; skipping some data
    2017-06-19 17:55:12.647682 logfetch: /u01/shareplex/var/log/event_log delta 173538314 bytes exceeds max buffer size 10485760; skipping some data
    2017-06-19 17:57:16.163913 logfetch: /u01/shareplex/var/log/event_log delta 173538314 bytes exceeds max buffer size 10485760; skipping some data
    2017-06-19 17:59:19.662801 logfetch: /u01/shareplex/var/log/event_log delta 173538314 bytes exceeds max buffer size 10485760; skipping some data
    2017-06-19 18:01:23.180499 logfetch: /u01/shareplex/var/log/event_log delta 173538453 bytes exceeds max buffer size 10485760; skipping some data
    2017-06-19 18:03:26.777636 logfetch: /u01/shareplex/var/log/event_log delta 125123733 bytes exceeds max buffer size 10485760; skipping some data
    2017-06-20 06:42:01.519481 logfetch: File /u01/shareplex/var/log/event_log shrank from >=173541482 to 74420224 bytes in size. Probably rotated; clearing position state
    2017-06-20 06:42:01.519557 logfetch: /u01/shareplex/var/log/event_log delta 74420224 bytes exceeds max buffer size 10485760; skipping some data
    2017-06-20 06:44:05.173606 logfetch: /u01/shareplex/var/log/event_log delta 173541633 bytes exceeds max buffer size 10485760; skipping some data
    2017-06-20 06:46:08.670466 logfetch: /u01/shareplex/var/log/event_log delta 173541633 bytes exceeds max buffer size 10485760; skipping some data
    2017-06-20 06:48:12.188216 logfetch: /u01/shareplex/var/log/event_log delta 173541633 bytes exceeds max buffer size 10485760; skipping some data
    2017-06-20 06:50:15.683455 logfetch: /u01/shareplex/var/log/event_log delta 173541633 bytes exceeds max buffer size 10485760; skipping some data
    2017-06-20 06:52:19.250727 logfetch: /u01/shareplex/var/log/event_log delta 173541633 bytes exceeds max buffer size 10485760; skipping some data
    2017-06-20 06:54:22.752463 logfetch: /u01/shareplex/var/log/event_log delta 173541633 bytes exceeds max buffer size 10485760; skipping some data
    2017-06-20 06:56:23.426678 logfetch: /u01/shareplex/var/log/event_log delta 99121409 bytes exceeds max buffer size 10485760; skipping some data
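Reading these messages, the logic appears to be roughly: if the file is smaller than the saved position, treat it as rotated and reset the position to 0; the delta is the file size minus the saved position; if the delta exceeds 10485760 bytes, some data is skipped. Below is a toy Python model of that reading. It is not Xymon's code, the meaning of "delta" is my assumption, and `check`/`simulate` are hypothetical names. It shows that if the stored position stays at 0 after the reset (later positions never saved), every poll reports the whole file as the delta, which is what the repeated 173538314 values look like.

```python
from typing import List, NamedTuple

MAX_BUFFER = 10485760  # "max buffer size" reported in xymonclient.log


class Check(NamedTuple):
    rotated: bool    # file looked smaller than the saved position
    delta: int       # bytes between the saved position and end of file
    truncated: bool  # delta larger than the buffer, so some data is skipped


def check(saved_pos: int, cur_size: int, max_buffer: int = MAX_BUFFER) -> Check:
    """One poll, under my assumed reading of the log messages."""
    rotated = cur_size < saved_pos
    if rotated:
        saved_pos = 0  # "Probably rotated; clearing position state"
    delta = cur_size - saved_pos
    return Check(rotated, delta, delta > max_buffer)


def simulate(sizes: List[int], saved_pos: int, persist: bool) -> List[Check]:
    """Run check() once per observed file size.

    ASSUMPTION: the cleared position (0) is stored after a rotation,
    but the new end-of-file position is stored only if persist is True.
    """
    results = []
    for size in sizes:
        result = check(saved_pos, size)
        if result.rotated:
            saved_pos = 0
        if persist:
            saved_pos = size
        results.append(result)
    return results
```

For example, `simulate([48414720, 173538314, 173538314], saved_pos=173538314, persist=False)` reports a rotation on the first poll and then a delta of 173538314 on each following poll; with `persist=True` the third poll reports a delta of 0.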
Notes:
- The 2-minute interval above is how I configured the Xymon client.
- It seems the logfetch status file is not being saved successfully. The source code does not check for errors when writing it, so nothing is logged and I have no direct evidence.
- The behavior lasted less than 20 minutes each time. The server itself had no disk or CPU alerts, and no one reported any disk or I/O issues.
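Regarding the missing error check: for comparison, here is a minimal Python sketch of a state save that reports failures instead of silently ignoring them, by writing to a temporary file, syncing it, and atomically renaming it into place. The real logfetch status file format is not shown here; the single-integer format and the `save_position`/`load_position` names are hypothetical.

```python
import os
import tempfile


def save_position(path: str, position: int) -> None:
    """Write position to path atomically; raise OSError on any failure."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(f"{position}\n")
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        # Remove the half-written temp file, then re-raise the error.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def load_position(path: str) -> int:
    """Read back a position written by save_position."""
    with open(path) as f:
        return int(f.read().strip())
```

A caller that catches the `OSError` can at least log that the position was not saved, which would give the direct evidence that is currently missing.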
I was told this behavior is not new, although it rarely happens. Is there any solution or workaround?
The version I am running is: Xymon version 4.3.25-1.el6.terabithia
Thanks, -max