Hi Folks,
Just wondering if anybody has any experience with xymongen hanging. I have a large number of xymongen processes being kicked off sometime over the weekend. Unfortunately they are owned by apache and have a PPID of 1, so I can't tell how they were started. I'm presuming xymoncmd, but I can't see anything in the crontab for xymon or in tasks.cfg that would kick off the snapshot and reporting processes.
These then sit for a very long time (> 24hrs) while trying to read a data file from a specific server.
apache 14749 1 44 Oct16 ? 10:28:39 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/14748-1665896723
apache 14867 1 43 Oct16 ? 10:26:32 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/14866-1665896747
apache 15107 1 43 Oct16 ? 10:26:05 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15106-1665896768
apache 15118 1 43 Oct16 ? 10:25:58 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15117-1665896774
apache 15125 1 43 Oct16 ? 10:25:12 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15124-1665896783
apache 15238 1 43 Oct16 ? 10:23:26 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15237-1665896797
apache 15269 1 43 Oct16 ? 10:25:31 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15268-1665896804
apache 15349 1 43 Oct16 ? 10:22:20 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15348-1665896807
apache 15382 1 43 Oct16 ? 10:23:40 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15381-1665896828
apache 15398 1 43 Oct16 ? 10:25:13 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15397-1665896834
apache 15400 1 43 Oct16 ? 10:22:59 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15399-1665896837
apache 15757 1 43 Oct16 ? 10:24:48 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15756-1665896864
apache 15842 1 43 Oct16 ? 10:22:32 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15841-1665896873
apache 15964 1 43 Oct16 ? 10:24:21 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15963-1665896897
apache 15996 1 43 Oct16 ? 10:22:25 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15995-1665896912
apache 16133 1 43 Oct16 ? 10:22:07 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/16132-1665896933
apache 16149 1 43 Oct16 ? 10:23:37 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/16148-1665896954
apache 16215 1 43 Oct16 ? 10:23:45 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/16214-1665896972
An strace of the first PID is as follows (they are all the same). Note that every read is on file descriptor 3:
[root@dcslmonitor 15238]# strace -f -p 14749
Process 14749 attached
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
fd3 is
xymongen 14749 apache cwd DIR 253,0 6 134320195 /xymon/server/data/acks
xymongen 14749 apache rtd DIR 8,2 269 64 /
xymongen 14749 apache txt REG 253,0 1106256 135222190 /xymon/server/server/bin/xymongen
xymongen 14749 apache mem REG 8,6 155784 4448319 /usr/lib64/libselinux.so.1
xymongen 14749 apache mem REG 8,6 109976 4873245 /usr/lib64/libresolv-2.17.so
xymongen 14749 apache mem REG 8,6 15688 4259351 /usr/lib64/libkeyutils.so.1.5
xymongen 14749 apache mem REG 8,6 67104 4471490 /usr/lib64/libkrb5support.so.0.1
xymongen 14749 apache mem REG 8,6 142144 4873243 /usr/lib64/libpthread-2.17.so
xymongen 14749 apache mem REG 8,6 90632 4195838 /usr/lib64/libz.so.1.2.7
xymongen 14749 apache mem REG 8,6 19248 4358022 /usr/lib64/libdl-2.17.so
xymongen 14749 apache mem REG 8,6 210824 4471445 /usr/lib64/libk5crypto.so.3.1
xymongen 14749 apache mem REG 8,6 15920 4939663 /usr/lib64/libcom_err.so.2.1
xymongen 14749 apache mem REG 8,6 967840 4259800 /usr/lib64/libkrb5.so.3.3
xymongen 14749 apache mem REG 8,6 320400 4256684 /usr/lib64/libgssapi_krb5.so.2.2
xymongen 14749 apache mem REG 8,6 2156272 4262067 /usr/lib64/libc-2.17.so
xymongen 14749 apache mem REG 8,6 402384 4259730 /usr/lib64/libpcre.so.1.2.0
xymongen 14749 apache mem REG 8,6 2521008 4256674 /usr/lib64/libcrypto.so.1.0.2k
xymongen 14749 apache mem REG 8,6 470360 4195836 /usr/lib64/libssl.so.1.0.2k
xymongen 14749 apache mem REG 8,6 163312 4448246 /usr/lib64/ld-2.17.so
xymongen 14749 apache 0r FIFO 0,8 0t0 404824379 pipe
xymongen 14749 apache 1w FIFO 0,8 0t0 404824380 pipe
xymongen 14749 apache 2w FIFO 0,8 0t0 404824381 pipe
xymongen 14749 apache 3r REG 253,0 524 67195718 /xymon/server/data/hist/accessntg.sslcert
Every process in the list above has the same file open as fd 3. Are they locking each other out or, more to the point, should they be?
Any ideas on where to look or what to do next?
Thanks
David Logan Senior Systems Administrator Data Centre Services Department of Corporate and Digital Development | Northern Territory Government GPO Box 2391, Darwin, NT 0801, Australia DCS Midrange Ticketing System p ... <+61> 8 8999 6968 m ... <+61> 458 631 117 New and Existing tickets: http://dcscentral.nt.gov.au/ e ... david.logan at nt.gov.au<mailto:david.logan at nt.gov.au> or dcs_service at nt.gov.au<mailto:dcs_service at nt.gov.au> w ... www.nt.gov.au<http://www.nt.gov.au/> Escalations: (08) 8999 7654
Our vision: improve government through services and solutions that exceed expectations Our values: Honest | Professional | Respectful | Accountable | Innovative The information in this e-mail is intended solely for the addressee named. It may contain legally privileged or confidential information that is subject to copyright. If you are not the intended recipient you must not use, disclose copy or distribute this communication. If you have received this message in error, please delete the e-mail and notify the sender. No representation is made that this e-mail is free of viruses. Virus scanning is recommended and is the responsibility of the recipient. Please consider the environment before printing this email.
Hi David
The "snapshot.cgi" runs from the web interface, and creates a snapshot report. The script snapshot.sh runs snapshot.cgi, and this in turn runs xymongen with "--snapshot=..." as an argument.
Similarly, the "report.cgi" runs from the web interface, and creates an availability report, using "--reportopts=..." as an argument.
Also, take a look at the xymonreports.sh script. At the top (of my copy) of this script there are instructions on creating a crontab entry to run the script so as to generate daily, weekly and monthly reports. These would generate xymongen processes with "--reportopts=..." as an argument.
See "man snapshot" and "man report" for more info.
Cheers Jeremy
On Mon, 17 Oct 2022 at 15:57, David Logan <David.Logan at nt.gov.au> wrote:
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon
Thanks Jeremy,
Yes, I saw that, but I'm somewhat confused. In tasks.cfg, xymongen is set to run every minute (I think this is the distribution copy), and it probably does, as our graphs are up to date. On Sunday morning it starts about 17 processes to do snapshots and reports. The crontabs are empty and I cannot find where these are started from. The biggest problem is that they take massive amounts of CPU while sitting at the fread of fd 3, and I cannot work out what is holding them up. The whole thing should be over in an hour or so, but it can take up to 72 hours to process the whole show.
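One way to narrow this down: a read() that returns 0, as in the strace output above, means end-of-file, so a process that keeps getting 0 is not waiting on anything; it is calling read() again and again at the end of the file, which would fit the high CPU use. The sketch below checks whether a process's offset on fd 3 has reached the file size. It assumes a Linux /proc with fdinfo and GNU stat; fd_offset and at_eof are hypothetical helper names, not part of Xymon.

```shell
# fd_offset: print the current file offset of a descriptor.
#   $1 = PID, $2 = descriptor (default 3), $3 = proc root (default /proc)
fd_offset() {
    awk '$1 == "pos:" {print $2}' "${3:-/proc}/$1/fdinfo/${2:-3}"
}

# at_eof: succeed if the offset is at or past the end of the open file.
#   Same arguments as fd_offset. Uses GNU stat (-L follows the fd symlink).
at_eof() {
    pos=$(fd_offset "$1" "$2" "$3")
    size=$(stat -L -c %s "${3:-/proc}/$1/fd/${2:-3}")
    [ "$pos" -ge "$size" ]
}

# Usage (as root): at_eof 14749 3 && echo "14749 is looping at end-of-file"
```

If at_eof succeeds while strace still shows repeated zero-byte reads, the problem is more likely in how the file is being parsed than in the filesystem or any locking.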
I can also see what is possibly an error in the process arguments: I don't think there is a $ in front of a variable, and I'm wondering if this is the root cause.
Thanks David
From: Jeremy Laidman <jeremy at laidman.org> Sent: Monday, 17 October 2022 3:42 PM To: David Logan <David.Logan at nt.gov.au> Cc: xymon at xymon.com Subject: Re: [Xymon] xymongen hanging
Yep, the fact that the username is apache tells me that it wasn't initiated by crontab or tasks.cfg, but instead by a user clicking on Reports > Availability Report, and Reports > Snapshot Report, in the Xymon menu.
fd3 is a file with event history. The snapshot and availability reports look through all of the history files to see any events that were present at/during the report timeframe. So this is normal, unless it's stuck on the same file for more than the briefest period. Did you run lsof on any other processes to see what files were open on fd3? If it's the same file for all of them, this might suggest a filesystem problem.
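To compare fd 3 across all of the processes without running lsof on each one, something like the following could be used. It is only a sketch, assuming Linux /proc and pgrep from procps; fd_target and list_fd_targets are hypothetical helper names.

```shell
# fd_target: print the file that a process has open on a given descriptor.
#   $1 = PID, $2 = descriptor (default 3), $3 = proc root (default /proc)
fd_target() {
    readlink "${3:-/proc}/$1/fd/${2:-3}"
}

# list_fd_targets: print "PID file" for every process with the given name.
#   $1 = process name (default xymongen), $2 = descriptor (default 3)
list_fd_targets() {
    for pid in $(pgrep -x "${1:-xymongen}"); do
        printf '%s %s\n' "$pid" "$(fd_target "$pid" "${2:-3}")"
    done
}

# Usage (as root or the apache user): list_fd_targets xymongen 3
```

Running it a few times, some seconds apart, shows whether each process moves on to other history files or stays on the same one.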
As these processes are owned by apache, it's worth taking a look at the Apache logs around the time the processes were launched. You might be able to get a more accurate start time from /proc/14749 than the output of "ps".
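For the start time, "ps -o lstart= -p 14749" prints the full date and time. The same value can be worked out from /proc, as in this sketch. It assumes the Linux /proc/PID/stat layout, where field 22 is the start time in clock ticks since boot; start_epoch is a hypothetical helper name.

```shell
# start_epoch: print a process's start time as seconds since the Unix epoch.
#   $1 = PID, $2 = proc root (default /proc)
start_epoch() {
    proc=${2:-/proc}
    # Drop everything up to the last ") " first, because the command name in
    # field 2 may contain spaces. Field 22 is then the 20th remaining field.
    ticks=$(sed 's/^.*) //' "$proc/$1/stat" | awk '{print $20}')
    # btime in /proc/stat is the boot time in seconds since the epoch.
    btime=$(awk '$1 == "btime" {print $2}' "$proc/stat")
    hz=$(getconf CLK_TCK)
    echo $(( btime + ticks / hz ))
}

# Usage (GNU date): date -d "@$(start_epoch 14749)"
```

The resulting timestamp can then be matched against the Apache access log entries for snapshot.sh and report.sh.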
The missing dollar sign is peculiar. But I wonder if that's just what "ps" does. Or bash. What does the output of "strings /proc/14749/cmdline" look like?
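An alternative to "strings" that shows exactly one argument per line: the arguments in /proc/PID/cmdline are separated by NUL bytes, so they can be translated to newlines. This is a sketch assuming Linux /proc; show_args is a hypothetical helper name.

```shell
# show_args: print each argument of a process on its own line.
#   $1 = PID, $2 = proc root (default /proc)
show_args() {
    tr '\0' '\n' < "${2:-/proc}/$1/cmdline"
}

# Usage: show_args 14749
```

If the third line reads XYMONGENSNAPOPTS with no dollar sign, the process really did receive the literal name as an argument rather than the expanded options.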
The $XYMONGENSNAPOPTS comes from the script snapshot.sh. Mine definitely has a dollar sign in there.
J
On Tue, 18 Oct 2022 at 08:45, David Logan <David.Logan at nt.gov.au> wrote:
Thanks Jeremy,
Yes I saw that but I?m somewhat confused. In the tasks.cfg xymongen is set to run every minute (I think this is the distribution copy) and it probably does as our graphs are up to date. On Sunday am it starts about 17 processes to do snapshots and reports. The crontabs are empty and I cannot find where these are started from. The biggest problem is they take massive amounts of cpu while sitting at the fread of fd 3. I cannot work out what is holding it up. The whole thing should be over in a matter of an hour or so but it can take up to 72 hrs to process the whole show.
I can also see what is possible an error in the process as I don?t think there is a $ in front of a variable and I?m wondering if this is the root cause.
Thanks
David
*David Logan*
*Senior Systems Administrator*
*Data Centre Services*
Department of *Corporate and Digital Development* *| *Northern Territory Government GPO Box 2391, Darwin, NT 0801, Australia
*DCS Midrange Ticketing System*
*p ... <+61> 8 8999 6968 *
*m ? <+61> 458 631 117 *New and Existing tickets: http://dcscentral.nt.gov.au/
*e ... **david.logan at nt.gov.au <david.logan at nt.gov.au> *or dcs_service at nt.gov.au
*w ? www.nt.gov.au <http://www.nt.gov.au/> **Escalations: (08) 8999 7654*
*Our vision:* *improve government through services and solutions that exceed expectations*
Our values: *Honest **| **Professional* *| Respectful | **Accountable* *| **Innovative *
The information in this e-mail is intended solely for the addressee named. It may contain legally privileged or confidential information that is subject to copyright. If you are not the intended recipient you must not use, disclose copy or distribute this communication. If you have received this message in error, please delete the e-mail and notify the sender. No representation is made that this e-mail is free of viruses. Virus scanning is recommended and is the responsibility of the recipient.
Please consider the environment before printing this email.
*From:* Jeremy Laidman <jeremy at laidman.org> *Sent:* Monday, 17 October 2022 3:42 PM *To:* David Logan <David.Logan at nt.gov.au> *Cc:* xymon at xymon.com *Subject:* Re: [Xymon] xymongen hanging
Hi David
The "snapshot.cgi" runs from the web interface, and creates a snapshot report. The script snapshot.sh runs snapshot.cgi, and this in turn runs xymongen with "--snapshot=..." as an argument.
Similarly, the "report.cgi" runs from the web interface, and creates an availability report, using "--reportops=..." as an argument.
Also, take a look at the xymonreports.sh script. At the top (of my copy) of this script there are instructions on creating a crontab entry to run the script so as to generate daily, weekly and monthly reports. These would generate xymongen processes with "--reportopts=..." as an argument.
See "man snapshot" and "man report" for more info.
Cheers
Jeremy
On Mon, 17 Oct 2022 at 15:57, David Logan <David.Logan at nt.gov.au> wrote:
Hi Folks,
Just wondering if anybody has any experience with xymongen hanging. I have a large number of xymongen processes being kicked off sometime over the weekend, unfortunately they are owned by apache and have a PPID of 1 so I can?t tell how they were started. I?m presuming either xymoncmd but I can?t see anything in the crontab for xymon or in tasks.cfg that would kick off the snapshots and reporting processes.
These then sit for a very long time (> 24hrs) while trying to read a data file from a specific server.
apache 14749 1 44 Oct16 ? 10:28:39 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/14748-1665896723
apache 14867 1 43 Oct16 ? 10:26:32 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/14866-1665896747
apache 15107 1 43 Oct16 ? 10:26:05 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15106-1665896768
apache 15118 1 43 Oct16 ? 10:25:58 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15117-1665896774
apache 15125 1 43 Oct16 ? 10:25:12 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15124-1665896783
apache 15238 1 43 Oct16 ? 10:23:26 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15237-1665896797
apache 15269 1 43 Oct16 ? 10:25:31 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15268-1665896804
apache 15349 1 43 Oct16 ? 10:22:20 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15348-1665896807
apache 15382 1 43 Oct16 ? 10:23:40 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15381-1665896828
apache 15398 1 43 Oct16 ? 10:25:13 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15397-1665896834
apache 15400 1 43 Oct16 ? 10:22:59 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15399-1665896837
apache 15757 1 43 Oct16 ? 10:24:48 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15756-1665896864
apache 15842 1 43 Oct16 ? 10:22:32 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15841-1665896873
apache 15964 1 43 Oct16 ? 10:24:21 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15963-1665896897
apache 15996 1 43 Oct16 ? 10:22:25 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15995-1665896912
apache 16133 1 43 Oct16 ? 10:22:07 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/16132-1665896933
apache 16149 1 43 Oct16 ? 10:23:37 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/16148-1665896954
apache 16215 1 43 Oct16 ? 10:23:45 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/16214-1665896972
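A side note on reading the listing above: the last component of each output directory looks like <number>-<epoch seconds>. Assuming the trailing number really is a Unix timestamp (an assumption, nothing in this thread confirms it), a small Python sketch like the following decodes when each run was requested. The regular expression and the function name decode_ps_line are made up for illustration.

```python
import re
from datetime import datetime, timezone

# Matches the "ps -ef" style lines pasted above. Field layout is taken from
# the listing; the meaning of the directory suffix is an ASSUMPTION.
PS_LINE = re.compile(
    r"^(?P<user>\S+)\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s.*"
    r"/www/(?P<kind>snap|rep)/(?P<dirpid>\d+)-(?P<epoch>\d+)\s*$"
)


def decode_ps_line(line):
    """Return the pid, ppid, run type and decoded directory timestamp, or None."""
    match = PS_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "user": match["user"],
        "pid": int(match["pid"]),
        "ppid": int(match["ppid"]),
        "kind": match["kind"],  # "snap" or "rep"
        "dir_pid": int(match["dirpid"]),
        # ASSUMPTION: the suffix is seconds since the Unix epoch.
        "requested_utc": datetime.fromtimestamp(int(match["epoch"]), tz=timezone.utc),
    }


if __name__ == "__main__":
    sample = (
        "apache 15107 1 43 Oct16 ? 10:26:05 /xymon/server/server/bin/xymongen "
        "--snapshot=2222867979 XYMONGENSNAPOPTS "
        "/xymon/server/server/www/snap/15106-1665896768"
    )
    info = decode_ps_line(sample)
    print(info["pid"], info["kind"], info["requested_utc"].isoformat())
```

If the assumption holds, sorting the decoded times shows how closely spaced the requests were, which can then be compared against the web server access log for the same window.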
An strace of the first PID is shown below (they all look the same); note the repeated reads on file descriptor 3.
[root@dcslmonitor 15238]# strace -f -p 14749
Process 14749 attached
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
read(3, "", 4096) = 0
lsof for the same process shows what fd 3 is (last line):
xymongen 14749 apache cwd DIR 253,0 6 134320195 /xymon/server/data/acks
xymongen 14749 apache rtd DIR 8,2 269 64 /
xymongen 14749 apache txt REG 253,0 1106256 135222190 /xymon/server/server/bin/xymongen
xymongen 14749 apache mem REG 8,6 155784 4448319 /usr/lib64/libselinux.so.1
xymongen 14749 apache mem REG 8,6 109976 4873245 /usr/lib64/libresolv-2.17.so
xymongen 14749 apache mem REG 8,6 15688 4259351 /usr/lib64/libkeyutils.so.1.5
xymongen 14749 apache mem REG 8,6 67104 4471490 /usr/lib64/libkrb5support.so.0.1
xymongen 14749 apache mem REG 8,6 142144 4873243 /usr/lib64/libpthread-2.17.so
xymongen 14749 apache mem REG 8,6 90632 4195838 /usr/lib64/libz.so.1.2.7
xymongen 14749 apache mem REG 8,6 19248 4358022 /usr/lib64/libdl-2.17.so
xymongen 14749 apache mem REG 8,6 210824 4471445 /usr/lib64/libk5crypto.so.3.1
xymongen 14749 apache mem REG 8,6 15920 4939663 /usr/lib64/libcom_err.so.2.1
xymongen 14749 apache mem REG 8,6 967840 4259800 /usr/lib64/libkrb5.so.3.3
xymongen 14749 apache mem REG 8,6 320400 4256684 /usr/lib64/libgssapi_krb5.so.2.2
xymongen 14749 apache mem REG 8,6 2156272 4262067 /usr/lib64/libc-2.17.so
xymongen 14749 apache mem REG 8,6 402384 4259730 /usr/lib64/libpcre.so.1.2.0
xymongen 14749 apache mem REG 8,6 2521008 4256674 /usr/lib64/libcrypto.so.1.0.2k
xymongen 14749 apache mem REG 8,6 470360 4195836 /usr/lib64/libssl.so.1.0.2k
xymongen 14749 apache mem REG 8,6 163312 4448246 /usr/lib64/ld-2.17.so
xymongen 14749 apache 0r FIFO 0,8 0t0 404824379 pipe
xymongen 14749 apache 1w FIFO 0,8 0t0 404824380 pipe
xymongen 14749 apache 2w FIFO 0,8 0t0 404824381 pipe
xymongen 14749 apache 3r REG 253,0 524 67195718 /xymon/server/data/hist/accessntg.sslcert
Every process in the list above has the same file open as fd 3. Are they locking each other out? Or, more to the point, should they be?
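On the locking question, a hedged note: on Linux, opening a regular file read-only does not take a lock by itself, and read() returning 0 on a regular file means end-of-file, not a blocked read. The Python sketch below shows two independent read-only descriptors on one file both reading the full contents and then getting empty reads at EOF. The temporary file and the function name are made up, and the 524-byte size only mirrors what lsof reports above. It does not reproduce xymongen, and it says nothing about why xymongen keeps calling read() once it has hit EOF.

```python
import os
import tempfile


def read_at_eof_demo(size=524):
    """Open one file twice read-only, read it, then keep reading at EOF."""
    # Create a throwaway file of the given size.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * size)
        path = tmp.name
    try:
        fd_a = os.open(path, os.O_RDONLY)
        fd_b = os.open(path, os.O_RDONLY)
        try:
            # Both descriptors read the whole file; neither blocks the other.
            first_a = os.read(fd_a, 4096)
            first_b = os.read(fd_b, 4096)
            # Further reads at EOF return immediately with no data,
            # which strace would show as: read(fd, "", 4096) = 0
            eof_reads = [os.read(fd_a, 4096) for _ in range(3)]
        finally:
            os.close(fd_a)
            os.close(fd_b)
    finally:
        os.unlink(path)
    return len(first_a), len(first_b), eof_reads


if __name__ == "__main__":
    print(read_at_eof_demo())
```

A loop that treats an empty read as "try again" rather than "done" would produce exactly the endless stream of zero-length reads in the trace, along with sustained CPU use, but whether that is what happens inside xymongen is only a guess.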
Any ideas on where to look or what to do next?
Thanks
David Logan
Senior Systems Administrator
Data Centre Services
Department of Corporate and Digital Development | Northern Territory Government
GPO Box 2391, Darwin, NT 0801, Australia
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon
Hi List,
Does anyone have a script to monitor Windows updates, or an idea of how to do it? (I use the PowerShell client: xymonclient.ps1, v2.42)
Many Thanks
Bruno
You can download mine here: http://www.krisspringer.com/xymon/updates.ps1.zip
I've modified it so that it always reports green, but there are comments in it so you can revert it to flag colors if you want. I suggest you review it and modify it to your liking rather than just deploying it. A few lines at the end of the script have hardcoded path names that you may need to edit for your environment.
Thank You,
Kris Springer
Systems Admin
I/O Network Administration
support at ionetworkadmin.com
https://www.ionetworkadmin.com
On 10/18/22 07:38, Bruno Manzoni wrote:
Hi List,
Does anyone have a script to monitor Windows updates, or an idea of how to do it? (I use the PowerShell client: xymonclient.ps1, v2.42)
Many Thanks
Bruno
Hi,
We use client-local.cfg:
[powershell]
clientversion:2.27:https://x.x.x.x/xymon/download/
Then place xymonclient_2.27.ps1 in the xymon/download folder of your web server. Make sure the version variable inside the file matches both the version in client-local.cfg and the version in the filename.
The client will then automatically download, install, and refresh itself so that the installed version matches the version given in that section.
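Since the three places have to agree, a quick check before publishing a new client can save some head-scratching. This is only a sketch in Python: the helper names are made up, and it assumes the script sets its version with a line of the form $Version = '2.27' (check your copy of xymonclient.ps1 and adjust the variable name if it differs).

```python
import re


def parse_clientversion(line):
    """Split 'clientversion:<version>:<url>' into (version, url)."""
    key, version, url = line.strip().split(":", 2)
    if key != "clientversion":
        raise ValueError("not a clientversion line: %r" % line)
    return version, url


def expected_filename(version):
    # Naming convention taken from the message above: xymonclient_<version>.ps1
    return "xymonclient_%s.ps1" % version


def script_version(ps1_text, variable="Version"):
    # ASSUMPTION: the script contains a line like  $Version = '2.27'
    pattern = r"^\s*\$" + re.escape(variable) + r"\s*=\s*['\"]([^'\"]+)['\"]"
    match = re.search(pattern, ps1_text, re.MULTILINE | re.IGNORECASE)
    return match.group(1) if match else None


def check_consistency(cfg_line, filename, ps1_text):
    """Return a list of human-readable mismatches (empty means consistent)."""
    version, _url = parse_clientversion(cfg_line)
    problems = []
    if filename != expected_filename(version):
        problems.append("filename %s != %s" % (filename, expected_filename(version)))
    found = script_version(ps1_text)
    if found != version:
        problems.append("script version %s != config version %s" % (found, version))
    return problems


if __name__ == "__main__":
    cfg = "clientversion:2.27:https://x.x.x.x/xymon/download/"
    print(check_consistency(cfg, "xymonclient_2.27.ps1", "$Version = '2.27'\n"))
```

An empty list means the config line, the filename, and the script agree; anything else names the mismatch.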
Stef
On 2022-10-18 15:38, Bruno Manzoni wrote:
Hi List,
Does anyone have a script to monitor Windows updates, or an idea of how to do it? (I use the PowerShell client: xymonclient.ps1, v2.42)
Many Thanks
Bruno