white gaps in graphs across a number of services
Hi Everyone,
Have been looking on and off at a problem I've seen for a while now, without much success. I see intermittent 'white gaps' occurring in xymon graphs across a number of services, sometimes at corresponding times, but sometimes not. Most frequently I see this gap for CPU load, and it isn't specific to one server.
Attached is an example of users and processes from one client server. There is a corresponding gap at approx 3AM in the CPU utilization graphs, memory graphs, actually all of them I think, and a large 300-second spike in clock offset at that time. But nothing corresponds to the other gaps.
If I look at the xymon server itself, it looks like there was something up at that time too, as xymond incoming messages drops to zero. For the rest of the day it holds at a steady number, yet there are gaps all over the place in xymonnet runtime, CPU utilization, users and procs, etc.
I seem to recall we did try to tweak some rrd cache value as it cropped up in another post, which I think improved things slightly. But, we are having problems with the platforms that we're trying to monitor, with apparent long NFS pings between boxes.
The xymon server itself is running on a VM box. Has anyone had issues running on VM?
As best I can figure, either we have a xymon config issue, the xymon box itself isn't stable and is dropping data, or we have genuine network / disk write issues.
Any other thoughts?
Cheers!
The information contained in this email and any attached files is confidential and intended solely for the addressee(s). The email may be legally privileged or prohibited from disclosure and unauthorised use. If you are not the named addressee you may not use, copy, or disclose this information to any other person. If you received this message in error please notify the sender immediately and delete it from your system.
Any opinion or views contained in this email message are those of the sender, and do not represent those of the Company in any way and reliance should not be placed upon its contents. Unless otherwise stated, this email message is not intended to be contractually binding. Where an Agreement exists between our respective companies and there is conflict between the contents of this email message and the Agreement then the terms of that Agreement shall prevail.
Excelian Limited 44 Featherstone Street London EC1Y 8RN Tel: +44 (0) 20 7336 9595 www.Excelian.com
This e-mail has been scanned for viruses by MessageLabs. For further information visit http://www.messagelabs.com
Excelian subscribes to cleaner and greener methods of working. Help take responsibility for the environment. Please don't print this email unless you absolutely have to.
Do you see anything unusual in the xymond_rrd or xymond log(s) around that time? If messages are dropping to zero, it could definitely be a crash somewhere.
If nothing interesting shows up, try running both with --debug enabled as well... We might get a better idea of why that's happening.
Regards,
-jc
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon
Sorry, hopefully not a stupid question, but where should I put the --debug flag? I've done this before where I thought I'd enabled debug but hadn't, and was happy because there were no debug errors!
The logs are a bit messy at the moment and I'm trying to get rid of some of the errors. The main culprits are 'too many data sources' for the RRD files, which I can't really explain as they work sometimes, and some cases of the message relating to 'expected message number XXX and received message number XXY', sometimes just one or two but sometimes a lot in one go.
From: cleaver at terabithia.org [cleaver at terabithia.org] Sent: 18 June 2012 19:29 To: Vincent Baines Cc: xymon at xymon.com Subject: Re: [Xymon] white gaps in graphs across a number of services
No problem.. It can be confusing with long process chains like this :)
In tasks.cfg, in [xymond] put it straight after the xymond in the CMD line. In [rrdstatus] and [rrddata], put it immediately after the "xymond_rrd" (not xymond_channel).
-jc
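Since it is easy to believe debug is on when it is not, one possible way to double-check the placement is to inspect the CMD lines directly. The following is only a sketch: the sample file is an abbreviated, hypothetical tasks.cfg (real CMD lines carry more options), and `check_debug` is an illustrative helper, not something shipped with Xymon.

```shell
#!/bin/sh
# Hypothetical, abbreviated tasks.cfg written to a temp file for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[xymond]
    CMD xymond --debug --log=$XYMONSERVERLOGS/xymond.log

[rrdstatus]
    CMD xymond_channel --channel=status --log=$XYMONSERVERLOGS/rrd-status.log xymond_rrd --debug --rrddir=$XYMONVAR/rrd

[rrddata]
    CMD xymond_channel --channel=data --log=$XYMONSERVERLOGS/rrd-data.log xymond_rrd --rrddir=$XYMONVAR/rrd
EOF

# check_debug FILE SECTION PROGRAM
# Prints "on" if --debug directly follows PROGRAM on the CMD line of
# [SECTION], otherwise "off".
check_debug() {
    awk -v sec="[$2]" -v prog="$3" '
        /^\[/ { in_sec = ($1 == sec) }          # track the current section
        in_sec && $1 == "CMD" {
            for (i = 2; i < NF; i++)
                if ($i == prog && $(i + 1) == "--debug") found = 1
        }
        END { print (found ? "on" : "off") }
    ' "$1"
}

# Report the three sections mentioned above; in this sample the
# [rrddata] line was deliberately left without the flag.
for pair in xymond:xymond rrdstatus:xymond_rrd rrddata:xymond_rrd; do
    echo "${pair%%:*}: $(check_debug "$sample" "${pair%%:*}" "${pair##*:}")"
done
rm -f "$sample"
```

A flag placed after xymond_channel instead of xymond_rrd is reported as "off" for xymond_rrd, which is the mistake the advice above warns about.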
Thanks! Will put those changes in now and see what it collects.
One other thing that's bugged me for a while, maybe related: I get some really random spurious RRD files generated, which make things messy when I look at the trends page for a specific host. So, for example, in ./data/rrd/hostname1 for a specific service I'm monitoring called warehouse, I should have:
./warehouse,Memory.rrd
./warehouse,Threads.rrd
but as well as those I get all sorts of randomness:
warehouse,24590_24589_xymon_09.rrd
warehouse,4224_1_hostname2_20.rrd
warehouse,Kernel.rrd
murexnet,_FONT_SIZE.rrd
etc.
In other words, appended other server names, PIDs, other processes, and even xymon keywords.
They seem to get generated in clumps every now and then, say a whole load of new ones at a specific time.
With the debug flags on I'll see if anything corresponds to a time when they're created, but maybe someone has seen this one before?
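To see exactly which files are the unexpected ones, something along these lines might help. It is a minimal sketch: `list_unexpected_rrds` is a hypothetical helper, and the demo directory is a throwaway stand-in for ./data/rrd/hostname1 populated with some of the file names quoted above.

```shell
#!/bin/sh
# list_unexpected_rrds DIR SERVICE EXPECTED...
# Prints the names of SERVICE,*.rrd files in DIR whose data set name
# is not one of the EXPECTED names.
list_unexpected_rrds() {
    dir=$1; svc=$2; shift 2
    for f in "$dir/$svc",*.rrd; do
        [ -e "$f" ] || continue            # glob matched nothing
        name=${f##*/}
        keep=no
        for want in "$@"; do
            [ "$name" = "$svc,$want.rrd" ] && keep=yes
        done
        [ "$keep" = no ] && echo "$name"
    done
    return 0
}

# Throwaway demo directory (stand-in for ./data/rrd/hostname1).
demo=$(mktemp -d)
touch "$demo/warehouse,Memory.rrd" "$demo/warehouse,Threads.rrd" \
      "$demo/warehouse,Kernel.rrd" "$demo/warehouse,4224_1_hostname2_20.rrd"
list_unexpected_rrds "$demo" warehouse Memory Threads
rm -rf "$demo"
```

In the demo this lists the Kernel and 4224_1_hostname2_20 files and skips Memory and Threads. Running `ls -l` on the reported names shows their modification times; for a spurious file that was written once and never updated, that time should be close to when it was created, which can then be compared against the debug logs.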
From: cleaver at terabithia.org [cleaver at terabithia.org] Sent: 18 June 2012 20:47 To: Vincent Baines Cc: Xymon Email List Subject: RE: [Xymon] white gaps in graphs across a number of services
On 06/19/2012 12:13 PM, Vincent Baines wrote:
They seem to get generated in clumps every now and then, say a whole load of new ones at a specific time.
Colons and equal signs are confusing the RRD module of Xymon. Replacing each ':' by '&#58;' and each '=' by '&#61;' should solve this problem. Of course, these replacements should NOT be done in the lines containing the actual data to be entered into an RRD.
kind regards, Wim Nelis.
The NLR disclaimer is valid for NLR e-mail messages.
This message is only meant for providing information. Nothing in this e-mail message amounts to a contractual or legal commitment on the part of the sender. This message may contain information that is not intended for you. If you are not the addressee or if this message was sent to you by mistake, you are requested to inform the sender and delete the message. Sender accepts no liability for damage of any kind resulting from the risks inherent in the electronic transmission of messages.
Sorry, could I just check whereabouts you mean? In the client-side scripts reporting back to xymon, where I use colons to separate variable name : value? E.g. where I have:
$BB $BBDISP "status $MACHINE.$TIDY_SERVICE red $(date)
Memory : 0 Threads : 0"
change to
$BB $BBDISP "status $MACHINE.$TIDY_SERVICE red $(date)
Memory &#58; 0 Threads &#58; 0"
Or in the xymonserver.cfg definitions, where I have:
SPLITNCV_warehouse="Memory:GAUGE,Threads:GAUGE"
to
SPLITNCV_warehouse="Memory&#58;GAUGE,Threads&#58;GAUGE"
The only slightly special char I use is the - (I have a couple of names such as data-feed), does that need to be changed to a corresponding code? Or have I missed something.
Very much appreciate the help!
From: xymon-bounces at xymon.com [xymon-bounces at xymon.com] on behalf of W.J.M. Nelis [Wim.Nelis at nlr.nl] Sent: 19 June 2012 11:20 To: xymon at xymon.com Subject: Re: [Xymon] white gaps in graphs across a number of services
On 06/19/2012 12:31 PM, Vincent Baines wrote:
Sorry, could I just check whereabouts you mean? In the client-side scripts reporting back to xymon, where I use colons to separate variable name : value?
You're right, I did not specify that part. The place to change colons and equal signs is the status (or data) message which is sent to Xymon. I've seen a few times that a line containing a colon resulted in an unexpected RRD file being created.
The RRD collector in Xymon searches for lines which contain a colon or an equal sign and satisfy some other criteria as well. By replacing the colon or equal sign in those lines which are not meant to contain data for an RRD, you can be certain that those lines will not result in funny RRDs.
Kind regards, Wim Nelis.
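As an illustration of that advice, the free-text lines of a message body could be escaped before sending while the real "name : value" lines are left alone. This is a sketch under assumptions: it uses the numeric HTML character references `&#58;` and `&#61;` as the replacements, `escape_text_lines` is a hypothetical helper, and the caller has to say which names are genuine data lines.

```shell
#!/bin/sh
# escape_text_lines DATA_NAMES
# Reads a message body on stdin. Lines that start with one of the names in
# DATA_NAMES (an ERE alternation such as 'Memory|Threads') followed by ':' or
# '=' are passed through untouched. In every other line ':' is replaced by
# '&#58;' and '=' by '&#61;'.
escape_text_lines() {
    awk -v data="^[[:space:]]*($1)[[:space:]]*(:|=)" '
        $0 ~ data { print; next }                 # genuine data line
        {
            gsub(/:/, "\\&#58;")                  # "\\&" yields a literal &
            gsub(/=/, "\\&#61;")
            print
        }
    '
}

# Example body: one free-text line followed by the two real metrics.
body='Warehouse check: all good (mode=fast)
Memory : 0
Threads : 0'
printf '%s\n' "$body" | escape_text_lines 'Memory|Threads'
```

Here only the first line is rewritten (to `Warehouse check&#58; all good (mode&#61;fast)`); the Memory and Threads lines reach the RRD collector unchanged.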
or in the xymonserver.cfg definitions, where I have: SPLITNCV_warehouse="Memory:GAUGE,Threads:GAUGE" to SPLITNCV_warehouse="Memory:GAUGE,Threads:GAUGE"
The only slightly special char I use is the - (I have a couple of names such as data-feed), does that need to be changed to a corresponding code? Or have I missed something.
Very much appreciate the help!
From: xymon-bounces at xymon.com [xymon-bounces at xymon.com] on behalf of W.J.M. Nelis [Wim.Nelis at nlr.nl] Sent: 19 June 2012 11:20 To: xymon at xymon.com Subject: Re: [Xymon] white gaps in graphs across a number of services
On 06/19/2012 12:13 PM, Vincent Baines wrote:
One other thing that's bugged me for a while, maybe related: I get some really random spurious RRD files generated, which make things messy when I look at the trends page for a specific host. So, for example, in ./data/rrd/hostname1 for a specific service I'm monitoring called warehouse, I should have:
./warehouse,Memory.rrd
./warehouse,Threads.rrd
but as well as those I get all sorts of randomness:
warehouse,24590_24589_xymon_09.rrd
warehouse,4224_1_hostname2_20.rrd
warehouse,Kernel.rrd
murexnet,_FONT_SIZE.rrd
etc.
In other words, appended other server names, PIDs, and other processes, and even xymon keywords..
They seem to get generated in clumps every now and then, say a whole load of new ones at a specific time.
Colons and equal signs are confusing the RRD module of Xymon. Replacing each ':' by '&#58;' and each '=' by '&#61;' should solve this problem. Of course, these replacements should NOT be done in the lines containing the actual data to be entered into an RRD.
kind regards, Wim Nelis.
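To see how many unexpected files have piled up, something like the following could be used. It is a sketch only: the function name list_spurious is made up, and it assumes that warehouse,Memory.rrd and warehouse,Threads.rrd are the only files the warehouse service is supposed to own, as described above. It lists files and deletes nothing.

```shell
# Hypothetical helper: list RRD files of the "warehouse" service in one host
# directory that are NOT one of the two expected data sets.
# ASSUMPTION: Memory and Threads are the only legitimate warehouse RRDs.
list_spurious() {
    find "$1" -type f -name 'warehouse,*.rrd' \
        ! -name 'warehouse,Memory.rrd' \
        ! -name 'warehouse,Threads.rrd'
}

# Example call (the path is the one quoted in the message above):
# list_spurious ./data/rrd/hostname1
```

Comparing the modification times of the listed files with the times of the graph gaps may show whether the two problems are related.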
The NLR disclaimer is valid for NLR e-mail messages.
This message is only meant for providing information. Nothing in this e-mail message amounts to a contractual or legal commitment on the part of the sender. This message may contain information that is not intended for you. If you are not the addressee or if this message was sent to you by mistake, you are requested to inform the sender and delete the message. Sender accepts no liability for damage of any kind resulting from the risks inherent in the electronic transmission of messages.
Well, still getting these issues despite tidying a lot of errors away.. had quite a few misses last night. A selection of the error messages I get includes a lot of these:
2012-06-20 11:13:17 xymond_rrd: Got message 460528, expected 460520
2012-06-20 11:14:22 xymond_rrd: Got message 460720, expected 460712
2012-06-20 11:15:41 xymond_rrd: Got message 461145, expected 461133
2012-06-20 11:18:15 xymond_rrd: Got message 462593, expected 462584
2012-06-20 11:18:19 Peer at 0.0.0.0:0 failed: Broken pipe
27089 2012-06-20 11:18:19 Semaphore wait aborted: Interrupted system call
2012-06-20 11:18:19 Peer not up, flushing message queue
27089 2012-06-20 11:18:19 Connecting to peer 0.0.0.0:0
27089 2012-06-20 11:18:19 Peer is UP
2012-06-20 11:18:19 Unknown token 'MEMSTAT' ignored at line 385
At the time of some gaps I get these:
2012-06-20 02:00:57 xymond_rrd: Got message 242464, expected 242463
2012-06-20 02:01:06 Flushed 12 stale messages for 0.0.0.0:0
2012-06-20 02:01:07 Flushed 4 stale messages for 0.0.0.0:0
2012-06-20 02:01:08 xymond_rrd: Got message 242493, expected 242476
2012-06-20 02:01:09 Flushed 5 stale messages for 0.0.0.0:0
2012-06-20 02:01:10 xymond_rrd: Got message 242512, expected 242507
2012-06-20 02:01:36 Flushed 9 stale messages for 0.0.0.0:0
2012-06-20 02:01:37 Flushed 11 stale messages for 0.0.0.0:0
2012-06-20 02:01:38 Flushed 9 stale messages for 0.0.0.0:0
2012-06-20 02:01:39 Flushed 11 stale messages for 0.0.0.0:0
2012-06-20 02:01:39 xymond_rrd: Got message 242703, expected 242663
2012-06-20 02:01:40 xymond_rrd: Got message 242799, expected 242797
2012-06-20 02:01:52 xymond_rrd: Got message 242855, expected 242846
2012-06-20 02:01:53 xymond_rrd: Got message 242874, expected 242866
(and even more in rrd-data.log)
and quite a few of these:
2012-06-20 10:46:57 RRD error updating /xymon/data/rrd/hostname1/allext.rrd from 172.30.166.218: /xymon/data/rrd/hostname1/allext.rrd: found extra data on update argument: 46:+2:0.28:80:91.5:64:13:00:04:00:00:00:23:20:00:25:45:29:21:30:44:03:00:54:41:59:42:09:29:51:11:01:50:39:52:59
I'm guessing the latter might be why I see random RRD files created - there are some strange characters in there. But I've added an echo to the custom script to log what it sends to xymon, and so far the output of that is what I'd expect. Is some sort of corruption possible - two updates at exactly the same time corrupting each other somehow?!
Any suggestions?
Thanks!
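To put a number on the "Got message N, expected M" lines quoted above, they can be tallied per log file. The sketch below rests on an interpretation that is not confirmed in this thread: that N is the sequence number xymond_rrd actually received and M the one it was waiting for, so N - M messages never reached the worker. The function name count_gaps is made up, and the input is expected to have one log entry per line, as in the log file itself.

```shell
# Hypothetical helper: summarise sequence gaps reported by xymond_rrd.
# ASSUMPTION: "Got message N, expected M" means N - M messages were skipped.
count_gaps() {
    awk '/xymond_rrd: Got message [0-9]+, expected [0-9]+/ {
        got = $(NF - 2)          # e.g. "460528," (third field from the end)
        sub(/,/, "", got)        # drop the trailing comma
        missed += got - $NF      # last field is the expected number
        lines++
    }
    END { printf "%d gap lines, %d messages skipped\n", lines, missed }' "$@"
}

# Example call (log file name as mentioned in the message above):
# count_gaps rrd-data.log
```

For the four lines logged between 11:13 and 11:18 above, this arithmetic gives 8 + 8 + 12 + 9 = 37 skipped messages.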
From: cleaver at terabithia.org [cleaver at terabithia.org]
Sent: 18 June 2012 20:47
To: Vincent Baines
Cc: Xymon Email List
Subject: RE: [Xymon] white gaps in graphs across a number of services
No problem.. It can be confusing with long process chains like this :)
In tasks.cfg, in [xymond] put it straight after the xymond in the CMD line. In [rrdstatus] and [rrddata], put it immediately after the "xymond_rrd" (not xymond_channel).
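A quick way to confirm that the flag really ended up in the right place is to grep the CMD lines. This is a sketch under assumptions: the function name check_debug is made up, and it simply looks for --debug directly after "CMD xymond" or directly after "xymond_rrd", which is the placement described above. A --debug placed after xymond_channel is deliberately not counted.

```shell
# Intended shape of the CMD lines after the edit (other options omitted here;
# keep whatever your tasks.cfg already contains):
#   [xymond]     CMD xymond --debug <existing options>
#   [rrdstatus]  CMD xymond_channel <existing options> xymond_rrd --debug <existing options>
#   [rrddata]    CMD xymond_channel <existing options> xymond_rrd --debug <existing options>

# Hypothetical helper: print the CMD lines in which --debug directly follows
# xymond or xymond_rrd. Exit status is non-zero if no such line exists.
check_debug() {
    grep -E '(CMD[[:space:]]+xymond|[[:space:]]xymond_rrd)[[:space:]]+--debug([[:space:]]|$)' "$1"
}

# Example call (the location of tasks.cfg depends on the installation):
# check_debug /path/to/tasks.cfg
```

Three matching lines would be expected once [xymond], [rrdstatus] and [rrddata] have all been changed.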
-jc
Sorry.. hopefully not a stupid question, but where should I put the --debug flag? I've done this before, where I thought I'd enabled debug but hadn't, and was happy because there were no debug errors!
The logs are a bit messy at the moment, and I'm trying to get rid of some of the errors. The main culprits are too many data sources for the RRD files, which I can't really explain as they work sometimes, and some cases of the message relating to 'expected message number XXX and received message number XXY' - sometimes just one or two but sometimes a lot in one go.
From: cleaver at terabithia.org [cleaver at terabithia.org]
Sent: 18 June 2012 19:29
To: Vincent Baines
Cc: xymon at xymon.com
Subject: Re: [Xymon] white gaps in graphs across a number of services
Do you see anything unusual in the xymond_rrd or xymond log(s) around that time? If messages are dropping to zero, it could definitely be a crash somewhere.
If nothing interesting shows up, try running both with --debug enabled as well... We might get a better idea of why that's happening.
Regards,
-jc
participants (3)
- cleaver@terabithia.org
- vincent.baines@excelian.com
- Wim.Nelis@nlr.nl