DS override - can't get to work
Hello,
Using the Terabithia RPMs, I've been looking at adding a 'DS' override entry to our analysis.cfg file.
I have an entry for a web server as:
DS http tcp.http.https:,,weed.plymouth.ac.uk,.rrd:sec >0.0007 COLOR=yellow
The comparison value (0.0007) was set low just to see that this was working. However, I can't get it to work. The 'http' column remains green, and the text shown of the http response (on the web page) indicates that the time is above the threshold value (e.g. 'Seconds: 0.088597000').
Anyone any ideas about this?
Thanks,
John.
-- John Horne | Senior Operations Analyst | Technology and Information Services University of Plymouth | Drake Circus | Plymouth | Devon | PL4 8AA | UK
[http://www.plymouth.ac.uk/images/email_footer.gif]<http://www.plymouth.ac.uk/worldclass>
This email and any files with it are confidential and intended solely for the use of the recipient to whom it is addressed. If you are not the intended recipient then copying, distribution or other use of the information contained is strictly prohibited and you should not rely on it. If you have received this email in error please let the sender know immediately and delete it from your system(s). Internet emails are not necessarily secure. While we take every care, University of Plymouth accepts no responsibility for viruses and it is your responsibility to scan emails and their attachments. University of Plymouth does not accept responsibility for any changes made after it was sent. Nothing in this email or its attachments constitutes an order for goods or services unless accompanied by an official order form.
On Fri, 2019-05-17 at 12:34 +0000, John Horne wrote:
> Using the Terabithia RPMs, I've been looking at adding a 'DS' override entry to our analysis.cfg file.
> I have an entry for a web server as:
> DS http tcp.http.https:,,weed.plymouth.ac.uk,.rrd:sec >0.0007 COLOR=yellow
I checked the rrd file and it is of type 'GAUGE', and 'sec' is the correct DS section name.
I tried setting the rule to '>0.0' but it still showed a green status.
I also tried creating a symlink from the rrd file to a simpler name ('jh.rrd -> tcp.http...') - still showed green.
I have also run xymonnet with the '--debug' option, but can see nothing about the DS entry or checking the rule to determine the overall colour. I can see the http test colour being determined, but nothing about the DS entry.
I'm wondering if the fact that we are using 'httpstatus;httpsch://weed.plymouth.ac.uk/;"^[23]";"^[^23]"' for the test is causing a problem.
If I can I'll take a look at the code later on, but to be honest I'm a bit stumped by this.
John.
On Fri, 2019-05-17 at 13:23 +0000, John Horne wrote:
> I'm wondering if the fact that we are using 'httpstatus;httpsch://weed.plymouth.ac.uk/;"^[23]";"^[^23]"' for the test is causing a problem.
> If I can I'll take a look at the code later on, but to be honest I'm a bit stumped by this.
I changed the httpstatus check to just 'httpstatus;http://...'. I adjusted the filename in the analysis.cfg file too. Seems to have made no difference though.
I ran xymond with the '--debug' option, but again could not see anything relevant about the DS.
Now trying to find out where all this happens in the code... :-(
John.
On 5/17/2019 7:59 AM, John Horne wrote:
> I changed the httpstatus check to just 'httpstatus;http://...'. I adjusted the filename in the analysis.cfg file too. Seems to have made no difference though.
> I ran xymond with the '--debug' option, but again could not see anything relevant about the DS.
> Now trying to find out where all this happens in the code... :-(
Can you run xymond_rrd with --debug mode? Specifically, the one reading from the "status" channel.
This is responsible for taking the incoming data point and turning it into a "modify" message, so if there's a parsing problem it should show up either at the time of the http message receipt or on initial load as it's importing the rules to begin with.
If a "modify" message is properly being sent out, then it's possible the host+svc combination it's being tagged with is incorrect.
-jc
On Fri, 2019-05-17 at 08:26 -0700, Japheth Cleaver wrote:
> Can you run xymond_rrd with --debug mode? Specifically, the one reading from the "status" channel.
Okay, done that. All that I can see though are entries such as this:
===========
73838 2019-05-17 16:48:39.732827 xymond_rrd: Got message 744 @@status#744/weed|1558108119.732746|10.120.16.9||weed|http|1558109019|green||green|1558108057|0||0||1558108102|linux||0|
73838 2019-05-17 16:48:39.732830 startpos 275616, fillpos 275616, endpos -1
73838 2019-05-17 16:48:39.732846 - /weed/tcp.http.https:,,weed.plymouth.ac.uk,.rrd: storing 15 bytes into seq 744 (pos: 1/23), at 1558108119: 1558108119:0.08
===========
which looks like it is just updating the RRD file. No mention of anything being modified (or rather 'modify') at all.
Starting xymon seems to show no problems either. Entries seen such as:
===========
85455 2019-05-17 17:04:33.658972 loadhostnames:checking if this host weed has been defined before...
85455 2019-05-17 17:04:33.658975 loadhostnames:adding host weed as a new item... = 0x55b1f26cd920
85455 2019-05-17 17:04:33.658979 loadhostnames:build_hosttree - status for that add was 0
...
85455 2019-05-17 17:04:33.660025 loadhosts:build_hosttree - walk->clientname for weed is: weed
85455 2019-05-17 17:04:33.660028 loadhosts:build_hosttree - xtreeAdd to rbclients for weed at 0x55b1f26cd920
85455 2019-05-17 17:04:33.660032 loadhosts:build_hosttree - status for that add was 0
===========
John.
On Fri, 2019-05-17 at 14:59 +0000, John Horne wrote:
> On Fri, 2019-05-17 at 13:23 +0000, John Horne wrote:
> > I'm wondering if the fact that we are using 'httpstatus;httpsch://weed.plymouth.ac.uk/;"^[23]";"^[^23]"' for the test is causing a problem.
Okay, so I tried using DS with the 'conn' test, and that worked fine (the page went yellow).
Slightly worrying is that the default message, which shows the rule being applied, displays the threshold to only 2 decimal places. The code reads the value as a 'double' but prints it with 2 decimal places. (So, in my case, a rule of '>0.0007' appeared on the web page as '>0.00'. A rule of '>0.007' (one fewer zero) appeared as '>0.01', so it was rounded up.) A minor point probably, just a little confusing when trying to force a result using a very small value.
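The rounding described here can be reproduced with a fixed two-decimal format. What follows is a minimal Python sketch, not Xymon's code: it assumes the status text is produced with something equivalent to a C-style "%.2f", and the helper names show_threshold and exceeds are hypothetical.

```python
def show_threshold(limit: float) -> str:
    # Assumption: limits are printed with two decimal places, as "%.2f" would.
    # This illustrates the display only; it is not taken from the Xymon source.
    return ">%.2f" % limit


def exceeds(value: float, limit: float) -> bool:
    # The comparison uses the full floating-point value, not the displayed text.
    return value > limit


for limit in (0.0007, 0.007):
    print(limit, "is displayed as", show_threshold(limit))

# The response time quoted earlier in the thread against the test threshold.
print(exceeds(0.088597, 0.0007))
```

If the assumption holds, a rule can be firing correctly even though the displayed limit looks like zero, which is why the status text alone is a poor guide when testing with very small thresholds.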
John.
On Fri, 2019-05-17 at 15:44 +0000, John Horne wrote:
> Okay, so I tried using DS with the 'conn' test, and that worked fine (the page went yellow).
I decided to change the HTTP test DS entry to use a regex for the filename, and that worked!
If I use 'DS http %^tcp\.http\.https.*weed.*\.rrd:sec >0.0007 COLOR=yellow' then that works fine.
I'll see if I can work backwards to find out what in the original filename is causing the problem.
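As a quick sanity check outside Xymon, the regex part of that rule can be tried against candidate file names. This is only a sketch: it uses Python's re module rather than the regex library Xymon links against (the two should agree for a pattern this simple), and the second and third file names are hypothetical examples, not names taken from the server.

```python
import re

# Pattern from the DS rule, without the leading '%' that marks it as a regex.
pattern = re.compile(r"^tcp\.http\.https.*weed.*\.rrd")

names = [
    "tcp.http.https:,,weed.plymouth.ac.uk,.rrd",  # file name from this thread
    "tcp.http.weed.plymouth.ac.uk,.rrd",          # hypothetical http:// variant
    "tcp.conn.rrd",                               # hypothetical unrelated file
]

matches = [name for name in names if pattern.search(name)]
print(matches)
```

Only the first name matches, since the pattern requires the literal 'https' after 'tcp.http.'.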
John.
On Fri, 2019-05-17 at 16:22 +0000, John Horne wrote:
> I decided to change the HTTP test DS entry to use a regex for the filename, and that worked!
> If I use 'DS http %^tcp\.http\.https.*weed.*\.rrd:sec >0.0007 COLOR=yellow' then that works fine.
> I'll see if I can work backwards to find out what in the original filename is causing the problem.
The only way I can get this test to work is by prefixing the filename with a '%' in order to make it a regex. By trial and error, and not as a regex, I have tried escaping the colon and comma characters. I have tried including the whole of the filename in single quotes and double quotes, and then repeated that on literally just the filename part. All of these failed.
Oddly I repeated the DS settings on a different client server (same xymon server), and noticed that the RRD filename was different. In the hosts.cfg file, if I use 'httpstatus;http://x1...' then the filename produced is 'tcp.http.x1...'. But if I use 'httpstatus;https://x2...' then the filename becomes 'tcp.http.https:,,x2...'. The '://' part of the URL is now included in the filename (and commafied(!)).
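The naming pattern observed above can be written down as a small function. This is a guess inferred only from the file names reported in this thread, not from the Xymon source: it assumes a leading 'http://' is dropped, any other scheme is kept, and every '/' becomes ','. The function name rrd_name_for_url is hypothetical.

```python
def rrd_name_for_url(url: str) -> str:
    # Assumption inferred from the file names seen in this thread:
    # "http://" is stripped, other schemes are kept, and "/" turns into ",".
    if url.startswith("http://"):
        url = url[len("http://"):]
    return "tcp.http." + url.replace("/", ",") + ".rrd"


print(rrd_name_for_url("https://weed.plymouth.ac.uk/"))
print(rrd_name_for_url("http://weed.plymouth.ac.uk/"))
```

Under these assumptions even a plain http:// URL with a trailing slash yields a comma in the file name, which would fit the earlier finding that switching the test from https to http made no difference.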
Anyway, it's back to the code I guess.
John.
On Fri, 2019-05-17 at 12:34 +0000, John Horne wrote:
> Using the Terabithia RPMs, I've been looking at adding a 'DS' override entry to our analysis.cfg file.
> I have an entry for a web server as:
> DS http tcp.http.https:,,weed.plymouth.ac.uk,.rrd:sec >0.0007 COLOR=yellow
> The comparison value (0.0007) was set low just to see that this was working. However, I can't get it to work. The 'http' column remains green, and the text shown of the http response (on the web page) indicates that the time is above the threshold value (e.g. 'Seconds: 0.088597000').
Hello,
Okay, the problem seems to be with the commas in the filename. Within the file 'xymond/client_config.c' exists the function 'check_rrdds_thresholds'. This is used to check the DS rules, and checks the RRD filename against the one in the rule. It does this by calling the function 'namematch' (found in 'lib/matching.c').
However, 'namematch' splits up the filename into tokens based on the comma character. (Not sure why it does this, but I assume the function is used elsewhere and makes sense in those cases.) As such, the filename never matches if it contains a comma character (and mine has several of them).
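The effect can be illustrated with a small Python sketch. It is a simplified stand-in for the behaviour described above, not a port of the C functions: comma_list_match and substring_match are hypothetical names, and 'tcp.conn.rrd' is used only as an example of a comma-free file name.

```python
def comma_list_match(needle: str, haystack: str) -> bool:
    # Treat the rule text as a comma-separated list and look for one item
    # that equals the whole file name.
    return any(item == needle for item in haystack.split(",") if item)


def substring_match(needle: str, pattern: str) -> bool:
    # Compare the rule text against the file name without tokenising it.
    return pattern in needle


rrd_name = "tcp.http.https:,,weed.plymouth.ac.uk,.rrd"

print(comma_list_match(rrd_name, rrd_name))              # False
print(substring_match(rrd_name, rrd_name))               # True
print(comma_list_match("tcp.conn.rrd", "tcp.conn.rrd"))  # True
```

Splitting the rule text on commas produces the pieces 'tcp.http.https:', 'weed.plymouth.ac.uk' and '.rrd', none of which equals the full file name, so a list-style match can never succeed for it. A comma-free name is unaffected, which would be consistent with the 'conn' rule working.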
A short context diff patch is attached, but I'm not sure it's the best. Basically for the 'http' test don't call 'namematch' but call 'patternmatch' instead. This is a similar function (also in matching.c), but doesn't split the filename into tokens. It is usually used for matching substrings, but will work when comparing two full filenames. It also takes care of regex rules. Using it locally the patch seems to work fine.
The patch calls 'patternmatch' only if the DS column name is 'http'. As far as I can tell this is the only case where it is required, so it should have no effect where DS is used elsewhere. However, others may be using DS with other column names (I'm thinking 'apache' here) whose filenames also contain commas. In those cases the bug may reappear unless the column name is also tested for 'apache'.
As always, feel free to modify, or even reject, the patch.
As an addition, I did get the limits (&L and &U) reported with the precision (significant digits) that the user gave in the analysis.cfg file. So, in my case, '0.0007' sets a precision of 4. The default text output when using DS then uses this precision for all values. (It also coped with exponent notation, no leading zeros, etc.) This all worked well. However, the value reported for the RRD value (&V) seemed likewise restricted to 2 significant digits. I was loath to change this as it may well break things, and the same 2-digit restriction seemed to apply in more than one place. The output would then generally look okay except for the reported RRD value, so I abandoned that patch.
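The idea behind that abandoned change can be sketched in Python. This is not the patch itself: it takes 'precision' to mean the number of digits after the decimal point in the text the user typed (an interpretation that fits the '0.0007' gives 4 example), and the names decimal_places and format_like are hypothetical.

```python
from decimal import Decimal


def decimal_places(literal: str) -> int:
    # Digits after the decimal point implied by the literal as written,
    # including exponent notation such as "7e-4".
    exponent = Decimal(literal).as_tuple().exponent
    return max(0, -exponent)


def format_like(literal: str, value: float) -> str:
    # Print a value using the same number of decimal places as the literal.
    return f"{value:.{decimal_places(literal)}f}"


print(decimal_places("0.0007"), decimal_places("7e-4"), decimal_places("5"))
print(format_like("0.0007", 0.0007))
print(format_like("0.0007", 0.088597))
```

Applying the same width to the measured value as well as the limits would avoid the mismatch described above, where the limits look right but the RRD value is still cut to 2 digits.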
John.
participants (2): cleaver@terabithia.org, john.horne@plymouth.ac.uk