"ports" RRD graph is often flaky.
I see several graphs like this in xymon (4.3.23), where at least one of the port graphs (configured in analysis.cfg) has lots of data points missing:
https://www.dropbox.com/s/l483lilzh5n610k/flaky-xymon-port-graph.png?dl=0
Any idea how I can fix this? As you can see, the red line for http has data for the entire graph, while the blue line for ajp is very spotty. I just added the last two monitors less than an hour ago, so I don't have history on those yet.
This is not unique to this host. As far as I can tell, the port listings coming back from the client are always good, even when the graph has no line.
Thanks, Shawn
On 3/22/2016 4:46 PM, Shawn Heisey wrote:
> I see several graphs like this in xymon (4.3.23), where at least one of the port graphs (configured in analysis.cfg) has lots of data points missing:
> https://www.dropbox.com/s/l483lilzh5n610k/flaky-xymon-port-graph.png?dl=0
Replying to myself (sure sign of insanity):
I may have actually figured out why this happened, and I'm looking for confirmation.
Here's part of the analysis.cfg config for this host:
PORT "LOCAL=%([.:]80)$" state=ESTABLISHED min=0 color=red track=http "TEXT=Web connections"
PORT "LOCAL=%([.:]8009)$" state=ESTABLISHED min=0 color=red track=ajp "TEXT=AJP connections"
PORT "LOCAL=%([.:]8777)$" state=ESTABLISHED min=0 color=red track=mule_aps "TEXT=MuleAPS connections"
PORT "LOCAL=%([.:]8443)$" state=ESTABLISHED min=0 color=red track=mule_nwsi "TEXT=MuleNWSI connections"
All four of these lines have been in the config for a long time, but the mule_aps and mule_nwsi tracking identifiers *used* to be ajp, the same as the 8009 port line. So I had three lines all using "ajp" as the tracking ID.
I'm guessing that what I used to have is a misconfiguration, and I'm lucky that I got *anything* at all. I'm hoping that now I will have no problems.
Thanks, Shawn
On Wed, Mar 23, 2016 at 8:17 PM Shawn Heisey <haproxy at elyograg.org> wrote:
> All four of these lines have been in the config for a long time, but the mule_aps and mule_nwsi tracking identifiers *used* to be ajp, the same as the 8009 port line. So I had three lines all using "ajp" as the tracking ID.
I think this explains it. RRD only permits one update per interval (5 minutes by default), and any subsequent ones are rejected. If you had 3 updates going on every 5 minutes, only the first would have been accepted. So if mule_aps and mule_nwsi were both reporting NaN, then it's possible that only 1/3 of all updates had valid data. Your graph seems to show about 1/3 present and 2/3 missing.
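As a rough illustration of that arithmetic, here is a minimal Python sketch. It is a toy model only, not how rrdtool or xymond_rrd actually work internally: it assumes three updates hit the same "ajp" dataset in each interval, that one carries a real reading and two carry NaN, that the arrival order varies, and that only the first update is kept. The function name and the sample value 12.0 are invented for the example.

```python
import math
import random


def simulate(intervals, seed=0):
    """Return the fraction of intervals whose stored value is a real number."""
    rng = random.Random(seed)
    stored = []
    for _ in range(intervals):
        # Assumption: one real reading and two unknowns, all sent to "ajp".
        updates = [12.0, math.nan, math.nan]
        # Assumption: the order in which the three updates arrive varies.
        rng.shuffle(updates)
        # Toy rule: the first update wins, the later two are rejected.
        stored.append(updates[0])
    valid = sum(1 for value in stored if not math.isnan(value))
    return valid / intervals


if __name__ == "__main__":
    print(f"{simulate(3000):.2f} of intervals kept a real value")
```

Under those assumptions the kept fraction hovers around one third, which is in line with the graph.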
J
On 3/23/2016 3:29 AM, Jeremy Laidman wrote:
> On Wed, Mar 23, 2016 at 8:17 PM Shawn Heisey <haproxy at elyograg.org> wrote:
>> All four of these lines have been in the config for a long time, but the mule_aps and mule_nwsi tracking identifiers *used* to be ajp, the same as the 8009 port line. So I had three lines all using "ajp" as the tracking ID.
> I think this explains it. RRD only permits one update per (by default) 5-minute interval, and any subsequent ones are rejected. If you had 3 updates going on every 5 minutes, only the first would have been accepted. So if mule_aps or mule_nwsi were both reporting NaN then it's possible that only 1/3 of all updates had valid data. Your graph seems to show about 1/3 present, and 2/3 missing.
Confirmed. Thank you to everyone who responded with ideas.
Since I changed the config so each tracking ID is only used once per config stanza, the graphing has been superb. I wasn't the one who created these configs. I just happened to be creating some additional port monitoring and changing the TEXT attribute on the existing "listening" config to include the port number. Because I was already making changes to a lot of PORT config lines, I decided to revamp them all to more accurately reflect what we were monitoring. I didn't even realize I was fixing a problem. :)
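For anyone who wants to check their own analysis.cfg for the same mistake, here is a small Python sketch that reports tracking IDs used by more than one PORT rule within a stanza. It is deliberately simplified: it assumes a stanza starts at a line beginning with HOST= or DEFAULT, it only looks at lines beginning with PORT, and the host name "example-host" is made up. The sample rules are the old form of the config described earlier in this thread.

```python
import re
from collections import Counter, defaultdict

TRACK_RE = re.compile(r"\btrack=(\S+)", re.IGNORECASE)

# Old form of the rules from this thread; "example-host" is a made-up name.
OLD_RULES = """
HOST=example-host
    PORT "LOCAL=%([.:]80)$" state=ESTABLISHED min=0 color=red track=http "TEXT=Web connections"
    PORT "LOCAL=%([.:]8009)$" state=ESTABLISHED min=0 color=red track=ajp "TEXT=AJP connections"
    PORT "LOCAL=%([.:]8777)$" state=ESTABLISHED min=0 color=red track=ajp "TEXT=MuleAPS connections"
    PORT "LOCAL=%([.:]8443)$" state=ESTABLISHED min=0 color=red track=ajp "TEXT=MuleNWSI connections"
"""


def duplicate_track_ids(config_text):
    """Map each stanza header to the track IDs used by more than one PORT rule."""
    counts = defaultdict(Counter)
    stanza = "(no stanza)"
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Simplifying assumption: only HOST= and DEFAULT lines open a stanza.
        if line.upper().startswith(("HOST=", "DEFAULT")):
            stanza = line
            continue
        if line.upper().startswith("PORT"):
            match = TRACK_RE.search(line)
            if match:
                counts[stanza][match.group(1)] += 1
    return {
        name: sorted(tid for tid, n in ids.items() if n > 1)
        for name, ids in counts.items()
        if any(n > 1 for n in ids.values())
    }


if __name__ == "__main__":
    for name, ids in duplicate_track_ids(OLD_RULES).items():
        print(f"{name}: duplicate track IDs {', '.join(ids)}")
```

An empty result means every tracking ID is used at most once per stanza.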
Thanks, Shawn
On Tue, March 22, 2016 3:46 pm, Shawn Heisey wrote:
> I see several graphs like this in xymon (4.3.23), where at least one of the port graphs (configured in analysis.cfg) has lots of data points missing:
> https://www.dropbox.com/s/l483lilzh5n610k/flaky-xymon-port-graph.png?dl=0
> Any idea how I can fix this? As you can see, the red line for http has data for the entire graph, while the blue line for ajp is very spotty. I just added the last two monitors less than an hour ago, so I don't have history on those yet.
> This is not unique to this host. As far as I can tell, the port listings coming back from the client are always good, even when the graph has no line.
> Thanks, Shawn
This is interesting. When you zoom in on an area without data, are you getting NaN or 0's? I might suspect something is aborting processing here, but it's probably safe to rule out a message transmission issue, since both the client log this is coming from and the resulting ports 'data' message are unified blocks.
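One way to answer the NaN-versus-zero question without squinting at a zoomed graph is to run "rrdtool fetch" on the RRD file and tally the values. The Python sketch below parses text in the usual fetch layout (a header line of data source names, then "timestamp: value" rows). The sample text and the data source name "count" are invented for illustration and are not output from this host.

```python
import math

# Invented sample in the usual `rrdtool fetch` layout; not real output.
SAMPLE = """\
                          count

1458680400: 3.0000000000e+00
1458680700: -nan
1458681000: 0.0000000000e+00
1458681300: nan
"""


def classify_fetch_output(text):
    """Count NaN, zero, and other values per data source in fetch-style text."""
    names = []
    tallies = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        if ":" not in line:
            # Header line: one column name per data source.
            names = line.split()
            tallies = {n: {"nan": 0, "zero": 0, "other": 0} for n in names}
            continue
        _, _, values = line.partition(":")
        for name, token in zip(names, values.split()):
            value = float(token)  # float() accepts both "nan" and "-nan"
            if math.isnan(value):
                tallies[name]["nan"] += 1
            elif value == 0:
                tallies[name]["zero"] += 1
            else:
                tallies[name]["other"] += 1
    return tallies


if __name__ == "__main__":
    print(classify_fetch_output(SAMPLE))
```

Lots of NaN rows point at missing or rejected updates, while zeros would mean the updates arrived but counted no connections.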
Q's:
- Are there any errors in xymond_client's or xymond_rrd's logs around these times?
- Is there a chance something else might be reporting netstat info on this hostname?
- Have there been any modifications to the client-side script for this server?
- Are you seeing the behavior from other clients? (Just AJP)
- Do the 'trends' graphs for this client show any other issue occurring concurrently with the gaps? (Missing vmstat data, for example, or process counts)
Finally, can you include your PORTS sections from analysis.cfg for this server?
-jc
On Wed, Mar 23, 2016 at 9:47 AM Shawn Heisey <hobbit at elyograg.org> wrote:
> I see several graphs like this in xymon (4.3.23), where at least one of the port graphs (configured in analysis.cfg) has lots of data points missing:
> https://www.dropbox.com/s/l483lilzh5n610k/flaky-xymon-port-graph.png?dl=0
> Any idea how I can fix this? As you can see, the red line for http has data for the entire graph, while the blue line for ajp is very spotty. I just added the last two monitors less than an hour ago, so I don't have history on those yet.
> This is not unique to this host. As far as I can tell, the port listings coming back from the client are always good, even when the graph has no line.
> Thanks, Shawn
Firstly, I'm assuming you have set up port tracking with "PORT bla bla STATE=ESTABLISHED TRACK=ajp" or similar in analysis.cfg. It might be helpful to provide the relevant config section from there.
I suppose there are three likely points of failure in the process of getting port data into the graphs. The first one is where the port data is received by Xymon and passed to the RRD parser xymond_rrd. The second is where the port lists are analysed to get the numbers. The third is where the numbers are added to the RRD file.
If your "ports" tests show "red" or "clear" then that would suggest the first part is failing. If there are problems with the second or third parts, you may see useful messages logged in the rrd-status.log file.
You could check to see if the port lists are being sent to the xymond_rrd process that handles status messages (and writes to the rrd-status.log file). If they are, then you'll know there's probably something amiss with the RRD analysis or update systems. Otherwise, if there are gaps in the messages being sent to the status channel, then it indicates that the RRD updating is probably fine.
You can take a feed of the port status messages coming out of the status channel, so that you see everything that gets sent to xymond_rrd. First switch to the xymon user, and then run xymoncmd to set up your environment. Then run a command such as the following:
$ xymond_channel --channel=status --filter='\|fremont\|ports\|' cat
This will only display status channel messages for the host "fremont" and for the test name "ports". I'm using "cat" as the worker process, but you can substitute this with a grep, awk or sed command to reduce screen output as desired, or even write a script to parse the output. If you only want to know that a status message has been sent (and don't care about its contents), you could replace "cat" with "grep '^@@'", but for a first pass, "cat" works just fine.
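If you do want a script in place of "cat", something along these lines could flag gaps between status messages. This is only a sketch: it assumes each message starts with a header line beginning "@@status" and that messages normally arrive about every 300 seconds. The function names and the 1.5x slack factor are my own choices, not anything Xymon provides.

```python
import sys
import time


def find_gaps(arrival_times, expected=300.0, slack=1.5):
    """Return (previous, current) pairs further apart than expected * slack."""
    limit = expected * slack
    return [
        (prev, cur)
        for prev, cur in zip(arrival_times, arrival_times[1:])
        if cur - prev > limit
    ]


def watch(stream, clock=time.time, expected=300.0):
    """Read channel output line by line and report gaps between messages."""
    arrivals = []
    gaps = []
    for line in stream:
        # Assumption: every message begins with a header line "@@status...".
        if not line.startswith("@@status"):
            continue
        arrivals.append(clock())
        # Only the newest pair of arrivals needs checking each time.
        for prev, cur in find_gaps(arrivals[-2:], expected):
            print(f"gap of {cur - prev:.0f}s between status messages", flush=True)
            gaps.append((prev, cur))
    return gaps

# To use this as the worker process, call watch(sys.stdin) at the end of the script.
```

Saved as a script (with the watch(sys.stdin) call added) and named in place of "cat", it would stay quiet while messages arrive on schedule and print a line whenever one is late.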
You can extend this idea to see what xymond_rrd is doing when it receives these messages, by running your own copy of xymond_rrd, with debugging enabled. Something like so:
$ mkdir /tmp/myrrd
$ xymond_channel --channel=status --filter='\|fremont\|ports\|' xymond_rrd --rrddir=/tmp/myrrd/ --debug
This produces lots of output as xymond_rrd loads its configuration when it receives the first message, but then the output rate drops off to a few lines for each message. Due to buffering of the RRD file updates, it's tricky to debug whether xymond_rrd is sending updates to the RRD files.
Cheers
Jeremy