Whoops, forgot to CC this to the list. I hate it when that happens. So in case it helps someone else, my off-list email is below.
And just for the record, I still reckon Smokeping is the go. I see no reason why it wouldn't detect a lot of transient errors, and it becomes more likely if you adjust the step and ping count parameters. There's no way of guaranteeing that you'd see a transient error unless you happen to be sending every packet received by the device you're testing! If you need that level of certainty, you should instead be monitoring the error rates on the switch port and the device NIC, probably using SNMP.
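For what it's worth, here is a rough sketch of the counter side of that. It assumes you poll a 32-bit error counter such as IF-MIB::ifInErrors once a minute and want the number of new errors since the last poll, allowing for the counter wrapping at 2^32. The function name, host name, community string and interface index are all made up for illustration.

```shell
#!/bin/sh
# Difference between two Counter32 readings, allowing for a single wrap at 2^32.
# counter_delta is a hypothetical helper, not part of Net-SNMP.
counter_delta() {
    prev=$1
    cur=$2
    if [ "$cur" -ge "$prev" ]; then
        echo $((cur - prev))
    else
        # counter wrapped past 2^32 between the two polls
        echo $((cur + 4294967296 - prev))
    fi
}

# Possible usage with Net-SNMP (host, community and ifIndex are assumptions):
#   prev=`snmpget -v2c -c public -Oqv switch1 IF-MIB::ifInErrors.3`
#   sleep 60
#   cur=`snmpget -v2c -c public -Oqv switch1 IF-MIB::ifInErrors.3`
#   counter_delta "$prev" "$cur"
```

The same poll against the NIC of the device itself would tell you which end of the cable is seeing the errors.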
====
Adam
On Thursday, 5 June 2014, Adam Goryachev < mailinglists at websitemanagers.com.au> wrote:
> Specifically, I now want to record at least the following data into RRD's for later viewing:
> - Maximum ping time per minute
> - Average ping time per minute
> - Minimum ping time per minute
> - Packet loss per minute
Seems to me that this is exactly what Smokeping can provide for you. Have a look at the demo site, drill down to a single device, and look at the graphs, e.g. http://oss.oetiker.ch/smokeping-demo/?target=Customers.OP.octopus
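If you do keep your own script going in the meantime, those four numbers are cheap to work out from one minute of samples. A minimal sketch, assuming your script can produce one round-trip time in milliseconds per line, with a "-" for each lost probe. The input format and the function name ping_summary are my inventions, not anything Smokeping uses.

```shell
#!/bin/sh
# Reads one RTT (ms) per line on stdin, "-" meaning a lost probe.
# Prints: min avg max loss%  (U = unknown when every probe was lost).
ping_summary() {
    awk '
        NF == 0    { next }              # skip blank lines
        $1 == "-"  { lost++; next }      # count lost probes
        {
            rtt = $1 + 0
            if (n == 0 || rtt < min) min = rtt
            if (n == 0 || rtt > max) max = rtt
            sum += rtt
            n++
        }
        END {
            total = n + lost
            if (total == 0) exit 1       # no samples at all
            if (n == 0) { print "U U U 100.0"; exit }
            printf "%.1f %.1f %.1f %.1f\n", min, sum / n, max, 100 * lost / total
        }'
}

# Example: printf '10\n20\n-\n30\n' | ping_summary
```

The "U" placeholder is what rrdtool update accepts for an unknown value, so the output should drop into an RRD update line without much fuss.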
> One thing that does happen is obviously drift, ie, the processing time of
> my script will take a fraction of a second, so I won't really get a value for every single second
One way to overcome this is to run the probe in background, so that it doesn't really matter how long it takes (as long as you're not accumulating processes over time). Like this:
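#!/bin/sh
do_ping() {
    ...
}
while true; do
    SECONDS=`date +%s`    # in case not bash
    # wait until the start of the next minute
    sleep `expr 60 - $SECONDS % 60`
    # run the probe in the background so its runtime doesn't delay the loop
    do_ping >> /var/log/pingmon.log &
done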
This runs the subroutine do_ping in the background, but first waits however long it needs until the clock ticks over to the next minute. That way the subroutine always starts at the top of the minute, no matter how long the previous run took.
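To make the sleep arithmetic concrete, here is the same calculation pulled out into a function you can poke at by hand. The name secs_to_next_minute is just something I made up for this example.

```shell
#!/bin/sh
# Seconds to sleep from the given epoch time until the next minute boundary.
# Exactly on a boundary this returns 60, i.e. it waits for the following minute.
secs_to_next_minute() {
    echo $((60 - $1 % 60))
}

# Example: secs_to_next_minute `date +%s`
```

So at 25 seconds past the minute the loop sleeps 35 seconds, and at 59 seconds past it sleeps just 1. Because the value is recomputed from the clock on every pass, any delay in one iteration doesn't carry over into the next.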
Cheers Jeremy