We're a Kerberos-based shop, which means everything authenticating to Kerberos needs to be within 5 minutes of our Kerberos server's clock. We recently had a few important servers with severe timekeeping problems, as well as two of our time servers failing to stay in sync because of multiprocessor issues. For a lot of our servers it'd be nice to warn if the time is more than a minute or two off from our reference clock on the Kerberos servers, and go red at four or five minutes off. It'd also be nice to have a warning if our central time servers are having trouble staying in sync, or having other problems (I'm not intimately familiar with NTP, so I'm not sure what those problems could be).
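To make the thresholds concrete, here is a minimal sketch of the classification step only. It assumes the clock offset against the reference has already been measured in seconds by some other means; the function name, the color strings, and the exact cutoffs (60 seconds for the warning, 240 seconds for red) are illustrative picks from the ranges above, not settings from any existing check.

```python
def classify_offset(offset_seconds, warn_after=60.0, red_after=240.0):
    """Map a clock offset in seconds to a status color.

    warn_after and red_after are assumed cutoffs, chosen to stay
    inside the 5 minute skew that Kerberos tolerates.
    """
    # Only the size of the error matters, not whether the clock is fast or slow.
    magnitude = abs(offset_seconds)
    if magnitude >= red_after:
        return "red"
    if magnitude >= warn_after:
        return "yellow"
    return "green"
```

For example, `classify_offset(-90)` gives `"yellow"`: the clock is a minute and a half slow, which is still inside the Kerberos window but worth a warning.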
I think the current NTP check only checks that NTP is running. It would be nice to also check that there's a valid sys.peer and some minimum number of acceptable candidates, and perhaps that the offset and jitter for the peers/servers are within reasonable limits.
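A rough sketch of such a check, assuming the peer list comes from `ntpq -pn`, where a leading `*` marks the sys.peer, a leading `+` marks a candidate, and the last two columns are offset and jitter in milliseconds. The function names, the limits (100 ms offset, 50 ms jitter, at least one candidate), and the sample addresses are all made up for illustration.

```python
import subprocess


def read_billboard():
    """Return the peer listing from the local daemon (requires ntpq on PATH)."""
    result = subprocess.run(["ntpq", "-pn"], capture_output=True, text=True)
    return result.stdout


def parse_peers(billboard):
    """Parse `ntpq -pn` style text into a list of dicts, one per peer."""
    peers = []
    for line in billboard.splitlines():
        if not line.strip():
            continue
        # The first character is the tally code; a space means no code.
        tally, fields = line[0], line[1:].split()
        if len(fields) < 10:
            continue  # the ===== separator or a truncated line
        try:
            offset_ms = float(fields[-2])
            jitter_ms = float(fields[-1])
        except ValueError:
            continue  # the column header line
        peers.append({"tally": tally, "remote": fields[0],
                      "offset_ms": offset_ms, "jitter_ms": jitter_ms})
    return peers


def check_ntp(peers, min_candidates=1, max_offset_ms=100.0, max_jitter_ms=50.0):
    """Return a list of problem strings; an empty list means healthy."""
    problems = []
    sys_peers = [p for p in peers if p["tally"] == "*"]
    candidates = [p for p in peers if p["tally"] == "+"]
    if not sys_peers:
        problems.append("no sys.peer selected")
    if len(candidates) < min_candidates:
        problems.append("only %d candidate(s), want at least %d"
                        % (len(candidates), min_candidates))
    # Only judge offset and jitter on peers the daemon actually trusts.
    for p in sys_peers + candidates:
        if abs(p["offset_ms"]) > max_offset_ms:
            problems.append("%s: offset %.3f ms exceeds %.1f ms"
                            % (p["remote"], p["offset_ms"], max_offset_ms))
        if p["jitter_ms"] > max_jitter_ms:
            problems.append("%s: jitter %.3f ms exceeds %.1f ms"
                            % (p["remote"], p["jitter_ms"], max_jitter_ms))
    return problems


# Invented sample output, using documentation-only addresses.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      .GPS.            1 u   33   64  377    0.512   -0.214   0.071
+192.0.2.11      192.0.2.10       2 u   12   64  377    1.204  250.530   0.112
 192.0.2.12      .INIT.          16 u    -   64    0    0.000    0.000   0.000
"""
```

On a live host this would be called as `check_ntp(parse_peers(read_billboard()))`. With the invented sample, the only complaint is the large offset on 192.0.2.11. Any non-empty result could map to a warning, and a missing sys.peer to red, but that mapping is a design choice rather than anything NTP dictates.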
At the moment I'm mostly just thinking about implementing something like this. Maybe someone already has this, or maybe there's a better way to think about it, so I'd appreciate any input.
Tracy J. Di Marco White
Information Technology Services
Iowa State University