[hobbit] Logfile monitoring - I'd like some comments
Hi all
Henrik, I have given this a bit of thought, and think it's great. (I refer to your proposal here, not my capacity for thought.)
Would it be possible to add custom strings and status?
A perfect example would be this. (from /var/adm/messages)
---snip---
Feb 10 13:31:15 afgdev tldd[649]: [ID 138416 daemon.error] TLD(0) drive 2 (device 1) is being DOWNED, status: Unable to open drive
---snip---
Anywhere else, this would not be a major issue, but on my backup server where my tape library is attached, this is a major red alert.
Regards Vernon
-----Original Message-----
From: Henrik Stoerner [mailto:henrik at hswn.dk]
Sent: Wednesday, 15 February 2006 5:40 AM
To: hobbit at hswn.dk
Subject: [hobbit] Logfile monitoring - I'd like some comments
A few days ago, I mentioned that I would like to do logfile monitoring for the next Hobbit release.
I've worked a bit on this and have a prototype solution for it, which you can test with the current snapshots. I'd like some comments on how it works to make sure I haven't overlooked something before committing myself.
There are several objectives:
- As far as is possible, logfile monitoring must be configured centrally, on the Hobbit server. Having to go to each server to (re)configure what logfiles to check and what to look for simply doesn't work.
- The amount of data sent from each client to Hobbit should be small, but it must catch the "important" stuff.
- You rarely know in advance what will be in the logs when you need them the most. So the monitor should give you as much of the log entries as possible, not just those lines that match some pre-defined strings or regex'es.
- Some systems log messages on multiple lines. The system must be able to show all parts of a log entry.
- Logfile entries must appear on the monitor for some time after they show up in the logs, but should also disappear after a while.
In other words: The ideal solution would let you have the entire logfile available on the Hobbit server - but that obviously won't work. So the client should - after weeding out the really irrelevant stuff - send us as much of each logfile as possible.
My proposed solution is this:
- On the Hobbit server, there's a log-monitoring configuration file for the Hobbit clients. This defines which logfiles are monitored for a single client installation, or you can define it for a group of clients. (The idea is to define at least one group for each operating system, since the standard system logs are OS dependent). This configuration lists the log filename, the maximum amount of data to send from this logfile, a regex "noise" filter (i.e. lines that are stripped from the logfile), and *optionally* a regex identifying really interesting stuff in the logfile that should always be reported.
- When a client connects to the Hobbit server and sends the normal client message, the Hobbit server will respond with the logfile configuration for this client. So the client has a copy of the central configuration file, but only the part that it needs for itself. The reason for sending this as a response to the client message is to avoid an extra round-trip from client to server; piggy-backing the config push on the client message means that it is almost without any performance cost on the server side.
- When the client runs, it uses the local copy of the configuration file to determine what logs to look at. For each logfile, it maintains a "where-was-I-the-last-time" status, so it only looks at the entries made to the logfile during the past 30 minutes. First, the client strips off any "noise" messages. Then, if all of the entries fit into the maximum size that can be reported, it sends all of the log to the Hobbit server. If there is more than will fit, it first checks to see if the regex defining the really interesting stuff is present in the log - if it is, then it drops anything before the interesting text. If there is still more than will fit, it keeps the interesting text + a few lines after that (to allow for multi-line log-entries which some OS'es have), and then sends that together with as much of the last part of the log as will fit inside the max. message size.
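The client-side selection described in the last item can be sketched roughly as follows. This is only an illustration in Python, not the actual "logfetch" code: the `LogConfig` field names, the `context_lines` default, and the choice to act only on the first "interesting" match are all assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogConfig:
    # Hypothetical field names; the proposal only lists what the config holds.
    filename: str
    max_bytes: int                     # maximum amount of data to send
    noise: Optional[str] = None        # regex for lines to strip
    interesting: Optional[str] = None  # regex for text that must be reported
    context_lines: int = 3             # "a few lines after"; the count is a guess


def size(lines: List[str]) -> int:
    # Bytes needed to send the lines, counting one newline per line.
    return sum(len(line) + 1 for line in lines)


def select_lines(new_lines: List[str], cfg: LogConfig) -> List[str]:
    # Step 1: strip the "noise" lines.
    if cfg.noise:
        noise = re.compile(cfg.noise)
        lines = [l for l in new_lines if not noise.search(l)]
    else:
        lines = list(new_lines)

    # Step 2: if everything fits, send all of it.
    if size(lines) <= cfg.max_bytes:
        return lines

    # Step 3: drop anything before the first interesting line.
    head: List[str] = []
    if cfg.interesting:
        pattern = re.compile(cfg.interesting)
        first = next((i for i, l in enumerate(lines) if pattern.search(l)), None)
        if first is not None:
            lines = lines[first:]
            if size(lines) <= cfg.max_bytes:
                return lines
            # Step 4: keep the interesting line plus a few lines after it.
            # Note: this head is kept even if it alone exceeds max_bytes.
            head = lines[:1 + cfg.context_lines]
            lines = lines[1 + cfg.context_lines:]

    # Fill whatever budget is left with the newest lines of the log.
    budget = cfg.max_bytes - size(head)
    tail: List[str] = []
    for line in reversed(lines):
        cost = len(line) + 1
        if cost > budget:
            break
        tail.append(line)
        budget -= cost
    tail.reverse()
    return head + tail
```

With a small budget, the result is the interesting entry, its following lines, and then the most recent lines that still fit; older lines in between are the ones that get dropped.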
This part has been implemented in the Hobbit daemon (hobbitd), and in the clients via a new "logfetch" utility. This utility uses standard regular expressions - not the Perl-compatible ones, because that would require you to install the PCRE library on all of your clients. The standard regex routines are included in all (I think) system libraries used today.
The last part is what happens when the log data arrives on the Hobbit server. Currently, there's a simple processing of this data to just dump it into an always-green "msgs" column. What should happen once I get it coded is:
- Data from each logfile is matched against a set of strings (regex'es) defined in the hobbit-clients.cfg file. Each string determines the color (red, yellow, green) and sets the color of the msgs column accordingly.
When the color has been decided, all of the normal alerting happens automatically. I do plan on making a more fine-grained alert mechanism (for the msgs, procs and disk statuses) so you can direct alerts to different groups depending on exactly which log-message triggered the alert, but that will not be part of this release.
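As a rough sketch of the color decision described above (this part was not yet coded at the time of writing, so everything here is an assumption): each rule pairs a regex with a color, and the most severe color among the matching rules becomes the color of the msgs column. The rule format and the `msgs_color` name are hypothetical and do not reflect the hobbit-clients.cfg syntax.

```python
import re
from typing import List, Tuple

# Severity ordering for the three colors named in the proposal.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def msgs_color(log_text: str, rules: List[Tuple[str, str]]) -> str:
    """Return the worst color triggered by any rule on any log line."""
    worst = "green"
    for line in log_text.splitlines():
        for pattern, color in rules:
            if re.search(pattern, line) and SEVERITY[color] > SEVERITY[worst]:
                worst = color
    return worst
```

A log with no matching lines stays green, so the column only changes color when a configured string actually shows up.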
So - how does that sound ? Anything I've missed ?
Regards, Henrik
On Wed, Feb 15, 2006 at 01:59:22PM +0800, Vernon Everett wrote:
Henrik, I have given this a bit of thought, and think it's great. (I refer to your proposal here, not my capacity for thought.)
Would it be possible to add custom strings and status?
A perfect example would be this. (from /var/adm/messages)
---snip---
Feb 10 13:31:15 afgdev tldd[649]: [ID 138416 daemon.error] TLD(0) drive 2 (device 1) is being DOWNED, status: Unable to open drive
---snip---
Anywhere else, this would not be a major issue, but on my backup server where my tape library is attached, this is a major red alert.
Of course you can do that.
You should note that the config that gets pushed to the client is merely a list of logfiles, and a rough filter to avoid sending all of the log to the Hobbit server.
You still configure what messages trigger an alert in the hobbit-clients.cfg file on the Hobbit server - and in your case, you would just set things up to trigger a critical alert when this message shows up in your backup server logfile.
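To illustrate the idea that the same log line can be critical on one host and ignored on others, here is a small Python sketch with the rules held per host on the server side. The host name "backupserver", the `RULES` table and the `color_for` function are invented for the example; this is not the hobbit-clients.cfg format.

```python
import re

# Hypothetical server-side rule table: host name -> list of (regex, color).
# "backupserver" is an invented host name used only for this example.
RULES = {
    "backupserver": [(r"tldd\[\d+\]: .*is being DOWNED", "red")],
    "DEFAULT": [],
}


def color_for(host: str, line: str) -> str:
    """Return the color the given log line triggers on the given host."""
    for pattern, color in RULES.get(host, RULES["DEFAULT"]):
        if re.search(pattern, line):
            return color
    return "green"
```

The client still sends the line from every host; only the server-side rules decide whether it turns the status red.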
Regards, Henrik
Henrik,
How will it handle monitoring files that get rotated out? For example if the hobbit client is monitoring /var/log/messages, and a cron rotate script moves messages to messages.1 and gzips it, will the hobbit client be smart enough to reseek to the end of the newly created file?
Some log rotation setups move/rename the file to another name (which keeps the inode), and then recreate a new file with the same name as the original. Some copy the file to a new file and truncate the old one, and there are other variations.
*** Partially off-topic *** While looking at another group's monitoring setup, they were using a program called ****** (name doesn't matter), which I found to be inferior to Hobbit, but it did have one nice feature, which was the ability to test the checksum of a list of files, and send an alert if the file changed (default examples were /etc/passwd, /vmlinuz, /etc/syslog.conf). I suppose this functionality could be achieved via a client-side external script, but I mention it here because it might be easy to add in now while you are working on the file scanning code :)
-Charles
On Tue, Feb 14, 2006 at 11:43:20PM -0700, Charles Jones wrote:
How will it handle monitoring files that get rotated out? For example if the hobbit client is monitoring /var/log/messages, and a cron rotate script moves messages to messages.1 and gzips it, will the hobbit client be smart enough to reseek to the end of the newly created file?
Log rotation is difficult to handle - I just wrote about it in another reply. In the scenario you describe, Hobbit would miss those log messages that were made between the last client run and the log rotation - so normally, that would only be log-entries for a few minutes (since the client runs every 5 minutes).
Hobbit does notice that the log was rotated, and starts sending the entries that go into the new logfile.
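How the client notices a rotation is not spelled out here. One common approach, shown below as a Python sketch and purely as an assumption about how it could be done, is to remember the inode and byte offset from the previous run: a changed inode suggests the file was renamed and recreated, and a file shorter than the saved offset suggests it was truncated. The `LogPosition` and `read_new_data` names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LogPosition:
    inode: int   # inode of the file at the last run
    offset: int  # byte offset where the last run stopped reading


def read_new_data(path: str, pos: Optional[LogPosition]) -> Tuple[bytes, LogPosition]:
    """Return the data added since the last run, plus the new position."""
    with open(path, "rb") as f:
        st = os.fstat(f.fileno())
        # New inode: file was moved away and recreated.
        # Smaller size than the saved offset: file was truncated in place.
        rotated = pos is None or st.st_ino != pos.inode or st.st_size < pos.offset
        f.seek(0 if rotated else pos.offset)
        data = f.read()
        return data, LogPosition(st.st_ino, f.tell())
```

As in the scenario above, anything written to the old file between the last run and the rotation is not seen by this sketch, because it only ever reads the file currently at `path`.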
*** Partially off-topic *** While looking at another group's monitoring setup, they were using a program called ****** (name doesn't matter), which I found to be inferior to Hobbit, but it did have one nice feature, which was the ability to test the checksum of a list of files, and send an alert if the file changed (default examples were /etc/passwd, /vmlinuz, /etc/syslog.conf). I suppose this functionality could be achieved via a client-side external script, but I mention it here because it might be easy to add in now while you are working on the file scanning code :)
I think this is better handled by some of the host-based IDS systems that are out there - like Tripwire, or the open-source equivalent AIDE. That's what they are designed to do, and they have much more advanced techniques for checking that the file content doesn't change (multiple hashes, checking of file meta-data etc.)
Regards, Henrik
participants (3): henrik@hswn.dk, jonescr@cisco.com, v.everett@afgonline.com.au