Kris, making more progress, although it must be frustrating. Sometimes it's just one little thing, but it takes a lot of steps to find out what it is. You might be on the cusp!
Definitely use darwin as OS type. It would be possible to use powershell, but it will be a lot more effort overall. The Xymon parser has special code for the powershell OS type, which means that you'd have to re-formulate a bunch of native (to MacOS) command output into a different non-native format. Ideally, the OS type should match the output of `uname -o`, because that will make your customised client module drop right into the Xymon client bin directory, and be automatically picked up.
So what we have now is a client message that appears to be well-formed, containing the [uptime] section that is also well-formed. We assume that the client message is being received by xymond, and we hope that it's being turned into a "client channel" message and given to xymond_client for parsing. But let's make sure we know how far the message goes, each step of the way.
Note: none of these commands would be expected to interfere at all with the running of Xymon or the server on which it runs, but if it's not considered safe to run them, you might want to run up a test Xymon server of your own - you could probably run Xymon on your MacBook, natively after compiling from source, or in a VM. But let's assume you have a Xymon server, somewhere, at your disposal...
On your Xymon server, run the following command:
sudo -u xymon xymoncmd xymond_channel --channel=client --filter=darwin grep ^@
That command will sit there, waiting for your client data messages, or until you kill it (eg press ^C). Leave this on the side, and open a second terminal on your Xymon server, to do other stuff that should generate output here.
Copy your trimmed-down client message, as you provided in your previous email (starting with [collector:] and ending with the uptime section), into a file. Let's call the file "client-message".
Note that the [collector] section is not required, and I actually suspect it is prepended by the Xymon server after receiving the client message. The Xymon client scripts don't seem to have any mention of the string "collector". My recommendation is to leave it out.
Now, while watching the output, run this command, which emulates a Xymon client:
{ echo "client SysAdmins-MacBook-Pro.local.darwin"; cat client-message; } | xymon 127.1 @
You should see something like this on your other terminal:
@@client#204631/SysAdmins-MacBook-Pro.local|1722467074.795740|127.0.0.1|SysAdmins-MacBook-Pro.local|darwin|darwin|
@@
If you do, this shows that the client message was given to the client parser for parsing. If you don't, then there's something wrong.
What I would do now is to try running your actual client script, and see if you get the same result, with similar output.
You could re-run the "xymond_channel" command using "cat" instead of "grep" if you want to see the whole message. Client messages are usually huge, so it's often beneficial to use "grep" to just view the header, or whatever you're interested in. But in this case, the client messages are going to be small, so using "cat" will probably show the whole message in less than a screenful.
So what should be happening next is that xymond_client also sees the "@@..." messages, parses them for [uptime] and creates a CPU status message. We can check that this happens with a similar xymond_channel command, this time tapping into the status channel:
sudo -u xymon xymoncmd xymond_channel --channel=status --filter=darwin grep ^@
Now, when you inject the client message again (either using the { echo bla bla } command above, or running your client script), you should see one or more status messages with "@@", and with a test name field of "cpu". You might also see "msgs" and "files", although this might depend on whether you have matching rules in analysis.cfg for these.
Also, you should have one or more files in the Xymon hist dir (on my system it's /var/lib/xymon/hist/) which match the hostname, so something like SysAdmins-MacBook-Pro,local and SysAdmins-MacBook-Pro,local.cpu. If present, check that these are recent - if you're not sure, you could use "xymon 127.1 'drop hostname'" to clear everything out, and re-test.
If you get this far, you've gone a long way, and we can progress to the next stage. If you don't, then we're narrowing down where the problem might be, and we can dig a little deeper (eg looking at the log files xymond.log and clientdata.log).
Cheers
Jeremy
On Thu, 1 Aug 2024 at 01:06, Kris Springer <kspringer@innovateteam.com> wrote:
Ok, I've stripped my script down to the bare minimum functions to get this figured out. All it does now is collect date and uptime and send that to the server. I deleted the host's history and logs from the server to start fresh, and this is the payload that the server is getting now. The spacing all looks fine to me.
[collector:]
client SysAdmins-MacBook-Pro.local.darwin darwin
[clientversion]
3.06
[date]
07/31/2024 08:40:36
[uptime]
8:40 up 22 days, 20:21, 3 users, load averages: 0.68 0.43 0.33
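For anyone scripting the injection step instead of piping into the xymon binary, here is a minimal Python sketch of sending one message to the server over TCP. It assumes the server listens on the default Xymon port 1984; the function name send_to_xymon is made up for illustration. The server may send something back for a "client" message (as far as I know the stock client uses the reply to fetch its client-local.cfg section), so the sketch simply returns whatever arrives.

```python
import socket

XYMON_PORT = 1984  # assumption: default Xymon port, adjust if your server differs


def send_to_xymon(message, server, port=XYMON_PORT, timeout=10.0):
    """Send one complete message to a Xymon server and return any reply bytes.

    Hypothetical helper, not part of Xymon. The whole message is written,
    then the sending side is shut down so the server knows it is complete.
    """
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(message.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # end of message
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Usage would be something like send_to_xymon(open("client-message").read(), "127.0.0.1"), with the "client HOSTNAME.OSTYPE" header as the first line of the text.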
The cpu webpage does not get generated at all with 'powershell' as the ostype, but ostype 'darwin' does make the cpu webpage show up, but no graph appears, and the text content displayed on the cpu page looks different than it does for other functioning hosts. So I think we've confirmed that 'darwin' is one required piece of the puzzle.
Also I've noticed that a Windows host's powershell clientlog shows the uptime info labeled under the [cpu] section. Should I be formatting my MacOS payload to display the uptime info under the [cpu] section instead of [uptime]? I did test it but it didn't fix anything.
[cpu]
up: 1 days, 1 users, 146 procs, load=0.09%
Here's what the MacOS 'darwin' host cpu page displays. The uptime info is not on the same line as the date like other hosts. Do you think that has anything to do with the graph not displaying?
and here's what a Windows 'powershell' host cpu page displays, along with a graph at the bottom of the page.
I also tried formatting the date to match the Windows host, but that didn't fix anything.
I stripped out any [cpu] content from my payload for this testing, which previously would have populated the cpu webpage with the total cores info. So my script is not currently pushing any [cpu] content, only [uptime]. I hear what you're saying about enabling debugging and really digging into what's going on here, but my Server is a production box that I just can't do that with. Also, based on your detailed explanations of everything, this should be working.
Kris Springer
On 7/30/24 22:52, Jeremy Laidman wrote:
OK, that string looks good to me. Although due to (presumably) HTML tags, it showed up as white on white in my email client (Gmail), so I'll reproduce it here, with formatting removed:
[uptime]
20:36 up 22 days, 8:17, 6 users, load averages: 2.12 2.12 1.61
For comparison, [uptime] from one of my Linux servers:
[uptime]
12:54pm up 178 days 20:49, 0 users, load average: 1.20, 1.44, 1.52
Your string has what xymond_client.c is looking for:
* the "[uptime]" section header
* the string "<space>up<space>", required to calculate uptime and include "up N days" in the CPU status
* the string "load average:<space>" or "load averages:<space>" followed by either "<float> <float> <float>" or "<float>,<space><float>,<space><float>"
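The checks listed above can be tried out locally before sending anything. This is a rough Python approximation of those rules, not the actual C code in xymond_client.c; the names LOAD_RE and parse_uptime are invented here. It picks the middle of the three numbers, on the assumption (stated elsewhere in this thread) that the 5 minute value is the one used for thresholding.

```python
import re

# "load average: " or "load averages: ", then three numbers separated by
# either a space or a comma plus space.
LOAD_RE = re.compile(
    r"load averages?: "
    r"(\d+(?:\.\d+)?),? (\d+(?:\.\d+)?),? (\d+(?:\.\d+)?)"
)


def parse_uptime(line):
    """Return (has_up_marker, five_minute_load or None) for an [uptime] line."""
    has_up = " up " in line  # the "<space>up<space>" marker
    match = LOAD_RE.search(line)
    load5 = float(match.group(2)) if match else None  # middle number
    return has_up, load5
```

If parse_uptime returns (True, some number) for the line your script produces, the string at least matches the shape described above.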
The uptime calculation is not related to graphing the CPU load average. However, that section of code has a debug line, so if you enable debugging, you might be able to use this to confirm that the uptime code is being run. If that code is being run, then the [uptime] section is making its way to the parser. The debug message is "CPU check host <hostname>: <uptime> days". There's another similar line showing hours of uptime. It might not be practical for you to run xymond_client with "--debug", depending on your environment. But it might be instructive if you can.
The fact that Xymon is sending a status "cpu" message for you is a good sign. But at that point, any thresholding that is going to happen, will have happened. So if you were to set LOAD to a very low level in analysis.cfg, you should be able to make the status yellow or red.
Then onto the graphing. The RRD file is populated from the "load=N" string in the CPU status message header. The CPU status page should have a first line that looks like:
"<timestamp> up: <uptime string>, <usercount> users, <proccount> procs, load=<loadavg>"
From one of my servers:
"Wed Jul 31 13:02:34 EST 2024 up: 178 days, 0 users, 225 procs, load=1.41"
The parser (within xymond_rrd) that grabs the load value for graphing looks at the "load=" string at the end of this line. But it only uses a line that contains "up:<space>" or "uptime:" or "Uptime:". The load string (in this case 1.41) can be "NN%" or "NN.NN" or "NN".
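To test a captured status message against that description, something like the following Python sketch could be used. It is only an approximation of the behaviour described, not the xymond_rrd code itself; UP_MARKERS, LOAD_VALUE_RE and load_for_graph are names invented for this example, and it accepts "load=" anywhere on a qualifying line rather than strictly at the end.

```python
import re

# A line only counts if it contains one of these markers.
UP_MARKERS = ("up: ", "uptime:", "Uptime:")

# "load=" followed by NN, NN.NN or NN% (the percent sign is dropped).
LOAD_VALUE_RE = re.compile(r"load=(\d+(?:\.\d+)?)%?")


def load_for_graph(status_text):
    """Return the load value a grapher could pick up, or None if none is found."""
    for line in status_text.splitlines():
        if any(marker in line for marker in UP_MARKERS):
            match = LOAD_VALUE_RE.search(line)
            if match:
                return float(match.group(1))
    return None
```

A None result for the first line of your cpu status page would suggest the line is missing either the "up: " marker or the "load=" value.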
It might be helpful to capture a raw CPU status message using "xymond_channel --channel=status", to see if the structure looks right, or if there are any spurious characters.
A couple of other things:
* it can take a few samples before an RRD file gets any numbers, so give it maybe 15 minutes before you conclude it's not working
* the RRD routines might be logging an error in rrd-status.log
* check that the RRD file doesn't already exist; remove it if it does (it might have incompatible structure)
* check that the xymon user has permissions to write to the directory where the RRD file is located
J
On Wed, 31 Jul 2024 at 12:39, Kris Springer <kspringer@innovateteam.com> wrote:
Previously I had both a [cpu] section and an [uptime] section in my client script. I removed all references to [cpu] from my script so that only [uptime] would get sent to the server. I tried 'darwin', 'powershell', and 'linux' as the OSTYPE, and the text content of the uptime data does get displayed on the cpu webpage, but no graph is generated. Depending on which OSTYPE I choose in my script it slightly changes the text that's displayed on the cpu webpage.
Here's what the uptime output looks like in the clientlog when using 'darwin' as the ostype.
[uptime]
20:36 up 22 days, 8:17, 6 users, load averages: 2.12 2.12 1.61
I should note that the load average in that uptime output is totally wrong. I have a good block of code that gives the correct percentage output, but for now I'm just trying to figure out the secret to getting the la graph to generate.
Kris Springer
On 7/30/24 19:23, Jeremy Laidman wrote:
Kris
Glad you're making progress.
What have you been using for OSTYPE that works for everything but CPU?
Can you show an example of the section in the client message with CPU load information, which I think is likely to be [uptime]?
The Darwin client message parser (as many others do, including eg Linux and Solaris) essentially sends the "load average" string to the generic UNIX CPU parser. I'd assume the load average string is in [uptime] the same as for Linux. But whatever shows "load average:" or "load averages:" would work. If the UNIX parser sees either of these strings, followed by three numbers, it will pick out the middle (5 minute) number and use that for thresholding and constructing a status message. Then the status parser for "cpu" will take the value to use in graphing (ie updating the la.rrd file).
J
On Wed, 31 Jul 2024 at 10:28, Kris Springer <kspringer@innovateteam.com> wrote:
Thanks Zak. I finally got back to working on my Mac client and have made significant progress now that I'm sending things to the server in the correct format. I am now hitting an issue that I was also running into previously regarding the server not generating a cpu (la) graph. It seems that everything else works fine with my client, but no matter what I do I can't get the server to generate a cpu graph when sending the data as
client $hostname.$ostype $clientclass
The cpu graph works perfectly if I send data as
status $hostname.cpu green\n$cpuData
But of course I can't do that because it forces the color green and then I can't use the analysis.cfg tolerances on it.
If I send it with no color it doesn't work at all.
status $hostname.cpu $cpuData
I've looked all through the original XymonPSclient for Windows that I'm using as a functioning example, but I'm not seeing the magic sauce that makes the cpu graph work. All the other tests graphs work just fine, but this cpu graph has given me trouble since I first started this project. Any ideas?
I've tried the following $ostype $clientclass options with none seeming to change anything.
linux
freebsd
netbsd
openbsd
darwin
bbwin
powershell
generic
Kris Springer
On 7/10/24 03:42, Beck, Zak wrote:
Hi Kris
Jeremy is right – for server side analysis, don’t send multiple individual status messages for these core tests. I don’t think analysis is triggered for status messages.
Instead, send one ‘client’ message with [cpu], [disk], [memory] sections, and send it as a client message, e.g. this from the Windows Powershell client:
client $($clientname).$($script:XymonSettings.clientsoftware) $($script:XymonSettings.clientclass) XymonPS
The defaults here are ‘powershell’ for clientsoftware (which is really OSTYPE) and clientclass.
If you have a Windows host running the Powershell client, you can see the last client message sent to xymon in the xymon-lastcollect.txt file. This is verbatim what is sent to the server via TCP.
https://www.xymon.com/help/manpages/man1/xymon.1.html
client[/COLLECTORID] HOSTNAME.OSTYPE [HOSTCLASS]
Used to send a "client" message to the Xymon server. Client messages are generated by the Xymon client; when sent to the Xymon server they are matched against the rules in the analysis.cfg(5) configuration file, and status messages are generated for the client-side tests. The COLLECTORID is used when sending client-data that are additions to the standard client data. The data will be concatenated with the normal client data.
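As an illustration of the syntax quoted above, here is a small Python sketch that assembles the header line and appends bracketed sections. The function names client_header and client_message are hypothetical, and the hostname is passed through unchanged (no dot-to-comma conversion is attempted here).

```python
def client_header(hostname, ostype, hostclass=None, collector_id=None):
    """Build the first line: client[/COLLECTORID] HOSTNAME.OSTYPE [HOSTCLASS]."""
    keyword = "client" if collector_id is None else f"client/{collector_id}"
    header = f"{keyword} {hostname}.{ostype}"
    if hostclass:
        header += f" {hostclass}"
    return header


def client_message(header, sections):
    """Join the header and (name, text) section pairs into one message."""
    lines = [header]
    for name, text in sections:
        lines.append(f"[{name}]")          # section header on its own line
        lines.append(text.rstrip("\n"))    # section body, no trailing blank line
    return "\n".join(lines) + "\n"
```

For the MacBook in this thread, client_header("SysAdmins-MacBook-Pro.local", "darwin", "darwin") gives the same header line as the one in the payload shown in the later email.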
You need to find the right OSTYPE for MacOS… xymond/xymond_client.c in the server source is what handles these messages, you’ll see a bunch of includes to match different OS types:
#include "client/linux.c"
#include "client/freebsd.c"
#include "client/netbsd.c"
#include "client/openbsd.c"
#include "client/solaris.c"
#include "client/hpux.c"
#include "client/osf.c"
#include "client/aix.c"
#include "client/darwin.c"
#include "client/irix.c"
#include "client/sco_sv.c"
#include "client/bbwin.c"
#include "client/powershell.c" /* Must go after client/bbwin.c */
#include "client/zvm.c"
#include "client/zvse.c"
#include "client/zos.c"
#include "client/mqcollect.c"
#include "client/snmpcollect.c"
#include "client/generic.c"
These included files handle the different OSTYPEs and differences in message formats between each (e.g. the output of the df command has different headings for some OSes).
My guess is Darwin is something like OS X, you’re right that there’s nothing for modern MacOS. Maybe Darwin would work.
Alternatively, if you can make the output resemble what you’d get from a Unix client then you could use OSTYPE = one of the unix types (possibly one of the BSDs, as MacOS has BSD origins).
I'd be tempted to start with darwin, and look in xymond/client/darwin.c - note that xymond_client is probably expecting to see some sections that you may not be sending yet:
timestr = getdata("date");
uptimestr = getdata("uptime");
clockstr = getdata("clock");
msgcachestr = getdata("msgcache");
whostr = getdata("who");
psstr = getdata("ps");
topstr = getdata("top");
dfstr = getdata("df");
inodestr = getdata("inode");
meminfostr = getdata("meminfo");
msgsstr = getdata("msgs");
netstatstr = getdata("netstat");
ifstatstr = getdata("ifstat");
portsstr = getdata("ports");
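A quick way to see which of those sections a given client message actually carries is to split it on the "[name]" headers. The Python sketch below does that; it only imitates the idea of getdata() and is not a port of the Xymon code. The names split_sections, missing_sections and DARWIN_SECTIONS are made up, and the list is copied from the getdata() calls above. Whether every section is strictly required is not established here.

```python
import re

SECTION_RE = re.compile(r"^\[([^\]]+)\]\s*$")

# Section names taken from the getdata() calls quoted above.
DARWIN_SECTIONS = [
    "date", "uptime", "clock", "msgcache", "who", "ps", "top",
    "df", "inode", "meminfo", "msgs", "netstat", "ifstat", "ports",
]


def split_sections(message):
    """Return {section_name: body_text}; lines before the first header are ignored."""
    sections = {}
    current = None
    for line in message.splitlines():
        match = SECTION_RE.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body) for name, body in sections.items()}


def missing_sections(message, expected=DARWIN_SECTIONS):
    """List the expected section names that do not appear in the message."""
    present = split_sections(message)
    return [name for name in expected if name not in present]
```

Running missing_sections on the contents of xymon-lastcollect.txt (or your own payload) gives a list of sections still to be added.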
Cheers
Zak
From: Kris Springer <kspringer@innovateteam.com>
Sent: Wednesday, July 10, 2024 5:37 AM
To: Xymon mailinglist <xymon@xymon.com>; Jeremy Laidman <jeremy@laidman.org>
Subject: [External] [Xymon] Re: analysis tolerances not applying to scripted MacOS host
Understood, and thanks for the info. I thought it might be something like that. The problem is that there isn't a modern functioning MacOS xymon client that uses the xymond_client process. All the math is wrong in the 2015 version. The cpu, mem, and disk checks are all different in the modern MacOS versions than they were 10 years ago. For whatever reason Xymon has seemed to ignore that Macs need to be monitored too. I've got a good script that produces good data, but it's not using any xymond_client process. I'd be happy to share my code with anyone who can assist with getting this working the 'correct' way. We could wrap it up into an App.dmg for easy install. Then we'll finally have a MacOS client available just like we already have the Windows PS client and the Linux clients.
---
Kris Springer
On July 9, 2024 10:01:28 PM Jeremy Laidman <jeremy@laidman.org> wrote:
The analysis.cfg thresholds are applied in central mode. If you are operating in local mode (which you appear to be, as you are sending status messages from the client) then the CPU thresholding needs to be done on the client.
In central mode, the xymond_client process parses client messages for relevant sections (eg [df] for disk data, [uptime] for CPU load averages) and performs thresholding checks against settings in analysis.cfg. It's my understanding that in local mode, the xymond_client process runs on the client side, so if you're implementing a full client that runs in local mode, you have to replicate the behaviour of xymond_client.
It's far simpler to make a client that runs only in central mode. You construct a client message consisting of all of the sections you're interested in, and send it to the server for it to parse for thresholding and extracting metrics for rrd files. The structure of the client message contents is important, as the parser expects the sections to be in a format that matches the OS ID of the client ("darwin" for MacOS?).
J
On Wed, 10 July 2024, 07:28 Kris Springer, <kspringer@innovateteam.com> wrote:
I've been working to create a new MacOS Xymon Client since the last
iteration I can find is using MacPorts and it's from 2015. I've tried
both Python and Powershell using Homebrew and I've settled on
Powershell. I may convert it to Python later once I get it finalized.
It's just a single script that gathers host data and sends it to the
server without any 2-way communication or client-local.cfg handshake
stuff going on. I have data being sent successfully from a Mac laptop
and the server is displaying everything fine with graphs and text
outputs. But the server isn't applying the analysis.cfg tolerances to
the host and I don't know why. Is there some magic piece of code that
I'm overlooking or don't know exists that tells the server to compare
the data to the tolerances in analysis.cfg?
My MacPSclient.ps1 is sending its data in this format, but the status
is always green on the server.
"status $hostname.cpu green\n$cpuData"
This also works, but the status is always green.
"status $hostname.cpu green $timestamp $cpuData"
This also works, but the status is always green.
"status $hostname.disk green\n[disk]\n$diskUsageToSend"
I can change the word green to yellow and it changes the color on the
server, but if I remove the word green, no data is updated on the server
side when the script is run. So I'm assuming the color tag is required.
I've attempted to read how-to's, looked in other client scripts for a
clue, and searched in forums, but I'm not finding anything.
Can someone here help?
Thanks so much!
--
Kris Springer
_______________________________________________
Xymon mailing list -- xymon@xymon.com
To unsubscribe send an email to xymon-leave@xymon.com