On 15-03-2011 15:24, Glenn Attwood wrote:
(gdb) bt
#0 0x004c3422 in __kernel_vsyscall ()
#1 0x00a7a651 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0x00a7da82 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0x08072023 in ?? ()
#4 <signal handler called>
#5 0x0806f888 in ?? ()
#6 0x0805d82c in ?? ()
#7 0x00a66bd6 in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#8 0x08049d71 in ?? ()
(gdb)
Unfortunately, it seems your binary was built without debugging symbols (it was probably "stripped", as it's called). That makes it rather difficult to guess what is happening, since I don't know what code is located at address 0x0806f888 ...
Did you compile it yourself or use a pre-compiled binary? If you could compile it with debugging symbols included, that would be much more helpful. With the gcc compiler, make sure the CFLAGS in your Makefile include the "-g" option (they do by default in the Xymon source archive).
Regards, Henrik
Sorry, I should have caught that. I built debs from the debian/rules included in the release. I commented out the strip call, and here is the new backtrace:
Core was generated by `xymond_client'.
Program terminated with signal 6, Aborted.
#0 0x0030a422 in __kernel_vsyscall ()
(gdb) bt
#0 0x0030a422 in __kernel_vsyscall ()
#1 0x0013a651 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0x0013da82 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0x08072023 in sigsegv_handler (signum=11) at sig.c:57
#4 <signal handler called>
#5 0x0806f888 in get_ostype (osname=0x807710c "") at misc.c:44
#6 0x0805d82c in main (argc=1, argv=0xbfae9ba4) at xymond_client.c:2166
This is running on Ubuntu Lucid, 32-bit, if it matters.
Thanks, Glenn
Ok, the problem here is a host sending some "client" data without a valid hostname or operating-system identifier. A client message should begin like
client jorn,hswn,dk.linux linux
where the first "linux" is the name of the OS of the host. This client message that causes the crash doesn't have that. In fact, I don't think it has even the hostname.
xymond_client shouldn't crash, of course.
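To illustrate the format described above, here is a minimal sketch of such a check in shell. It is not Xymon code: the function name check_client_header is made up for this example, and it only looks at the "client HOST.OSTYPE" part of the first line, assuming the OS name is whatever follows the last dot.

```shell
#!/bin/sh
# Illustrative helper (hypothetical, not part of Xymon): inspect the first
# line of a client message and reject it when the hostname or the OS name
# after the last dot is empty.
check_client_header() {
    line=$1
    case $line in
        "client "*) ;;
        *) echo "not a client message"; return 1 ;;
    esac

    rest=${line#client }     # e.g. "jorn,hswn,dk.linux linux"
    hostos=${rest%% *}       # first field, e.g. "jorn,hswn,dk.linux"

    case $hostos in
        *.*) ;;
        *) echo "no OS separator"; return 1 ;;
    esac

    host=${hostos%.*}        # text before the last dot
    ostype=${hostos##*.}     # text after the last dot

    if [ -z "$host" ] || [ -z "$ostype" ]; then
        echo "missing hostname or OS type"
        return 1
    fi
    echo "host=$host ostype=$ostype"
}
```

Called with the well-formed line above, it prints host=jorn,hswn,dk ostype=linux. A hypothetical header such as "client jorn,hswn,dk." (nothing after the final dot) is rejected with a non-zero status.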
It would be nice to know what kind of client is triggering this. If you go back into "gdb", then run these three commands:
fr 6
p hostname
p sender
That should give you the hostname this client reports (or NULL, if it didn't report one - I suspect that is the case here), and the IP-address it was sent from.
Regards, Henrik
On 03/16/2011 05:18 PM, Henrik Størner wrote:
It would be nice to know what kind of client is triggering this. If you go back into "gdb", then run these three commands:
fr 6
p hostname
p sender
#0 0x00d21422 in __kernel_vsyscall ()
(gdb) fr 6
#6 0x0805d82c in main (argc=1, argv=0xbfc5e484) at xymond_client.c:2166
2166 os = get_ostype(clientos);
(gdb) p hostname
$1 = <value optimized out>
(gdb) p sender
$2 = 0xb76f8817 "142.150.160.218"
Should I rebuild with -O0 as opposed to -O2?
Ok, the problem here is a host sending some "client" data without a valid hostname or operating-system identifier. A client message should begin like
client jorn,hswn,dk.linux linux
Well, the entry in /var/lib/xymon/hostdata looks ok from that POV (here are the first few lines):
[collector:]
client devonian,utsc,utoronto,ca.sunos sunos
[date]
Wed Mar 16 21:25:43 EDT 2011
[uname]
SunOS devonian 5.10 Generic_138889-02 i86pc i386 i86pc
Thanks,
Glenn Attwood Senior Network Administrator, IITS University of Toronto Scarborough 416-287-7364
On 03/17/2011 09:39 AM, Glenn Attwood wrote:
When compiled with -O0, I get the following:
(gdb) p hostname
$1 = 0xb77be794 "permian.utsc.utoronto.ca"
(gdb) p sender
$2 = 0xb77be784 "142.150.160.219"
So it looks like at least two different machines are causing this. Both are Solaris 10/x86, running hobbit-client 4.3.0-beta2.
-- Glenn Attwood Senior Network Administrator, IITS University of Toronto Scarborough 416-287-7364
Hi Glenn,
On Thu, 17 Mar 2011 15:50:02 -0400, Glenn Attwood <attwood at utsc.utoronto.ca> wrote:
(gdb) p hostname
$1 = 0xb77be794 "permian.utsc.utoronto.ca"
(gdb) p sender
$2 = 0xb77be784 "142.150.160.219"
so it looks like at least 2 different machines are causing this, both are solaris10/x86, running hobbit-client 4.3.0-beta2
Sorry it's taken some time to get back to you about this.
Could you look at the client/tmp/msg.permian.utsc.utoronto.ca.txt file over on the client and see what it has in the first line? I am curious why these hosts don't report a valid system identifier.
To avoid the crash, you can grab the latest lib/misc.c file from http://xymon.svn.sourceforge.net/viewvc/xymon/trunk/lib/misc.c?view=log (the 6661 revision) and drop it into your Xymon 4.3.0 source tree. That should stop it crashing.
Regards, Henrik
No worries about any delay. Your help is appreciated, and I'm sure you're busy with many other things as well.
First line is:
client permian,utsc,utoronto,ca.
I'll apply the fix from svn and see if that stops the problem. I was also going to upgrade to xymon-client 4.3.0 to see if that fixes things as well. My guess is that some combination of sed/grep/awk is not working and the system identifier is not being correctly parsed.
Glenn Attwood Senior Network Administrator, IITS University of Toronto Scarborough 416-287-7364
Aha, that's wrong. There's no operating-system name included there; it should be right after the hostname. Like this (on my Linux box "jorn.hswn.dk"):
client jorn,hswn,dk.linux linux
This is usually picked up from the SERVEROSTYPE which gets defined by the "runclient.sh" script that starts the client. It uses this command:
SERVEROSTYPE="`uname -s | tr '[ABCDEFGHIJKLMNOPQRSTUVWXYZ/]' '[abcdefghijklmnopqrstuvwxyz_]'`"
but you can override it with the "--os" option when starting the client.
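As a rough sketch of how one might notice an empty identifier on the client before anything is sent, the same translation can be wrapped in a small function and the result checked. The function name normalise_ostype and the warning text are made up for this example and are not taken from runclient.sh.

```shell
#!/bin/sh
# Hypothetical helper, not taken from runclient.sh: normalise an OS name the
# same way as the tr command above (upper case to lower case, "/" to "_").
normalise_ostype() {
    printf '%s\n' "$1" |
        tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ/' 'abcdefghijklmnopqrstuvwxyz_'
}

# Derive the identifier from uname, as the client start-up is described to do.
SERVEROSTYPE=$(normalise_ostype "$(uname -s)")

# Warn instead of silently sending "client HOST." with nothing after the dot.
if [ -z "$SERVEROSTYPE" ]; then
    echo "SERVEROSTYPE is empty; pass --os when starting the client" >&2
fi
```

With this, normalise_ostype "SunOS" yields "sunos", which matches the identifier seen in the hostdata entry for the working Solaris host earlier in the thread.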
Regards, Henrik