Since my upgrade to 4.1.1, I've had a problem with the hobbitd_client crashing at least 3-4 times a day. The core files are generated in hobbit/server/tmp and the process is restarted. An alert is also sent under the test name 'hobbitd_client'. Here is the stack trace from the latest core file. Please note that the server name has been masked after the fact. An interesting side note is that it always seems to dump on the same client server. Note that the client is running the new Hobbit software.
Also, I recently made a change to increase the max message size to 800,000 bytes.
[root@sac-pmon-01 tmp]# gdb ../bin/hobbitd_client core.19313
GNU gdb Red Hat Linux (6.1post-1.20040607.41rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".
Core was generated by `hobbitd_client'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/local/lib/libpcre.so.0...done.
Loaded symbols for /usr/local/lib/libpcre.so.0
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
#0  0x0000003b1a82e4dd in raise () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x0000003b1a82e4dd in raise () from /lib64/tls/libc.so.6
#1  0x0000003b1a82fc8e in abort () from /lib64/tls/libc.so.6
#2  0x000000000040c9a3 in sigsegv_handler (signum=19313) at sig.c:57
#3  <signal handler called>
#4  0x0000003b1a86eab0 in strchr () from /lib64/tls/libc.so.6
#5  0x00000000004045bb in handle_solaris_client (hostname=0x513a8c "wal-ddbs-01.x.x.x.com", hinfo=0x6e5370, sender=0x3d <Address 0x3d out of bounds>, timestamp=4252624, clientdata=0x0) at solaris.c:62
#6  0x0000000000405079 in main (argc=5323443, argv=0x7fffffffd348) at hobbitd_client.c:807
(gdb)
[root@sac-pmon-01 tmp]# gdb ../bin/hobbitd_client core.11307
GNU gdb Red Hat Linux (6.1post-1.20040607.41rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".
Core was generated by `hobbitd_client'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/local/lib/libpcre.so.0...done.
Loaded symbols for /usr/local/lib/libpcre.so.0
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
#0  0x0000003b1a82e4dd in raise () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x0000003b1a82e4dd in raise () from /lib64/tls/libc.so.6
#1  0x0000003b1a82fc8e in abort () from /lib64/tls/libc.so.6
#2  0x000000000040c9a3 in sigsegv_handler (signum=11307) at sig.c:57
#3  <signal handler called>
#4  0x0000003b1a86eab0 in strchr () from /lib64/tls/libc.so.6
#5  0x00000000004045bb in handle_solaris_client (hostname=0x513a8c "wal-ddbs-01.x.x.x.com", hinfo=0x6d8f70, sender=0x3d <Address 0x3d out of bounds>, timestamp=0, clientdata=0x0) at solaris.c:62
#6  0x0000000000405079 in main (argc=5323443, argv=0x7fffffffd348) at hobbitd_client.c:807
(gdb)
[root@sac-pmon-01 tmp]# gdb ../bin/hobbitd_client core.10241
GNU gdb Red Hat Linux (6.1post-1.20040607.41rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".
Core was generated by `hobbitd_client'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/local/lib/libpcre.so.0...done.
Loaded symbols for /usr/local/lib/libpcre.so.0
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
#0  0x0000003b1a82e4dd in raise () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x0000003b1a82e4dd in raise () from /lib64/tls/libc.so.6
#1  0x0000003b1a82fc8e in abort () from /lib64/tls/libc.so.6
#2  0x000000000040c9a3 in sigsegv_handler (signum=10241) at sig.c:57
#3  <signal handler called>
#4  0x0000003b1a86eab0 in strchr () from /lib64/tls/libc.so.6
#5  0x00000000004045bb in handle_solaris_client (hostname=0x513a8c "wal-ddbs-01.x.x.x.com", hinfo=0x6e1b90, sender=0x3d <Address 0x3d out of bounds>, timestamp=-64, clientdata=0x0) at solaris.c:62
#6  0x0000000000405079 in main (argc=5323443, argv=0x7fffffffd348) at hobbitd_client.c:807
(gdb)
I am having a similar problem. I am currently running the latest snapshot. I cannot remember how far back the problem goes. I was going to grab a hobbitd_client core trace, but hobbitd is coring too, overwriting the hobbitd_client core.
~David
On Thu, Aug 11, 2005 at 12:04:21PM -0700, Brian Lynch wrote:
> Since my upgrade to 4.1.1, I've had a problem with the hobbitd_client crashing at least 3-4 times a day. The core files are generated in hobbit/server/tmp and the process is restarted. An alert is also sent under the test name 'hobbitd_client'. Here is the stack trace from the latest core file. Please note that the server name has been masked after the fact. An interesting side note is that it always seems to dump on the same client server. Note that the client is running the new Hobbit software.
Ouch, you've uncovered an embarrassing bit of sloppy programming. Forgetting to verify your pointers before using them is bad for stability. A new snapshot will be done in an hour's time, with a set of fixes for this.
> Also, I recently made a change to increase the max message size to 800,000 bytes.
No problem. The current snapshot has bumped the max. size for a client message to 1 MB, which I think should be adequate for most systems.
Henrik
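For readers following along: the backtraces earlier in the thread end in strchr() called from handle_solaris_client, with clientdata shown as 0x0, which is consistent with a NULL pointer being scanned (gdb argument values can be unreliable in optimized builds, so treat that as a hint only). Below is a minimal sketch of the kind of pointer check Henrik describes. It is not the actual fix from the snapshot; the helper name first_line_length is hypothetical and does not come from the Hobbit source.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the Hobbit source): returns the length of
 * the first line in `data`, or -1 when `data` is NULL.
 */
static long first_line_length(const char *data)
{
    const char *eol;

    /* Verify the pointer before strchr() dereferences it. */
    if (data == NULL) return -1;

    eol = strchr(data, '\n');

    /* No newline found: the whole string is the first line. */
    return (eol != NULL) ? (long)(eol - data) : (long)strlen(data);
}
```

A caller would treat a negative return as "no data for this section" and skip the report instead of crashing.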
Thanks, Henrik!
- Brian
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
Henrik, I just launched the snapshot from 8/12/05 00:00, and the client max size is showing up as 256 KB. It looks like SHAREDBUFSZ_STD is being used for the standard BB 'status' messages. Is that the correct behavior?
- Brian
On Thu, Aug 11, 2005 at 04:18:48PM -0700, Brian Lynch wrote:
> Henrik, Just launched the snapshot from 8/12/05 00:00 and the client max size is ringing up at 256KB. It looks like the SHAREDBUFSZ_STD is being used for the standard BB 'status' messages. Is that correct behavior?
Yes, the "status" channel has a 256 KB buffer. But the "client" channel, which is the one used for data fed to us by the Hobbit client, is 1 MB.
The "client" channel is bigger because it needs to handle large ps listings combined with all of the other client output (top, df, etc.).
Do you really have a cpu, disk, or procs column (or any other individual status) that needs more than 256 KB of data in it?
Regards, Henrik
Henrik, we are running the Solaris sar script from deadcat, which sends anywhere from 200 KB to 900 KB of data via the 'status' channel. And thanks for the fix yesterday: hobbitd_client has not core dumped since I put the latest snapshot in place.
Best, Brian
On Fri, Aug 12, 2005 at 09:28:30AM -0700, Brian Lynch wrote:
> We are running the Solaris sar script from deadcat that dumps anywhere from 200K to 900K data via the 'status' channel.
Yikes! That's a lot more than I thought would go in a status message.
I've now made these settings configurable in hobbitserver.cfg, instead of having to change the source and re-compile. The default for the status channel is still 256 kB, but you just add MAXMSG_STATUS="1024" to your hobbitserver.cfg, and it will use that instead.
> the hobbitd_client channel has not core dumped since I put the latest snapshot in place.
Good to know.
Thanks, Henrik
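As an illustration of the configurable size described above, here is a rough sketch of how a daemon might turn a setting like MAXMSG_STATUS into a buffer size with a 256 kB fallback. It rests on two assumptions not confirmed in the thread: that the setting reaches the process as an environment variable, and that the value is in kilobytes (suggested by "1024" being offered as the way to get more than 256 kB). The helper names parse_kb and channel_bufsize are hypothetical and are not the Hobbit implementation.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert a decimal kilobyte count to bytes.
 * Falls back to default_kb when the text is missing, malformed,
 * zero, or too large to convert without overflow.
 */
static unsigned long parse_kb(const char *val, unsigned long default_kb)
{
    char *end = NULL;
    unsigned long kb;

    /* Reject NULL, empty strings, signs and leading whitespace. */
    if (val == NULL || !isdigit((unsigned char)val[0])) return default_kb * 1024UL;

    errno = 0;
    kb = strtoul(val, &end, 10);
    if (errno != 0 || *end != '\0' || kb == 0 || kb > ULONG_MAX / 1024UL)
        return default_kb * 1024UL;

    return kb * 1024UL;
}

/* Assumption: the setting is visible to the process via the environment. */
static unsigned long channel_bufsize(const char *envname, unsigned long default_kb)
{
    return parse_kb(getenv(envname), default_kb);
}
```

With this sketch, channel_bufsize("MAXMSG_STATUS", 256) would yield 1048576 bytes when the variable is set to 1024, and 262144 bytes when it is unset.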
Henrik Stoerner wrote:
> I've now made these settings configurable in hobbitserver.cfg, instead of having to change the source and re-compile. The default for the status channel is still 256 kB, but you just add MAXMSG_STATUS="1024" to your hobbitserver.cfg, and it will use that instead.
Hi Henrik
Thanks for making all these sizes configurable. However, the MAXLINE variable still appears in the default hobbitserver.cfg (with a value of 32768). Is it still used?
--
Frédéric Mangeant
Steria EDC Sophia-Antipolis
Henrik, I spoke too soon on the core dump. I just had another one this morning. Here is the stack trace (with the domain masked).
Cheers, Brian
[root@sac-pmon-02 tmp]# gdb ../bin/hobbitd_client core.29614
GNU gdb Red Hat Linux (6.1post-1.20040607.41rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".
Core was generated by `hobbitd_client'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/local/lib/libpcre.so.0...done.
Loaded symbols for /usr/local/lib/libpcre.so.0
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
#0  0x000000340232e4dd in raise () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x000000340232e4dd in raise () from /lib64/tls/libc.so.6
#1  0x000000340232fc8e in abort () from /lib64/tls/libc.so.6
#2  0x000000000040d723 in sigsegv_handler (signum=29614) at sig.c:57
#3  <signal handler called>
#4  0x0000000000402c86 in unix_disk_report (hostname=0x514aca "wal-dapp-02.x.x.x.com", hinfo=0x72ab40, fromline=0x7fffffffca70 "\nStatus message received from 10.3.3.146\n", timestr=0x514b26 "Tue Aug 16 15:38:43 GMT 2005", capahdr=0x40f036 "Capacity", mnthdr=0x40f02e "Mounted", dfstr=0x514cc6 "Filesystem", ' ' <repeats 11 times>, "1024-blocks Used Available Capacity Mounted on\n/dev/md/dsk/d0", ' ' <repeats 11 times>, "3010671 1515862 1434596 52% /\n/dev/md/dsk/d6", ' ' <repeats 11 times>, "1988887 1219506 709"...) at hobbitd_client.c:299
#5  0x000000000040478b in handle_solaris_client (hostname=0x514aca "wal-dapp-02.x.x.x.com", hinfo=0x72ab40, sender=0x2aaaaabca268 "\2009@\0024", timestamp=0, clientdata=0x0) at solaris.c:52
#6  0x0000000000405e2a in main (argc=5327601, argv=0x7fffffffd258) at hobbitd_client.c:827
(gdb)
participants (4)
- brianlynch@gmail.com
- David.Gore@mci.com
- frederic.mangeant@steria.com
- henrik@hswn.dk