I am running the latest snapshot from today.
I found that bbstatus, when given the --only option, complains that no DIR was set and exits with error code 256. Adding --DIR=~hobbithome/data/logs corrects the error.
Shouldn't the default DIR be set from the env file?
BR Thomas
Greetings,
For the past couple of nights I have been getting the following errors in some of my hobbit logs:
Clientdata.log:
2005-09-28 03:09:36 Worker process died with exit code 134, terminating
2005-09-28 03:09:45 Worker process died with exit code 134, terminating
2005-09-28 04:20:03 Worker process died with exit code 134, terminating
2005-09-28 04:20:09 Worker process died with exit code 134, terminating
2005-09-28 04:50:13 Worker process died with exit code 134, terminating
2005-09-28 04:50:19 Worker process died with exit code 134, terminating
Hobbitlaunch.log
2005-09-29 02:04:20 Task clientdata terminated, status 1
2005-09-29 02:09:23 Task clientdata terminated, status 1
2005-09-29 02:14:28 Task clientdata terminated, status 1
2005-09-29 02:19:31 Task clientdata terminated, status 1
2005-09-29 02:24:35 Task clientdata terminated, status 1
2005-09-29 02:29:39 Task clientdata terminated, status 1
2005-09-29 02:34:37 Task clientdata terminated, status 1
2005-09-29 02:39:41 Task clientdata terminated, status 1
2005-09-29 02:44:46 Task clientdata terminated, status 1
2005-09-29 02:49:46 Task clientdata terminated, status 1
2005-09-29 05:41:36 Task clientdata terminated, status 1
2005-09-29 05:41:37 Task clientdata terminated, status 1
and then all graphing and updates stop...
I am using the latest snapshot. Where should I go from here? (As you can see, it only seems to happen early in the morning.)
thanks, Adam
In <433D4F74.3000908 at marquette.edu> Adam Scheblein <adam.scheblein at marquette.edu> writes:
For the past couple of nights i have been getting the following errors in some of my hobbit logs:
Clientdata.log: 2005-09-28 03:09:36 Worker process died with exit code 134, terminating
"code 134" usually means it crashed with a segfault (134 - 128 = signal 6, SIGABRT, which the process receives when Hobbit's segfault handler calls abort()).
Check for core files in the ~hobbit/server/tmp/ directory - I'm sure you'll find some - and run them through gdb as described in http://www.hswn.dk/hobbit/help/known-issues.html#bugreport
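The exit-code arithmetic above can be checked with a few lines of Python. This is only a minimal sketch, assuming the shell-style convention that a status above 128 means "killed by signal (status - 128)"; the function name describe_exit_code is made up for illustration and is not part of Hobbit.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a shell-style exit code.

    Assumption: values above 128 encode "terminated by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name for this signal number on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited normally with status {code}"


# The value reported in Clientdata.log.
print(describe_exit_code(134))
```

Signal numbers are platform-specific, so the symbolic name printed depends on the system where this runs.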
Hobbitlaunch.log
2005-09-29 02:04:20 Task clientdata terminated, status 1
2005-09-29 05:41:37 Task clientdata terminated, status 1
and then all graphing and updates stop...
hobbitlaunch normally restarts a task, but if the task keeps crashing it will stop doing that (to avoid a runaway program taking up all system resources by restarting all the time). So once we figure out why hobbitd_client is crashing, this should go away too.
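To illustrate the idea of giving up on a task that keeps crashing, here is a rough sketch of a restart throttle. It is not hobbitlaunch's actual logic, which this thread does not describe: the class name RestartThrottle, the sliding-window policy, and the default thresholds are all assumptions chosen for the example.

```python
from collections import deque


class RestartThrottle:
    """Allow restarts until too many crashes occur inside a time window.

    Hypothetical policy: max_failures and window_seconds are illustrative
    defaults, not values taken from hobbitlaunch.
    """

    def __init__(self, max_failures: int = 5, window_seconds: float = 600.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = deque()

    def record_failure(self, now: float) -> bool:
        """Record a crash at time `now`; return True if a restart is allowed."""
        self._failures.append(now)
        # Forget crashes that have fallen out of the sliding window.
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        return len(self._failures) <= self.max_failures


throttle = RestartThrottle(max_failures=2, window_seconds=10.0)
for t in (0.0, 1.0, 2.0):
    print(t, "restart" if throttle.record_failure(t) else "give up")
```

A supervisor built this way stops restarting a task that crashes in a tight loop, but resumes once the crashes are spaced far enough apart.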
Henrik
Here is the gdb output:
gdb bin/hobbitd_client tmp/core
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

Core was generated by `hobbitd_client'.
Program terminated with signal 6, Aborted.

warning: current_sos: Can't read pathname for load map: Input/output error

Reading symbols from /usr/lib/libpcre.so.0...done.
Loaded symbols for /usr/lib/libpcre.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0xb7e75771 in kill () from /lib/libc.so.6
(gdb) bt
#0 0xb7e75771 in kill () from /lib/libc.so.6
#1 0xb7e754e5 in raise () from /lib/libc.so.6
#2 0xb7e769d0 in abort () from /lib/libc.so.6
#3 0x080568e1 in sigsegv_handler (signum=0) at sig.c:57
#4 <signal handler called>
#5 0xb7eb50f3 in strlen () from /lib/libc.so.6
#6 0x08052d74 in namematch (needle=0x0, haystack=0x1e <Address 0x1e out of bounds>, pcrecode=0x8060838) at matching.c:51
#7 0x080500cf in add_count (pname=0x0, head=0x8060838) at client_config.c:571
#8 0x0804a42e in unix_disk_report (hostname=0xb7dcb035 "optim.csd.mu.edu", hinfo=0x80615a0, fromline=0x0, timestr=0x0, capahdr=0x8058289 "Capacity", mnthdr=0x8058281 "Mounted", dfstr=0xb7dcb14e "Filesystem 1024-blocks Used Available Capacity Mounted on\n/dev/vg02/lvol1_snap") at hobbitd_client.c:308
#9 0x0804c8a2 in handle_hpux_client (hostname=0xb7dcb035 "optim.csd.mu.edu", hinfo=0x80615a0, sender=0x0, timestamp=1128066354, clientdata=0x0) at hpux.c:52
#10 0x0804e0e8 in main (argc=1, argv=0xbfbb3f64) at hobbitd_client.c:859
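In the trace, frames #6 and #7 receive a NULL pointer (needle=0x0, pname=0x0) just before strlen() faults. As a reading aid, the sketch below scans a gdb backtrace for frames that were called with a 0x0 argument. The helper name frames_with_null_args and the regular expressions are assumptions written for this example; they are not part of gdb or Hobbit, and they only handle the simple frame layout seen here.

```python
import re

# Matches gdb frame lines such as:
#   #6 0x08052d74 in namematch (needle=0x0, ...) at matching.c:51
FRAME_RE = re.compile(
    r"#(\d+)\s+(?:0x[0-9a-f]+ in )?(\w+) \((.*?)\)\s+(?:at|from) (\S+)"
)
# Matches an argument whose value is exactly 0x0 (a NULL pointer).
NULL_ARG_RE = re.compile(r"(\w+)=0x0(?![0-9a-fA-F])")


def frames_with_null_args(backtrace: str):
    """Return (frame number, function, [NULL argument names]) per frame."""
    results = []
    for match in FRAME_RE.finditer(backtrace):
        frame_no, func, args, _location = match.groups()
        null_args = NULL_ARG_RE.findall(args)
        if null_args:
            results.append((int(frame_no), func, null_args))
    return results


# Three frames copied from the backtrace in this thread.
SAMPLE = """\
#5 0xb7eb50f3 in strlen () from /lib/libc.so.6
#6 0x08052d74 in namematch (needle=0x0, haystack=0x1e <Address 0x1e out of bounds>, pcrecode=0x8060838) at matching.c:51
#7 0x080500cf in add_count (pname=0x0, head=0x8060838) at client_config.c:571
"""

for frame_no, func, null_args in frames_with_null_args(SAMPLE):
    print(f"frame #{frame_no} {func}: NULL arguments {null_args}")
```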
On Fri, Sep 30, 2005 at 10:27:09AM -0500, Adam Scheblein wrote:
Here is the gdb output:
#7 0x080500cf in add_count (pname=0x0, head=0x8060838) at client_config.c:571
#8 0x0804a42e in unix_disk_report (hostname=0xb7dcb035 "optim.csd.mu.edu", hinfo=0x80615a0, fromline=0x0, timestr=0x0, capahdr=0x8058289 "Capacity", mnthdr=0x8058281 "Mounted", dfstr=0xb7dcb14e "Filesystem 1024-blocks Used Available Capacity Mounted on\n/dev/vg02/lvol1_snap") at hobbitd_client.c:308
OK, I think I've got it from this trace. Could you apply the attached patch on top of the snapshot you have, and see if that solves it?
Regards, Henrik
participants (3)
- adam.scheblein@marquette.edu
- henrik@hswn.dk
- tlp-hobbit@holme-pedersen.dk