I replaced the binaries.
It ran for about 3 hours.
Now I just get:
2007-08-28 12:17:12 Setup complete
2007-08-28 12:44:53 BOARDBUSY locked at 2, GETNCNT is 0, GETPID is 9917, 2 clients
2007-08-28 15:08:58 BOARDBUSY locked at 2, GETNCNT is 0, GETPID is 9917, 2 clients
2007-08-28 15:10:20 BOARDBUSY locked at 1, GETNCNT is 0, GETPID is 10501, 2 clients
2007-08-28 15:11:23 BOARDBUSY locked at 1, GETNCNT is 0, GETPID is 9917, 1 clients
2007-08-28 15:14:53 BOARDBUSY locked at 2, GETNCNT is 0, GETPID is 9917, 2 clients
2007-08-28 15:16:53 BOARDBUSY locked at 1, GETNCNT is 0, GETPID is 9917, 1 clients
And a red hobbitd_channel status gets sent to the daemon.
Core file says:
Reading hobbitd_channel
core file header read successfully
Reading ld.so.1
Reading libresolv.so.2
Reading libsocket.so.1
Reading libnsl.so.1
Reading libc.so.1
program terminated by signal ABRT (Abort)
0xfee60717: __lwp_kill+0x0007: jae __lwp_kill+0x15 [ 0xfee60725, .+0xe ]
Current function is sigsegv_handler
   57   abort();
(dbx) where
  [1] __lwp_kill(0x1, 0x6), at 0xfee60717
  [2] _thr_kill(0x1, 0x6), at 0xfee5ded4
  [3] raise(0x6), at 0xfee0ced3
  [4] abort(0x80599c0, 0x0, 0x8046758, 0xfee4dd4f, 0x8046758, 0xfee4dd4f), at 0xfedf0969
=>[5] sigsegv_handler(signum = 11), line 57 in "sig.c"
  [6] __sighndlr(0xb, 0x0, 0x80467f0, 0x804ebe8), at 0xfee5fadf
  [7] call_user_handler(0xb, 0x0, 0x80467f0), at 0xfee560d3
  [8] sigacthandler(0xb, 0x0, 0x80467f0, 0xf, 0x0, 0x0), at 0xfee56253
  ---- called from signal handler with signal 11 (SIGSEGV) ------
  [9] main(argc = 4, argv = 0x8046b28), line 676 in "hobbitd_channel.c"
Meaning it tried to spawn a thread and dumped core.
Is this a "nicer" crash? That is, will it keep running, since it only dumped core on a fork and not the whole channel? Or is this something more?
-Sean
-----Original Message-----
From: Henrik Stoerner [mailto:henrik at hswn.dk]
Sent: Tuesday, August 28, 2007 10:57 AM
To: hobbit at hswn.dk
Subject: Re: [hobbit] Hobbid_channel crashing on me
On Tue, Aug 28, 2007 at 09:26:51AM -0400, Sean R. Clark wrote:
I have 18,102 RRDs, 17,671 of which are controlled by the hobbitd_channel (the others are written/populated from other sources)
The slice I have the data on has a busy% between 16-88% depending on what's going on (so yes, high I/O as well)
OK, then I'd suggest that you pick up the current snapshot of Hobbit from http://www.hswn.dk/beta/ and build that. The only parts you need to replace in your current setup are these binaries:
- hobbitd/hobbitd_channel
- hobbitd/hobbitd_rrd
- web/hobbitgraph.cgi
After running "make", shut down Hobbit and copy these files to your ~hobbit/server/bin/ directory (it's probably wise to save the original ones first). Then start Hobbit again, and everything should be working fine - with a lot less I/O load, and no memory leak in hobbitd_channel.
What's changed internally is that updates of the RRD files are now cached for up to 30 minutes before being written to disk; the RRDtool library can handle "batch" updates of the data, so instead of updating the RRD file with 1 dataset every 5 minutes, it now gets 6 datasets in one operation every 30 minutes.
This also means that when you shut down Hobbit, you'll see that the hobbitd_rrd process takes quite a long time to finish - it is busy writing all of the cached updates to disk. On my work server, this takes about 5 minutes.
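
The caching behaviour described above can be illustrated with a small Python sketch. This is not Hobbit's actual code (Hobbit is written in C), only a rough model of the idea: datasets are queued per RRD file and handed out as one batch every 30 minutes, and anything still queued is drained at shutdown. The names UpdateCache, rrdtool_update_args and the file name cpu.rrd are hypothetical; the only external detail assumed is that "rrdtool update" accepts several timestamp:value arguments in one call.

```python
FLUSH_INTERVAL = 30 * 60   # seconds a dataset may stay cached (30 minutes)
SAMPLE_INTERVAL = 5 * 60   # seconds between incoming datasets (5 minutes)


class UpdateCache:
    """Hypothetical per-file cache of pending RRD updates."""

    def __init__(self, start_time, flush_interval=FLUSH_INTERVAL):
        self.start_time = start_time
        self.flush_interval = flush_interval
        self.pending = {}     # rrd path -> list of (timestamp, value)
        self.last_flush = {}  # rrd path -> time of the last flush

    def add(self, path, timestamp, value):
        """Queue one dataset; return the whole batch once it is due, else None."""
        self.pending.setdefault(path, []).append((timestamp, value))
        last = self.last_flush.setdefault(path, self.start_time)
        if timestamp - last < self.flush_interval:
            return None
        self.last_flush[path] = timestamp
        return self.pending.pop(path)

    def flush_all(self):
        """Drain everything still cached, as would happen at shutdown."""
        batches, self.pending = self.pending, {}
        return batches


def rrdtool_update_args(path, batch):
    """Build one 'rrdtool update' command line covering the whole batch."""
    return ["rrdtool", "update", path] + [f"{ts}:{val}" for ts, val in batch]


if __name__ == "__main__":
    cache = UpdateCache(start_time=0)
    for t in range(SAMPLE_INTERVAL, FLUSH_INTERVAL + 1, SAMPLE_INTERVAL):
        batch = cache.add("cpu.rrd", t, t // SAMPLE_INTERVAL)
        if batch is not None:
            # One write carrying six datasets instead of six separate writes.
            print(" ".join(rrdtool_update_args("cpu.rrd", batch)))
```

With one dataset every 5 minutes, add() returns nothing five times and then a batch of six, which is where the reduction in disk writes comes from. The flush_all() step is also why a shutdown takes a while when many files still have cached data.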
Regards, Henrik