On Thu, Feb 12, 2009 at 06:06:48PM +0000, Flyzone Micky wrote:
"really low" as in ... how much ?
Output of iostat command:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.22    0.00    0.91    3.62    0.00   93.26
This is the iostat output for NFS:
Device:               rBlk_nor/s  wBlk_nor/s  rBlk_dir/s  wBlk_dir/s  rBlk_svr/s  wBlk_svr/s  rops/s  wops/s
vnetapp:/vol/hobbit      1631.11      373.97        0.00        0.00     1170.83      825.22  840.76  840.76
This last iostat output also includes rsync activity, because I was keeping an rsync running on the local disk of the hobbit server.
Unlucky nfsstat doesn't sho
of all the RRD files - takes about 8 minutes. No chance at all then of keeping up with 5-minute update cycles.
But in that case, wouldn't a warning like this appear (I don't get one)? WARNING: Runtime 110 longer than BBSLEEP
I really think you should try shutting off the hobbitd_rrd tasks, just to see what happens.
Maybe I left it out of my last post, but I have already done that, and it didn't solve the problem.
For hosts to go purple they have to go more than 30 minutes without an update - they don't go purple just because they miss a single update.
Right... but it doesn't always appear. I also remember an old patch in the all-in-one about dirty data, but it was already applied.
I suppose you have checked the kernel logs ('dmesg' output) for anything odd ?
Done, along with all the other system and hobbit logs. There are no messages that could help.
I'm wondering if maybe you're running out of ports (there's only 64K of them, only about half can be used by normal apps). How many ports do you have in TIME_WAIT state ?
I have ruled that out: there are 235-300 ports in TIME_WAIT at most. I also tried setting the kernel parameter (as for Oracle) net.ipv4.ip_local_port_range = 1024 65000, but nothing changes with or without it.
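For anyone who wants to repeat the TIME_WAIT count without netstat, here is a minimal sketch in Python. It assumes a Linux host where /proc/net/tcp is readable and only looks at IPv4 sockets; the function name count_tcp_states is made up for this example and is not part of Hobbit.

```python
from collections import Counter

# Hex state codes used in the "st" column of /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or truncated line
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as handle:
            states = count_tcp_states(handle.read())
        print("TIME_WAIT sockets:", states.get("TIME_WAIT", 0))
    except OSError as exc:
        # Not on Linux, or /proc is not mounted.
        print("cannot read /proc/net/tcp:", exc)
```

Comparing the TIME_WAIT figure with the width of net.ipv4.ip_local_port_range gives a rough idea of how close the server is to running out of local ports.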
Another thing is the size of the ARP cache, if your hosts are all on the same IP network or your router/firewall is doing proxy-arp.
There are about 4 different networks. In any case, remember my test with just 20 clients.
Is this server also running the network tests ? ... sysctl net.ipv4.tcp_tw_reuse=1 which enables the kernel to re-use ports that are in a TIME_WAIT
Yes, but as before... it also appears with just 20 clients, so I would rule out a problem related to the number of clients. However, I also tried net.ipv4.tcp_fin_timeout = 30 instead of the default 120 seconds in RHEL5 to leave a port in TIME_WAIT state.
One (I) would expect the 64-bit systems to have a bit more "oomph" so they should be the ones that worked best.
Ahm... what is an oomph? :-S
A datapoint here. I'm also running Hobbit on a 64-bit Linux platform, but it is using SPARC (Sun) hardware.
We are trying to shut down all our SPARC machines and move to Linux... :)
So you're saying that on a RHEL 5.3 64-bit Intel server, setting up Hobbit and feeding it with data from ~20 clients will make the system break?
Yes, this is the point: RHEL > 5.0 and 64-bit (AMD)... I still need to try on Fedora 10 64-bit.
I think I would have heard about it before if this was a general problem.
Eh... I would also have liked to hear about it before :)))
However, after shutting down hobbit, the ipcs command still shows the shared memory segments in use even though no hobbit process is active. Maybe something in hobbit hangs?
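To make that check repeatable, here is a small Python sketch that lists the shared memory segments with no attached process. It assumes the usual Linux column layout of "ipcs -m" (key, shmid, owner, perms, bytes, nattch, status); the function name unattached_segments is invented for this example. System V segments persist until they are removed explicitly or the machine reboots, so a segment with nattch 0 is only a hint that something did not clean up, not proof of a hang.

```python
import subprocess


def unattached_segments(ipcs_output):
    """Return (shmid, owner, bytes) for segments whose nattch is 0.

    Expects the text printed by `ipcs -m` on Linux, where data rows
    start with a hex key such as 0x00000000.
    """
    found = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows: key shmid owner perms bytes nattch [status]
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        shmid, owner, size, nattch = fields[1], fields[2], fields[4], fields[5]
        if nattch == "0":
            found.append((shmid, owner, int(size)))
    return found


if __name__ == "__main__":
    try:
        output = subprocess.run(
            ["ipcs", "-m"], capture_output=True, text=True, check=True
        ).stdout
        for shmid, owner, size in unattached_segments(output):
            print(f"shmid {shmid} owner {owner} bytes {size}: no process attached")
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run ipcs -m:", exc)
```

Running it once with hobbit stopped, and again after a restart, shows whether the leftover segments are reused or keep accumulating.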
Have a nice day
P.S.: how can I reply from a normal email client without creating a new thread on the ML?