On Monday 06 August 2007 21:25:46 Haertig, David F (Dave) wrote:
I try to identify filesystem "space hogs" via custom scripts I wrote a long time ago when using BB. 99% of my custom stuff is done in PERL.
I use 'du -k' to get the size of all directories in the filesystem. I then cut those results down to only the first and second level directories (but you could go as deep as you want). I store the size of each subdirectory in a small "database". I did this ages ago and my code uses PERL's "Storable" module to store the accumulated data into a file (called my "database"). These days I'd just use Hobbit's easily accessed RRD files. I then use PERL's Statistics::Descriptive::least_squares_fit() to calculate the slope and linear correlation coefficient of the "best fit line".
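For anyone who wants to experiment with the idea without the Perl modules, here is a minimal Python sketch of the two steps described above: trimming 'du -k' output to the top levels and fitting a straight line to the stored sizes. It is not the original script. The function names (trim_du, least_squares_fit) and the tab-separated input layout are assumptions made for illustration, and the fit is ordinary least squares written out by hand.

```python
import math


def trim_du(du_text, root, max_depth=2):
    """Keep 'du -k' entries at most max_depth levels below root.

    Assumes each line looks like '<size_kb><TAB><path>'.
    """
    root = root.rstrip("/")
    sizes = {}
    for line in du_text.splitlines():
        size_kb, sep, path = line.partition("\t")
        if not sep:
            continue  # not a du line
        path = path.rstrip("/")
        if path != root and not path.startswith(root + "/"):
            continue  # outside the filesystem we care about
        rel = path[len(root):].strip("/")
        depth = rel.count("/") + 1 if rel else 0
        if depth <= max_depth:
            sizes[path or "/"] = int(size_kb)
    return sizes


def least_squares_fit(xs, ys):
    """Return (intercept, slope, r) of the best-fit line y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) samples of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r is the linear correlation coefficient; 0.0 if the size never changed.
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return intercept, slope, r
```

If xs holds sample times in days and ys holds sizes in kB, the slope comes out in kB per day, and an r close to 1 or -1 means the growth or shrinkage is close to linear.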
This would be really useful to do on directories monitored with the dir option in client-local.cfg plus the DIR option in hobbit-clients.cfg, e.g. to be able to specify alerts at a given "time before disk is full".
This allows me to see how fast each subdirectory is growing/shrinking, and how linear that growth/reduction is. I trigger yellow/red conditions based on rate of growth and predicted fill time at current growth rate, in addition to the standard "95% full = red" test.
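To make the "predicted fill time" idea concrete, here is a small Python sketch of how a growth rate could be turned into a colour. The function name fill_status and all thresholds except the 95% one (2 days for red, 7 days for yellow) are made-up values for illustration, not settings from the original scripts or from Hobbit.

```python
def fill_status(used_kb, capacity_kb, slope_kb_per_day,
                red_days=2.0, yellow_days=7.0, red_pct=95.0):
    """Return (colour, days_to_full) for one filesystem.

    days_to_full is None when usage is flat or shrinking.
    The day thresholds are hypothetical defaults.
    """
    days_to_full = None
    if slope_kb_per_day > 0:
        days_to_full = (capacity_kb - used_kb) / slope_kb_per_day

    pct_used = 100.0 * used_kb / capacity_kb
    if pct_used >= red_pct:
        return "red", days_to_full  # the standard "95% full" test
    if days_to_full is not None:
        if days_to_full <= red_days:
            return "red", days_to_full
        if days_to_full <= yellow_days:
            return "yellow", days_to_full
    return "green", days_to_full
```

A filesystem that is only half full but gaining 10% of its capacity per day would go yellow here, long before the percentage test notices anything.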
The above makes it fairly easy to identify which subdirectory is your problem, which is often good enough to identify the file/process that is killing you. When that's not enough, I have a separate test that tries to identify problem files a different way. BB/Hobbit uses 'top' to identify cpu-hogging processes. Many times the files hogging space are directly tied to processes hogging cpu (runaway process = runaway file in many cases). 'top' identifies the process(es), then "lsof -p <pid>" is used to identify the files that the suspect process has open. Finding a cpu-hogger that has a filespace-hogger open is usually the holy grail you seek.
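The lsof step can be sketched in Python as follows. This is only an illustration, not the original test: it assumes lsof is installed, it uses lsof's field output mode (-F) instead of the default columns because that is easier to parse, and the function names are invented. The PID would come from whatever 'top' reports as the cpu-hogger.

```python
import subprocess


def parse_lsof_fields(text):
    """Parse 'lsof -F fsn' output into a list of dicts, one per open file.

    In field mode each line is one field: 'f' starts a new file entry,
    's' carries its size in bytes and 'n' its name.
    """
    files = []
    current = None
    for line in text.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "f":
            current = {"fd": value, "size": None, "name": None}
            files.append(current)
        elif current is not None and tag == "s":
            try:
                current["size"] = int(value)
            except ValueError:
                pass  # leave size unknown if it is not a plain number
        elif current is not None and tag == "n":
            current["name"] = value
    return files


def largest_open_files(pid, top_n=5):
    """Return the top_n largest files that process pid has open."""
    result = subprocess.run(
        ["lsof", "-p", str(pid), "-F", "fsn"],
        capture_output=True, text=True, check=False,
    )
    sized = [f for f in parse_lsof_fields(result.stdout) if f["size"] is not None]
    return sorted(sized, key=lambda f: f["size"], reverse=True)[:top_n]
```

Calling largest_open_files(pid) for each process near the top of the 'top' output gives a short list of candidate files to compare against the fast-growing directories.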
The "CPU usage by process" graph is the utopian one ...
As a "repair" action for Hobbit, I squirreled away 2GB of diskspace in 100MB chunks for critical filesystems. "dd if=/dev/zero of=/filesystem/DiskSpaceReserve/reserve01 bs=1024 count=102400", then "cp reserve01 reserve02", etc. to build up the reserve.
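The same reserve can be built from a script. Below is a hedged Python sketch that does what the dd/cp sequence does: it writes real zero bytes (not sparse files, which would not hold any space) into numbered reserveNN files. The function names are invented, and release_chunk is my assumption about what the "repair" action would do, namely delete one chunk to hand its space back to the filesystem.

```python
import os


def build_reserve(directory, chunks=20, chunk_bytes=100 * 1024 * 1024,
                  block_bytes=1024 * 1024):
    """Create reserve01..reserveNN in directory, each chunk_bytes of zeros.

    The defaults give 20 chunks of 100 MiB, roughly the 2GB described above.
    """
    os.makedirs(directory, exist_ok=True)
    block = b"\0" * block_bytes
    paths = []
    for i in range(1, chunks + 1):
        path = os.path.join(directory, f"reserve{i:02d}")
        with open(path, "wb") as fh:
            remaining = chunk_bytes
            while remaining > 0:
                n = min(remaining, block_bytes)
                fh.write(block[:n])  # write actual data so blocks are allocated
                remaining -= n
        paths.append(path)
    return paths


def release_chunk(directory):
    """Delete the highest-numbered reserve file; return its path or None."""
    names = sorted(n for n in os.listdir(directory) if n.startswith("reserve"))
    if not names:
        return None
    path = os.path.join(directory, names[-1])
    os.remove(path)
    return path
```

Releasing one chunk at a time buys a known amount of headroom per repair action and leaves the rest of the reserve in place for the next alert.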
lvextend may be another useful command here ...
Regards, Buchan