Hi Brothers,
Just read an article in Linux Journal about using DRBD as a "cluster" filesystem for a redundant installation of sendmail and mysql. This made me think it could be a way forward as a cluster mechanism for hobbit. Currently I run an rdist job for this and then manually start hobbit on the secondary node.
I know from when I last looked at DRBD that the IO performance was a problem with BB, but with Hobbit I am pretty sure it should work.
I am wondering if any of you have experience with DRBD/heartbeat/hobbit and what your impressions are?
Regards, Thomas
-- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.0.405 / Virus Database: 268.11.1/421 - Release Date: 16-08-2006
Hi Brothers,
Just read an article in Linux Journal about using DRBD as a "cluster" filesystem for a redundant installation of sendmail and mysql. This made me think it could be a way forward as a cluster mechanism for hobbit.
Yes, I read the article as well. I was attempting to implement BB + DRBD on RH Linux, but the project stopped due to other priorities.
During the test phase I was able to simulate the outage of a web server on A, causing the slave server (B) to shoot down A and take over its IP address via the private subnet between them, etc.
Currently I run an rdist job for this and then manually start hobbit on the secondary node.
I know from when I last looked at DRBD that the IO performance was a problem with BB, but with Hobbit I am pretty sure it should work.
I do remember the sync between the two nodes was great, but I didn't really deploy BB on both servers, so I don't know how it really runs.
I am wondering if any of you have experience with DRBD/heartbeat/hobbit and what your impressions are?
I am less interested in the BB + DRBD + RH Linux combination for clustering the monitoring system. Instead, I am more interested in Hobbit + Solaris 10 + Sun Cluster software. Sorry to hijack your subject.
Sun Cluster software is now free to use (see R1).
R1: http://www.sun.com/software/cluster/
tj
Regards, Thomas
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
It is good to see that bbgen can count the total number of hosts being monitored, as in the following: http://www.hswn.dk/hobbit-cgi/bb-hostsvc.sh?HOST=voodoo.hswn.dk&SERVICE=bbge...
Is it also possible to list which OSes and versions make up that total?
Regards
tj
On Thu, Aug 17, 2006 at 08:37:34PM -0500, T.J. Yang wrote:
It is good to see that bbgen can count the total number of hosts being monitored, as in the following: http://www.hswn.dk/hobbit-cgi/bb-hostsvc.sh?HOST=voodoo.hswn.dk&SERVICE=bbge...
Is it also possible to list which OSes and versions make up that total?
A very quick summary can be made with
bb 127.0.0.1 "hobbitdboard test=cpu fields=BBH_OS"|sort |uniq -c
That gives you a summary of what OS your hobbit client enabled systems are running. For a more detailed breakdown of the versions you'd have to request each of the client logs with
bb 127.0.0.1 "clientlog client1.hswn.dk section=osversion"
and then do some summation over that data.
Regards, Henrik
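The per-host summation described above could be scripted roughly as follows. This is only a sketch: the function name os_version_summary and the BB/BBDISP variables are made up for illustration, and it assumes each clientlog reply consists of an optional "[osversion]" header line followed by the version text, which may not hold for every client.

```shell
#!/bin/sh
# Sketch only: names below are illustrative, not part of Hobbit.
BB=${BB:-bb}                  # the bb client program
BBDISP=${BBDISP:-127.0.0.1}   # address of the hobbit server

# Read client hostnames on stdin (one per line) and print
# "count version-string" lines, one per distinct OS version.
os_version_summary() {
    while read -r host
    do
        [ -n "$host" ] || continue
        # Keep the first line that is neither a [section] header nor blank.
        "$BB" "$BBDISP" "clientlog $host section=osversion" </dev/null |
            grep -v '^\[' | grep -v '^$' | head -n 1
    done | sort | uniq -c |
        awk '{ n = $1; $1 = ""; sub(/^ /, ""); print n, $0 }'
}
```

With a file of client hostnames, one per line, it would be run as: os_version_summary < clients.txt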
Hi Henrik, thanks for the quick reply.
In a future version of hobbit, can you automate the summation of OS/version in bbgen? This would be another good selling feature for IT centers; it would save a lot of time compared with compiling the OS/version/total-nodes summary manually.
Regards
tj
When doing an Availability Report, is it possible to deduct the downtime caused by system maintenance?
Regards
tj
By displaying the raw notification.log text file, I can quickly confirm whether the bb server really sent an email alert to the resolver group's pager.
I found this quite useful for the system monitoring administrator to make sure the bb server's bbwarnrules.cfg is functional.
Can the hobbit server add a notification report under "Reports"?
Regards
tj
On Fri, Aug 18, 2006 at 05:13:11AM -0500, T.J. Yang wrote:
By displaying the raw notification.log text file, I can quickly confirm whether the bb server really sent an email alert to the resolver group's pager.
I found this quite useful for the system monitoring administrator to make sure the bb server's bbwarnrules.cfg is functional.
You mean you don't trust the alert summary on the "info" column page?
Can the hobbit server add a notification report under "Reports"?
I've had a task for some time now to generate an "Incident" reporting feature, where you get a chronological summary of what happened during an incident: when it started, when it was acknowledged, what alerts were sent out, when it recovered. This basically means picking out data from several logs, including the notifications logfile.
Would that cover your needs also?
Regards, Henrik
From: henrik at hswn.dk (Henrik Stoerner) Reply-To: hobbit at hswn.dk To: hobbit at hswn.dk Subject: Re: [hobbit] Feature Request: Notification Log Report Date: Fri, 18 Aug 2006 12:34:54 +0200
On Fri, Aug 18, 2006 at 05:13:11AM -0500, T.J. Yang wrote:
By displaying the raw notification.log text file, I can quickly confirm whether the bb server really sent an email alert to the resolver group's pager.
I found this quite useful for the system monitoring administrator to make sure the bb server's bbwarnrules.cfg is functional.
You mean you don't trust the alert summary on the "info" column page?
http://tyge.sslug.dk/hobbit-cgi/bb-hostsvc.sh?HOST=www.linuxforum.dk&SERVICE...
I like this page. Do you have an info page that shows different procs/disks having different recipients? This would be a good demo for my previous question regarding different processes alerting different recipients.
Can the hobbit server add a notification report under "Reports"?
I've had a task for some time now to generate an "Incident" reporting feature, where you get a chronological summary of what happened during an incident: when it started, when it was acknowledged, what alerts were sent out, when it recovered. This basically means picking out data from several logs, including the notifications logfile.
Would that cover your needs also?
This exceeds my request; even better. I need a page showing the log of all alerts and the times at which their pages/emails were sent. It is good for incident review.
Regards, Henrik
Can hobbit cope with a switch/router malfunction?
I.e., if a switch is dead and it has 200 nodes behind it, there is no need to issue conn outage alerts for those 200 nodes.
T.J. Yang
On 18/08/2006 15:19, T.J. Yang wrote:
Can hobbit cope with a switch/router malfunction?
I.e., if a switch is dead and it has 200 nodes behind it, there is no need to issue conn outage alerts for those 200 nodes.
Hi
see the "route" tag in the bb-hosts(5) manpage:
route:router1,router2,.... This tag is taken from the "fping.sh" script, and is used by bbtest-net when run with the "--ping" option to enable ping testing.
The router1,router2,... is a comma-separated list of hosts elsewhere in the bb-hosts file. You cannot have any spaces in the list - separate hosts with commas.
This tag changes the color reported for a ping check that fails, when one or more of the hosts in the "route" list is also down. A "red" status becomes "yellow" - other colors are unchanged. The status message will include information about the hosts in the router-list that are down, to aid tracking down which router is the root cause of the problem.
Note: Internally, the ping test will still be handled as "failed", and therefore any other tests run for this host will report a status of "clear".
--
Frédéric Mangeant
Steria EDC Sophia-Antipolis
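As an illustration of the route tag, here is a sketch with made-up bb-hosts entries and a small helper that prints the routers a given host depends on. All addresses and hostnames are hypothetical, the function names sample_bb_hosts and routers_for are not part of Hobbit, and the sketch assumes the usual "IP hostname # tags" layout of bb-hosts lines.

```shell
#!/bin/sh
# Hypothetical bb-hosts entries: two nodes behind switch1, one of them
# also behind router2. The routers are themselves listed as hosts.
sample_bb_hosts() {
cat <<'EOF'
10.0.0.1   switch1.example.com   #
10.0.0.2   router2.example.com   #
10.0.0.11  node01.example.com    # route:switch1.example.com
10.0.0.12  node02.example.com    # route:switch1.example.com,router2.example.com
EOF
}

# Read bb-hosts lines on stdin and print the routers named in the
# route: tag of the given host, one per line.
routers_for() {
    awk -v h="$1" '$2 == h {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^route:/) {
                sub(/^route:/, "", $i)
                n = split($i, r, ",")
                for (j = 1; j <= n; j++) print r[j]
            }
    }'
}
```

For example, "sample_bb_hosts | routers_for node02.example.com" lists the two hosts whose outage would turn a failed ping on node02 yellow instead of red.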
Frédéric Mangeant, thanks for the pointer. I will read the fine manpage on hobbit's bb-hosts.
tj
From: Frédéric Mangeant <frederic.mangeant at steria.com> Reply-To: hobbit at hswn.dk To: hobbit at hswn.dk Subject: Re: [hobbit] ping storm Date: Fri, 18 Aug 2006 15:24:23 +0200
On Fri, Aug 18, 2006 at 04:40:05AM -0500, T.J. Yang wrote:
When doing an Availability Report, is it possible to deduct the downtime caused by system maintenance?
If you do it in advance, yes. I.e., define DOWNTIME settings for the hosts covering your regular outages (e.g. the scheduled reboot every Saturday morning), or disable the host/service when doing maintenance.
Periods with a "blue" (disabled) status are not included in the availability calculations.
Regards, Henrik
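To make the effect of excluding "blue" periods concrete, here is a sketch of the arithmetic only. It is not Hobbit's report code, which may compute the figure differently; the function name availability_pct and its arguments are made up for illustration.

```shell
#!/bin/sh
# availability_pct UP_SECONDS TOTAL_SECONDS BLUE_SECONDS
# Prints the availability as a percentage, leaving the disabled (blue)
# time out of the period being measured.
availability_pct() {
    awk -v up="$1" -v total="$2" -v blue="$3" 'BEGIN {
        counted = total - blue          # time that counts toward the report
        if (counted <= 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * up / counted
    }'
}
```

For a week (604800 seconds) with 7200 seconds of scheduled maintenance shown as blue and 3600 seconds of real outage, the host was up for 594000 seconds, and "availability_pct 594000 604800 7200" gives 99.40 rather than the 98.21 that results when the maintenance window is counted as downtime.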
Hello,
I'm currently migrating to hobbit 4.2. All is going OK, but I notice something strange under Solaris (6->10)! Under the cpu column I don't get any output, only the load graph. I don't understand, because it works fine with the previous version of the client (4.1.2p1). Looking at hobbitclient-sunos.sh, I find these lines:
# $TOP must be set, the install utility should do that for us if it
# exists.
if test "$TOP" != ""
then
    if test -x "$TOP"
    then
        echo "[top]"
        $TOP -b 20
    fi
fi
Opening etc/hobbitclient.cfg, I have this variable:
TOP="/usr/bin/prstat -can 20 1 1"
Launching this command from a console works perfectly, as you can see:
root at psa129:/ # /usr/bin/prstat -can 20 1 1
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
20555 root 4432K 4088K cpu0 39 0 0:00:00 0.4% prstat/1
16570 oraadm 7712K 2192K sleep 59 0 0:00:00 0.1% sshd/1
[snip]
454 root 1808K 528K sleep 59 0 0:00:00 0.0% ttymon/1
NPROC USERNAME SIZE RSS MEMORY TIME CPU
101 root 563M 227M 47% 4:09:40 0.4%
77 oraadm 507M 189M 39% 0:05:18 0.2%
1 smmsp 4352K 840K 0.2% 0:00:31 0.0%
10 hobbit 11M 9336K 1.9% 0:00:23 0.0%
1 daemon 2696K 1592K 0.3% 0:00:23 0.0%
Total: 221 processes, 363 lwps, load averages: 0.08, 0.14, 0.16
Under my cpu test on the hobbit display I just have this line: "System clock is 0 seconds off". Is this the new behaviour for this test, or is something wrong in my configuration? What does "System clock is 0 seconds off" mean?
Once again, thanks for all your great work !
Best regards,
ThomaS
Ce message (et toutes ses pieces jointes eventuelles) est confidentiel et etabli a l'intention exclusive de ses destinataires. Toute utilisation de ce message non conforme a sa destination, toute diffusion ou toute publication, totale ou partielle, est interdite, sauf autorisation expresse. L'internet ne permettant pas d'assurer l'integrite de ce message, CNP Assurances et ses filiales declinent toute responsabilite au titre de ce message, s'il a ete altere, deforme ou falsifie.
This message and any attachments (the "message") are confidential and intended solely for the addressees. Any unauthorised use or dissemination is prohibited. E-mails are susceptible to alteration. Neither CNP Assurances nor any of its subsidiaries or affiliates shall be liable for the message if altered, changed or falsified.
I noticed this problem too. I changed the entry for TOP in hobbitclient-sunos.sh to:
# $TOP must be set, the install utility should do that for us if it
# exists.
if test "$TOP" != ""
then
    # prstat is configured together with its arguments, so match the prefix
    if [ `expr "$TOP" : "/usr/bin/prstat"` -ne 0 ]
    then
        echo "[top]"
        $TOP
    elif test -x "$TOP"
    then
        echo "[top]"
        $TOP -b 20
    fi
fi
From: thomas.seglard.enata at cnp.fr Date: 18/08/2006 13:38 Reply-To: hobbit at hswn.dk To: hobbit at hswn.dk Subject: [hobbit] Question on cpu test under solaris 6->10
The problem seems to be that $TOP is set to "/usr/bin/prstat -can 20 1 1", and this causes the if statement (if test -x "$TOP") to fail. I made a change to hobbitclient.cfg and hobbitserver.cfg so that:
TOP="/usr/bin/prstat"
TOPARGS="-can 20 1 1"
And then changed hobbitclient-sunos.sh to:
if test "$TOP" != ""
then
    if test -x "$TOP"
    then
        echo "[top]"
        # $TOP -b 40
        $TOP $TOPARGS
    fi
fi
And this works well, except that the hobbit server does not get a cpu load graph. It only gets a graph if there is no [top] line in the data set.
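The failure mode discussed here can be reproduced outside Hobbit: test -x treats its whole argument as a file name, so a setting that contains both the program and its arguments never passes. The following is a sketch of one way around that, checking only the first word; it is not the official patch, and the helper names top_program and run_top are made up for illustration.

```shell
#!/bin/sh
# Print the program part of a "program args..." string
# (everything before the first space).
top_program() {
    echo "${1%% *}"
}

# Emit a [top] section if the program named in the setting is executable.
run_top() {
    cmd=$1
    prog=$(top_program "$cmd")
    if [ -n "$prog" ] && [ -x "$prog" ]
    then
        echo "[top]"
        $cmd        # deliberately unquoted so the arguments are split
    fi
}
```

With TOP="/usr/bin/prstat -can 20 1 1", top_program yields /usr/bin/prstat, which is what test -x needs to see. Note that the ${1%% *} expansion needs a POSIX shell; the old Bourne /bin/sh on Solaris may not support it, which is presumably why the fix quoted earlier uses expr in backticks.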
On Fri, Aug 18, 2006 at 02:38:53PM +0200, thomas.seglard.enata at cnp.fr wrote:
I'm currently migrating to hobbit 4.2. All is going ok but I notice something strange under solaris (6->10)! Under the cpu column, I didn't get any output, only load graph.
It's a bug in the hobbitclient-sunos script.
There's a patch available on http://www.hswn.dk/hobbitsw/patches/ now.
What does "System clock is 0 seconds off" mean ?
It means the client's system clock is in sync with the clock on the server. It's a rough measure of how well your hosts are synchronized against a common time-source. You can use the CLOCK setting in hobbit-clients.cfg to warn you if a host clock drifts too far from the norm.
Regards, Henrik
Thanks for this patch! And thanks for your explanation about CLOCK. Which utility are you using to measure the drift?
Best regards,
Thomas
henrik at hswn.dk (Henrik Stoerner) wrote on 19/08/2006 11:06:23:
thomas.seglard.enata at cnp.fr wrote :
And thanks for your explanation about CLOCK. Which utility are you using to measure the drift ?
A subtraction and a comparison. It's public domain :).
It just compares the hobbit server time against the client time, and gives an error if the difference is too big. It lets one drop deadcat's bb-ntp.sh for most hosts.
-- Charles Goyard - cgoyard at cvf.fr - (+33) 1 45 38 01 31 (lunch together today ?)
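The "subtraction and comparison" can be sketched in a few lines of shell. This only illustrates the idea and is not Hobbit's code; the function names, the epoch-seconds arguments and the "in-sync"/"drifted" labels are assumptions made for the example.

```shell
#!/bin/sh
# clock_offset SERVER_EPOCH CLIENT_EPOCH
# Prints the absolute difference between the two clocks in seconds.
clock_offset() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]
    then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# clock_check SERVER_EPOCH CLIENT_EPOCH MAX_OFFSET
# Prints "drifted" when the offset exceeds MAX_OFFSET, else "in-sync".
clock_check() {
    if [ "$(clock_offset "$1" "$2")" -gt "$3" ]
    then
        echo "drifted"
    else
        echo "in-sync"
    fi
}
```

A client whose clock reads 1007 when the server reads 1000 has an offset of 7 seconds, so "clock_check 1000 1007 5" reports drifted while "clock_check 1000 1003 5" reports in-sync.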
Hi again,
Thanks for your answers, I will give it a go and post my results.
I am thinking about adding Mon for local hobbit monitoring, but for that to work I would need some "ping - pong" response from opening a session to the daemon. Does anybody know what I could send and what I should expect back?
BR Thomas
On Fri, Aug 18, 2006 at 10:59:21AM +0200, Thomas wrote:
I am thinking about adding Mon for local hobbit monitoring, but for that to work I would need some "ping - pong" response from opening a session to the daemon. Does anybody know what I could send and what I should expect back?
bb 127.0.0.1 "ping"
returns the text "hobbitd VERSION"
Henrik
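A check built on that "ping" reply might look like the sketch below. It assumes only what is stated above, namely that the reply starts with the text "hobbitd"; the function name hobbitd_alive and the BB variable are made up, and how the result is wired into Mon is left open (a monitor script that exits non-zero on failure is the usual shape, but check the Mon documentation).

```shell
#!/bin/sh
BB=${BB:-bb}    # the bb client program

# hobbitd_alive [SERVER]
# Succeeds if the server answers "ping" with a line starting "hobbitd".
hobbitd_alive() {
    reply=$("$BB" "${1:-127.0.0.1}" "ping" 2>/dev/null) || return 1
    case "$reply" in
        hobbitd*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

A monitor script could then end with: hobbitd_alive 127.0.0.1 || exit 1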
participants (9)
- cgoyard@cvf.fr
- cspargo2@csc.com
- frederic.mangeant@steria.com
- henrik@hswn.dk
- rdeal@tigr.ORG
- thomas.seglard.enata@cnp.fr
- tj_yang@hotmail.com
- tlp-bb@holme-pedersen.dk
- tlp-hobbit@holme-pedersen.dk