Howto: Highly available Zimbra cluster using Heartbeat and DRBD
PLEASE NOTE: This article is 4 years old, and I no longer use Zimbra. Please don't ask me to update it or to address issues caused by newer Zimbra releases: I don't know what does or doesn't work any more, because I don't use it. I accept no responsibility for anything that results from running any of the commands in this guide.
This morning I successfully set up a clustered, high-availability pair of Zimbra servers (VMware virtual machines), synced with DRBD and using Heartbeat to fail over to the secondary standby server.
This howto tries to cover *all* the steps. There are plenty of howtos on the subject, but in one way or another each of them leaves something out with an 'I am assuming you already have (insert service here) working and will not cover this' clause. In particular, I ran into a few small hurdles with DRBD and hostnames, so I have tried to document what I needed to do to make it work.
In this setup, I install Debian Etch 4.0 on a VMware VM using the netinstall ISO image. Along the way (after installing Zimbra itself) I clone the machine by copying its vmdk disk image, to save time and avoid duplicating too many steps.
In this howto, the one zimbra 'domain' that both servers believe themselves to be is 'zimbra.yourdomain.com'. You'll notice a bit of hostname fiddling from time to time: this is required to keep Zimbra happy at install time, and also later during DRBD configuration things change again. In the end, the two VMs are 'zimbra-1' and 'zimbra-2' with respective IPs of 192.168.1.11 and 192.168.1.12. The 'virtual' IP of 'zimbra.yourdomain.com' is 192.168.1.10. Heartbeat configures whichever server is to take over the running of Zimbra with this virtual IP as a virtual ethernet interface.
Please replace zimbra.yourdomain.com, zimbra-1, zimbra-2 and the IP addresses with whatever suits your environment.
Howto
1) First steps - DNS
I edited the DNS server authoritative for the domain 'yourdomain.com' (in my case, an internal DNS server on the same LAN) to add these entries:
zimbra IN A 192.168.1.10
zimbra MX 10 zimbra
zimbra-1 IN A 192.168.1.11
zimbra-1 MX 10 zimbra-1
zimbra-2 IN A 192.168.1.12
zimbra-2 MX 10 zimbra-2
as well as the reverse PTR entries.
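The reverse PTR entries are not listed above. As a rough sketch, assuming the usual in-addr.arpa reverse zone and that each address points back at its fully qualified name (both are assumptions, not taken from my zone files), this helper prints PTR records matching the three A records. The function names ptr_name and ptr_record are made up for this example.

```shell
#!/bin/sh
# Sketch: derive PTR records for the three hosts above.
# ptr_name and ptr_record are hypothetical helpers, not part of any DNS tool.

ptr_name() {
    # Reverse the four octets of an IPv4 address and append in-addr.arpa
    echo "$1" | awk -F. '{ printf "%s.%s.%s.%s.in-addr.arpa\n", $4, $3, $2, $1 }'
}

ptr_record() {
    # $1 = IP address, $2 = fully qualified host name
    printf '%s. IN PTR %s.\n' "$(ptr_name "$1")" "$2"
}

ptr_record 192.168.1.10 zimbra.yourdomain.com
ptr_record 192.168.1.11 zimbra-1.yourdomain.com
ptr_record 192.168.1.12 zimbra-2.yourdomain.com
```

Adjust the printed lines to however your reverse zone is laid out before pasting them in.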
2) Debian Install - Manual Partitioning
I did a standard netinstall of Debian Etch on the zimbra-1 VM, but manually set up the partitioning as follows. Note the low specs of these machines: it was only a test after all, not a production server :)
/boot        /dev/sda1  100MB  (primary) (bootable flag on)
/            /dev/sda5  3GB    (logical) (ext3)
swap         /dev/sda6  512MB  (logical)
(unmounted)  /dev/sda7  150MB  (logical) (ext3)  # this'll be the DRBD meta-disk
(unmounted)  /dev/sda8  7GB    (logical) (ext3)  # this'll be the /opt partition used by DRBD
Note that sda7 and sda8 are not mounted. Debian will try to warn you about this, but just ignore the warnings and continue with the installation. We will let Heartbeat mount these devices through /dev/drbd0 when needed.
3) Remove exim4
If you installed Debian with a network mirror and 'Standard System' checked in tasksel, Debian will install exim4 which we don't want since Zimbra will be using its Postfix installation.
apt-get remove --purge exim4 exim4-base exim4-config exim4-daemon-light
4) Install extra packages
These packages are required to install Zimbra. We also throw in DRBD for use later on.
apt-get install ntp ntpdate libc6-i686 sudo libidn11 curl fetchmail libgmp3c2 libexpat1 libgetopt-mixed-perl libxml2 libstdc++6 libpcre3 libltdl3 ssh drbd0.7-module-source drbd0.7-utils linux-headers-`uname -r`
5) Edit (fudge) the hostname to keep Zimbra happy
To install Zimbra successfully, we must trick the server into thinking it is the 'real' domain zimbra.yourdomain.com where in fact it is zimbra-1.
echo zimbra.yourdomain.com > /etc/hostname
6) Reboot the server
reboot
7) Mount /opt
We will now temporarily mount /dev/sda8 as /opt so that we can do a Zimbra installation.
mount -t ext3 /dev/sda8 /opt
8) Download, extract and install Zimbra Collaboration Suite (Open Source edition)
At the time of writing, ZCS was at version 5.0.9, and we are downloading the Open Source Edition package for Debian.
cd /tmp/
wget http://h.yimg.com/lo/downloads/5.0.9_GA/zcs-5.0.9_GA_2533.DEBIAN4.0.20080815215219.tgz
tar zxfv zcs-5.0.9_GA_2533.DEBIAN4.0.20080815215219.tgz
cd zcs-5.0.9_GA_2533.DEBIAN4.0.20080815215219
./install.sh -l
This install should go fine if your hostname is set to zimbra.yourdomain.com. Zimbra will alert you to a DNS MX record error, because the MX record for zimbra.yourdomain.com points to the virtual IP (192.168.1.10) and not to zimbra-1's IP (192.168.1.11). That's OK, we want it like that, so ignore the error and say 'No' to 'Change domain' or whatever the question is.
9) Remove Zimbra startup scripts
We want to remove the Zimbra startup scripts because Heartbeat will be handling the starting of Zimbra when it needs to.
This command will probably work:
update-rc.d -f zimbra remove
But I did it the long, and probably non-Debian way, because I was not thinking straight:
rm /etc/rc2.d/S99zimbra
rm /etc/rc3.d/S99zimbra
rm /etc/rc4.d/S99zimbra
rm /etc/rc5.d/S99zimbra
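Whichever way you removed them, it is worth confirming that no Zimbra start or kill links are left behind. The following is only a sketch: zimbra_rc_links is a made-up helper name, and the optional root-prefix argument exists purely so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: list any SysV start/kill links for zimbra that are still present.
# zimbra_rc_links is a hypothetical helper; $1 is an optional root prefix.

zimbra_rc_links() {
    root="${1:-}"
    for link in "$root"/etc/rc?.d/[SK]??zimbra; do
        # An unmatched glob stays literal, so only print paths that exist
        if [ -e "$link" ] || [ -L "$link" ]; then
            echo "$link"
        fi
    done
}

if [ -z "$(zimbra_rc_links)" ]; then
    echo "no zimbra rc links left"
else
    echo "zimbra rc links still present:"
    zimbra_rc_links
fi
```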
10) Change hostname back for DRBD, modify /etc/hosts
Now that we have Zimbra installed, we need to change the hostname again to make DRBD work. Note that you can't just edit /etc/hosts and fudge the local hostname because DRBD is smarter and will report a mismatch if /etc/hostname and /etc/hosts don't agree.
echo zimbra-1 > /etc/hostname
Nonetheless we will now edit /etc/hosts and tell zimbra-1 that it is also zimbra.yourdomain.com, and also that there is a zimbra-2 at 192.168.1.12 (although there isn't just yet). Your /etc/hosts on zimbra-1 should now look like this:
127.0.0.1 zimbra.yourdomain.com localhost.localdomain localhost
192.168.1.11 zimbra-1 zimbra.yourdomain.com
192.168.1.12 zimbra-2
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
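Since DRBD is picky about names, a quick look-up against the hosts file can catch a typo before it bites. This is a minimal sketch, not something I ran at the time: hosts_ip_for is a made-up helper that prints the first address whose line lists the given name.

```shell
#!/bin/sh
# Sketch: look up which IP a name resolves to in a hosts file.
# hosts_ip_for is a hypothetical helper name.

hosts_ip_for() {
    # $1 = host name, $2 = hosts file (defaults to /etc/hosts)
    awk -v name="$1" '
        $1 ~ /^#/ { next }    # skip comment lines
        {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    ' "${2:-/etc/hosts}"
}

for name in zimbra-1 zimbra-2; do
    echo "$name -> $(hosts_ip_for "$name")"
done
echo "local node name: $(uname -n)"
```

On zimbra-1 the expectation is that zimbra-1 maps to 192.168.1.11, zimbra-2 to 192.168.1.12, and that the local node name is zimbra-1.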
11) Shutdown and clone zimbra-1 to make a zimbra-2
At this point I cloned zimbra-1's vmdk image to create zimbra-2. (Edit per comment: I glossed over this, but being a clone, it will have the same IP address etc. as zimbra-1.) To make zimbra-2 the equivalent of zimbra-1, set its IP to 192.168.1.12 instead of zimbra-1's 192.168.1.11. You may have trouble bringing up the eth interface at all until you do; since it is a virtual machine, edit /etc/network/interfaces from within the VMware Console. Then change the hostname:
echo zimbra-2 > /etc/hostname
And edit the hosts file to look like this:
127.0.0.1 zimbra.yourdomain.com localhost.localdomain localhost
192.168.1.12 zimbra-2 zimbra.yourdomain.com
192.168.1.11 zimbra-1
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
12) Reboot both servers, install DRBD and configure
On both zimbra-1 and zimbra-2:
cd /usr/src/
tar xvfz drbd0.7.tar.gz
cd modules/drbd/drbd
make
make install
mv /etc/drbd.conf /etc/drbd.conf.orig
Make a new /etc/drbd.conf that looks like this:
resource r0 {
  protocol C;
  incon-degr-cmd "halt -f";
  startup {
    degr-wfc-timeout 120; # 2 minutes
  }
  disk {
    on-io-error detach;
  }
  net {
  }
  syncer {
    rate 10M;
    group 1;
    al-extents 257;
  }
  on zimbra-1 {
    device /dev/drbd0;
    disk /dev/sda8;
    address 192.168.1.11:7788;
    meta-disk /dev/sda7[0];
  }
  on zimbra-2 {
    device /dev/drbd0;
    disk /dev/sda8;
    address 192.168.1.12:7788;
    meta-disk /dev/sda7[0];
  }
}
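The names in the 'on' sections have to match each node's own hostname, which is what all the hostname fiddling earlier was for. Here is a small sketch that checks this before DRBD is started. The helper names drbd_conf_hosts and node_in_drbd_conf are made up, and the parsing assumes each section opens with 'on <name> {' on a single line, as in the file above.

```shell
#!/bin/sh
# Sketch: check that this node's name has an "on" section in drbd.conf.
# drbd_conf_hosts and node_in_drbd_conf are hypothetical helper names.

drbd_conf_hosts() {
    # Print the host name of every "on <host> {" line in the given file
    awk '$1 == "on" && $3 == "{" { print $2 }' "${1:-/etc/drbd.conf}"
}

node_in_drbd_conf() {
    # Succeeds if host name $1 has an "on" section in file $2
    drbd_conf_hosts "${2:-/etc/drbd.conf}" | grep -Fqx -- "$1"
}

if node_in_drbd_conf "$(uname -n)"; then
    echo "$(uname -n) has an 'on' section in /etc/drbd.conf"
else
    echo "no 'on' section for $(uname -n): expect drbdadm to complain"
fi
```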
13) Get the first DRBD sync going
On zimbra-1 and zimbra-2
modprobe drbd
drbdadm up all
If you get a heap of complaints about misspelt or mismatching hostnames, check that you changed the hostname on each server to its respective zimbra-1/zimbra-2 name as per above, and that you rebooted each one.
Otherwise with no errors:
On zimbra-1:
drbdadm -- --do-what-I-say primary all
drbdadm -- connect all
When I ran the 'connect all' second command, I got some odd error about a DRBD child process that couldn't terminate. It was odd, because I didn't get that when I set up HA NFS using DRBD and Heartbeat the previous day on other servers! Nonetheless, I ran:
cat /proc/drbd
And I could see that the sync was taking place between the two servers anyway. It looked something like this (I stole this output from an NFS howto, but it looks like this):
version: 0.7.20 (api:77/proto:74)
SVN Revision: 1743 build by phil@mescal, 2005-01-31 12:22:07
0: cs:SyncSource st:Primary/Secondary ld:Consistent
ns:13441632 nr:0 dw:0 dr:13467108 al:0 bm:2369 lo:0 pe:23 ua:226 ap:0
[==>..............] sync'ed: 3.1% (7000/7168)M
finish: 1:14:16 speed: 2,644 (2,204) K/sec
1: cs:Unconfigured
Let this process run before doing anything else. DRBD is syncing the data on /dev/sda8 between the two servers. On my 7 GB partitions this took about 1 hour (slow VMs; yours could be faster or slower). Just keep running `cat /proc/drbd` until you see that the sync is complete.
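Rather than re-running cat by hand, a small loop can wait for you. This is a sketch under assumptions: it only looks at device 0, it treats any connection state beginning with 'Sync' (such as SyncSource in the sample output above) as still syncing, and the 30-second interval is arbitrary. The function names are made up.

```shell
#!/bin/sh
# Sketch: wait until device 0 in /proc/drbd leaves its Sync* state.
# drbd_cstate and wait_for_sync are hypothetical helper names.

drbd_cstate() {
    # Print the cs: field of device 0 from /proc/drbd-style text
    sed -n 's/^ *0: cs:\([A-Za-z]*\).*/\1/p' "${1:-/proc/drbd}"
}

wait_for_sync() {
    # Poll every 30 seconds while device 0 reports a Sync* state
    while :; do
        case "$(drbd_cstate "$@")" in
            Sync*) sleep 30 ;;
            *) break ;;
        esac
    done
    # Print whatever state we ended up in
    drbd_cstate "$@"
}

if [ -r /proc/drbd ]; then
    wait_for_sync
fi
```

When the loop exits, still check /proc/drbd yourself to confirm both sides look healthy before carrying on.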
We're almost there!!
14) Install and configure Heartbeat
On zimbra-1 and zimbra-2:
apt-get install heartbeat
You'll see some sort of error after the package is installed. Heartbeat doesn't install a ha.cf, haresources or authkeys file by default; you need to create these before heartbeat will run.
On zimbra-1 and zimbra-2, create these three files:
/etc/heartbeat/ha.cf
logfacility local0
keepalive 2
deadtime 20 # timeout before the other server takes over
bcast eth0
node zimbra-1 zimbra-2 # our two zimbra VMs
auto_failback on # hand the resources back to zimbra-1 when it comes back up
/etc/heartbeat/haresources
zimbra-1 IPaddr::192.168.1.10/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/opt::ext3 zimbra
Note that the above defines the primary node zimbra-1: do not change this to zimbra-2 when you make the file on zimbra-2. The last word 'zimbra' is not a typo for one of the servers: this tells Heartbeat what service to start when it does its magic.
Finally, create /etc/heartbeat/authkeys on both servers.
This file needs an md5 string, which each heartbeat daemon uses to authenticate with the other. I ran a quick PHP one-liner, echo md5("my password");, to get an md5 string.
auth 3
3 md5 yourrandommd5string
Protect the permissions of authkeys file on both servers:
chmod 600 /etc/heartbeat/authkeys
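If you don't have PHP handy, the same kind of string can be produced from the shell. This is a sketch, not what I actually ran: random_md5 and write_authkeys are made-up helper names, and hashing random bytes instead of a password is my substitution here.

```shell
#!/bin/sh
# Sketch: generate an md5-style secret and write an authkeys file.
# random_md5 and write_authkeys are hypothetical helper names.

random_md5() {
    # Hash 512 random bytes and keep only the 32-character hex digest
    dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | cut -d' ' -f1
}

write_authkeys() {
    # $1 = destination file, $2 = shared secret
    # The umask only affects newly created files, so still chmod existing ones
    ( umask 077; printf 'auth 3\n3 md5 %s\n' "$2" > "$1" )
}

# Print a secret; the same value must go into authkeys on both nodes.
random_md5
# Example (as root): write_authkeys /etc/heartbeat/authkeys "secret-printed-above"
```

Generate the secret once and copy it to the second server; running the generator separately on each node would give two different keys.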
15) Reboot!
At this point Zimbra should fire up on zimbra-1 as normal. Do a 'df -h' on zimbra-1 and you'll see the /dev/drbd0 device has mounted /opt and if you run ifconfig, you'll see the eth0:0 entry that contains the virtual IP 192.168.1.10. You should be able to visit http://zimbra.yourdomain.com or http://192.168.1.10 and see a working Zimbra system that is running off of zimbra-1.
16) Test the failover
Shut down zimbra-1. If you tail -f /var/log/messages on zimbra-1 as it shuts down, you should see it release drbd and heartbeat, and running tail -f /var/log/messages on zimbra-2 will show it pick up the virtual IP, mount /dev/drbd0 and kick off the Zimbra startup scripts.
When the startup scripts have finished, visit http://zimbra.yourdomain.com just like you did before and everything should appear to still be running, except now we're running off zimbra-2!
Fire up zimbra-1 again and it will take back the control from zimbra-2.
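During the test it helps to see at a glance which node currently holds the virtual IP. This is a rough sketch, assuming the VIP from this howto. has_vip is a made-up helper that reads an interface listing on standard input and accepts both the old ifconfig 'inet addr:' format and the 'inet a.b.c.d/nn' format printed by ip addr.

```shell
#!/bin/sh
# Sketch: report whether this node currently holds the virtual IP.
# has_vip is a hypothetical helper name; VIP is the address used in this howto.
VIP=192.168.1.10

has_vip() {
    # Reads an interface listing on stdin; succeeds if VIP is configured
    awk -v vip="$VIP" '
        $1 == "inet" {
            addr = $2
            sub(/^addr:/, "", addr)    # old ifconfig: "inet addr:1.2.3.4"
            sub(/\/.*/, "", addr)      # ip addr: "inet 1.2.3.4/24"
            if (addr == vip) found = 1
        }
        END { exit !found }
    '
}

if ifconfig 2>/dev/null | has_vip; then
    echo "this node holds $VIP"
else
    echo "this node does not hold $VIP"
fi
```

Run it on both servers before and after shutting down zimbra-1; only one of them should report holding the address at any time.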
Congratulations, you have automatic failover and high availability of your Zimbra service!
Feel free to leave comments, feedback, or corrections in the event that I've done something wrong, but this worked for me with no problems. I hope it works for you.

Comments
Anonymous
Sat, 27/09/2008 - 09:03
Hi,
Thank you, I found this article very good. I would like to know if it is possible to have an active-active cluster, where both Zimbra servers are up.
The problem is managing the processes in memory. There is a way to deal with this, maybe with a raw partition that acts as a quorum for the cluster nodes.
Best regards,
mig5
Sat, 27/09/2008 - 11:00
Hi,
I'm not sure if you can do active-active relationship easily with the Open Source edition. I believe if you need active-active, it's easier to buy the Network Edition which supports multi-node Zimbra clusters (Red Hat Cluster Suite) in an active-active relationship. I think the Network Edition also comes with various other perks (like proper backup solutions for clusters), but I haven't tried it.
Anonymous
Wed, 19/11/2008 - 22:12
Hi,
Thank you, I think this article is very good.
But I faced a lot of problems during installation. It would be very helpful if you gave a more detailed description of this configuration. Also, is there any entry we need to add to /etc/fstab? My drbd.conf file is:
resource drbd0 {
protocol C;
handlers { pri-on-incon-degr "halt -f"; }
startup {
degr-wfc-timeout 120; # 2 minutes
}
disk {
on-io-error detach;
}
net {
}
syncer {
rate 10M;
}
on zimbra01.domain.com {
device /dev/drbd0;
disk /dev/sda5;
address 192.168.5.35:7788;
meta-disk internal;
}
on zimbra02.domain.com {
device /dev/drbd0;
disk /dev/sda5;
address 192.168.5.36:7788;
meta-disk internal;
}
}
Lassaad
Tue, 11/08/2009 - 23:57
Hello,
Thank you for this howto. I have everything set up, but the problem is that when I run
"/etc/init.d/zimbra stop" there is no takeover: the cluster stays on the primary node.
Lassaad
Tue, 11/08/2009 - 23:51
failover problem
Hello,
Thank you for this howto. I have everything set up, but the problem is that when I run
"/etc/init.d/zimbra stop" on the primary node, there is no takeover!
mig5
Fri, 14/08/2009 - 21:52
In step 9) we deliberately remove those startup scripts with 'update-rc.d -f zimbra remove'
Starting and stopping zimbra becomes Heartbeat's job in this setup, and it's deliberate. You could shutdown heartbeat on one server and the other will take over and start zimbra.
Trevor
Wed, 28/10/2009 - 04:07
Thanks for the article - Problems with services not starting
Hi,
I was able to complete the installation by following the steps outlined in the article. However, after installing heartbeat and restarting, the disk is not mounted automatically and the virtual IP does not show up on the primary server. If I manually bring up the virtual IP and drbddisk and mount the disk, it works. I will be happy to provide some logs to anyone willing to help. If you have experienced similar issues and solved them, let me know.
Thanks..
Regards,
Trevor
madapaka
Mon, 25/01/2010 - 16:09
Thanks for the howto! I would like to clarify some things because I'm having difficulty setting things up.
I suppose that I have to create a filesystem first on my /dev/sda3 before installing zimbra, right?
Since I have two physical machines to install zimbra, I can't exactly replicate what you did but will try to simulate the same, I will install zimbra on both machines on the /opt directory, right?
In your example you did not create the drbd device using # drbdadm create-md zimbra. What's the difference between your method and that one?
Is the following OK?
Install drbd on both nodes;
Configure drbd on both nodes;
Create the drbd partition (drbd create-md r0)
mount drbd0 to /opt
create a filesystem on drbd0
install zimbra in /opt of node1
unmount node1's drbd partition
make node1 as secondary and promote node2 as primary
mount node2's drbd0
install zimbra on node2
Will this approach work?
Thanks!
Anonymous
Fri, 02/04/2010 - 22:47
Hi
Everything was great, but when I simulate a network error on the primary node and connect it again after a few minutes or hours, I have 2 primary nodes :/. In this case I have both Zimbra services running and both virtual interfaces on the same addresses. What can I do to make it work?
Thanks
Munna
Sat, 10/07/2010 - 21:02
Thanks for the great tutorial,
but my question is: is this still possible with the current versions of Zimbra, DRBD and Heartbeat?
Can we do this on current Linux releases like Ubuntu 10 or CentOS 5.5? Are there any drawbacks to your solution? We cannot afford the Zimbra Network Edition. People all over the world are looking for a perfect backup solution for the Zimbra Open Source Edition.
Thanks.
GembuL
Thu, 23/12/2010 - 17:58
Hi
Does this work when the Zimbra servers are on different networks or subnets? In my lab tests, Heartbeat only works when both are on the same network. And if I must use a virtual IP to connect both Zimbra servers, which application should I use?
If you have an answer to my problem, please send it to my email address. Thank you.
etessua
Mon, 27/12/2010 - 21:06
Hello, I'm using CentOS, and the hostname forging is not working for me. The -l option when installing is also not working. If I'm not mistaken, -l is for supplying the license, but we are using the free edition, aren't we?
Anonymous
Fri, 07/01/2011 - 06:46
I'm running into the same issue on Debian 5 with ZCS 6.0.10
Anonymous
Sat, 08/01/2011 - 06:12
OK. I've resolved this issue and got it working.
You do have to do hostname fudging, but in a different order.
1. Build your system with the final domain name (mail.your-domain.com).
2. Install Zimbra
3. Change hostname (echo z1 > /etc/hostname) and reboot BOTH servers
4. Move /opt to different directory (eg. /opt_save) and create a new /opt
6. Setup LVM + DRBD
7. Move /opt_save/zimbra to /opt (on primary)
8. Setup heartbeat
Good to go!! : ]
Good luck
Anonymous
Tue, 28/02/2012 - 18:39
Can you please update this tutorial, or at least make the DNS setup a bit clearer? How would you set up the DNS if you're using dnsmasq?
This isn't a very good howto if you just say what needs to be done....