
Configure an Exchange 2013 DAG on Windows Server 2012 R2 With No Administrative Access Point

DAG, Exchange 2013, High Availability

Exchange 2013 SP1 introduced support for Windows Server 2012 R2, and with it support for a new Windows Server 2012 R2 feature: failover clusters without an Administrative Access Point.  You can now create a DAG that does not need a separate IP address for the DAG itself on each subnet.  It also no longer creates the cluster name object (CNO), the computer account for the cluster in Active Directory.  The benefit of this feature is reduced complexity: you no longer need to manage the computer account for the DAG, and you no longer need to assign IP addresses for each subnet on which the cluster operates.  There are some downsides, but they shouldn’t affect Exchange admins much.  Mainly, since there is no IP address and no CNO, you cannot use the Windows Failover Cluster admin tools to connect to the cluster remotely; you need to run PowerShell locally against a cluster node directly.  With Exchange, this shouldn’t be too much of a problem, as almost all management of the cluster is handled with Exchange tools through management of the DAG itself.
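As a hedged sketch of what that local management looks like (standard FailoverClusters module cmdlets; the AdministrativeAccessPoint property is from memory of the 2012 R2 cluster object, so adjust if your build differs):

# Run locally on a DAG member, since a cluster with no AAP can't be reached by name
Import-Module FailoverClusters
Get-Cluster | Format-List Name,AdministrativeAccessPoint

# Quick health check of the cluster nodes
Get-ClusterNode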

In our example, we have two servers in separate AD sites that we are going to configure in our DAG:

PHDC-SOAE13MBX2

SFDC-SOAE13MBX2

We will create a DAG named SOA-DAG-2013.  Previously, this would also have been the name of the CNO that Exchange created underneath.  The name is now essentially just a label that is stamped on all the nodes for management; no CNO is created.

If we log in to the EAC and navigate to Servers->Database Availability Groups, we can create the DAG by clicking on the plus sign:


Enter the information for the DAG, and remember to specify your Witness Server.  It should be another Exchange 2013 server in your primary datacenter location that is not also a member of the DAG.  We will specify a single IP address of 255.255.255.255, which tells Exchange to create the DAG without an administrative access point:


If we are doing this in PowerShell, the syntax is different:

New-DatabaseAvailabilityGroup -Name SOA-DAG-2013 -DatabaseAvailabilityGroupIPAddresses ([System.Net.IPAddress]::None) -WitnessServer NYDC-SOAE13CAS1.soa.corp -WitnessDirectory C:\WitnessDirectory\SOA-DAG-2013

 


 

Now, from here, building the DAG follows the same steps as usual.  Let’s add the mailbox servers to the DAG.  If you don’t already have Windows Failover Clustering installed, these steps will install it for you.

From the EAC, under Database Availability Groups, select the DAG name and click on the server-with-a-gear icon (Manage DAG Membership):

Add your servers to the DAG and click Save:

 


From the Exchange Management Shell (the command is run once for each mailbox server):

Add-DatabaseAvailabilityGroupServer -Identity SOA-DAG-2013 -MailboxServer SFDC-SOAE13MBX2


And you’re all set.  The DAG has been configured with no administrative access point.

If we check the properties of the DAG in the EAC we can see the IP address is listed as 255.255.255.255:

 


And even though we specified [System.Net.IPAddress]::None in the PowerShell command, if we check the IP addresses in PowerShell, we only have 255.255.255.255 listed:

 

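As a quick verification sketch from the shell (the DatabaseAvailabilityGroupIpv4Addresses property name is from memory, so adjust if your build differs):

Get-DatabaseAvailabilityGroup SOA-DAG-2013 | Format-List Name,DatabaseAvailabilityGroupIpv4Addresses,WitnessServer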

Witness Server Boot Time, GetDagNetworkConfig and the pain of Exchange 2010 DR Tests

Exchange 2010, High Availability

 

So we recently had a client who wanted to perform a DR test of their Exchange 2010 DAG.  The DAG consisted of a single all-in-one server in production and a single all-in-one server in DR.  The procedure for this test was to disconnect all network connectivity between prod and DR, shut down the Exchange server and the domain controller, snapshot them, and then start them back up.

Now, we can all agree that snapshots and domain controllers are inherently dangerous, so it’s up to you to ensure that you have your ducks in a row so that this doesn’t replicate back to production.  That discussion is outside the scope of this article.

Now, initially they had trouble bringing up the databases in DR, as well as many components of the DAG.  This article will walk through an example and try to make sense of what’s causing these issues.

So, here is our setup: we have a two-node DAG stretched across two sites.

Production

PHDC-SOAEXC01 – Prod all in one Exchange Server

PROD-DC01 – Prod domain controller

PHDC-SOADC01 – Primary witness server

DR

SFDC-SOAEXC01 – DR all in one Exchange Server

DR-DC01 – DR domain controller

SFDC-SOADC01 – Alternate witness server

The DAG name is SOA-DAG-01 and the Active Directory Sites are:

Prod = PH

DR = SF

So in our scenario, we shut down both PHDC-SOAEXC01 and PHDC-SOADC01.  This causes the databases in DR to dismount, because the DR server has lost quorum.

Now, in a DR “test”, we would shut down the DR Exchange server and the DR domain controller to snapshot them.  I just want to warn you: DO NOT EVER roll a domain controller back to a snapshot in a production environment.  This is a purely hypothetical setup.  Rant over.

Now, in our case, we have snapshotted and rebooted DR-DC01 and SFDC-SOAEXC01.  When we open the Exchange Management Console, we see that the DR server’s databases are in a failed state:


Now, let’s start running through the DR activation steps.  Here is what the process should normally be:

  1. Stop the mailbox servers in the prod site
  2. Stop the cluster service on all mailbox servers in the DR site
  3. Restore the mailbox servers in the DR site, evicting the prod servers from the cluster

After step 3, the databases should mount, but as you will see, they won’t, and I’ll try to explain why.

So, step 1, let’s mark the prod servers as down:

Stop-DatabaseAvailabilityGroup SOA-DAG-01 -ActiveDirectorySite PH -ConfigurationOnly

You should expect to see some errors; this is completely expected because the prod site is unreachable, hence the -ConfigurationOnly option:


Now, step 2, we will stop the clustering service on SFDC-SOAEXC01 with the PowerShell command:

Stop-Service ClusSvc

Now, step 3, we will restore the DAG with just the servers in DR:

Restore-DatabaseAvailabilityGroup SOA-DAG-01 -ActiveDirectorySite SF

You may get an error stating:

Server ‘PHDC-SOAEXC01’ in database availability group ‘SOA-DAG-01’ is marked to be stopped, but couldn’t be removed from the cluster. Error: A server-side database availability group administrative operation failed. Error: The operation failed. CreateCluster errors may result from incorrectly configured static addresses. Error: An error occurred while attempting a cluster operation. Error: Cluster API ‘EvictClusterNodeEx(‘PHDC-SOAEXC01.SOA.corp’)’ failed with 0x46.

Simply re-run the command again and it should complete:


So now, we should have the databases mounted, and we should be able to see the prod servers as stopped by running the following command:

Get-DatabaseAvailabilityGroup -Status | FL

But, behold, we get an error stating “GetDagNetworkConfig failed on the server.  Error: The NetworkManager has not yet been initialized”:


So, here is the first roadblock.  What happened is that since the DR server is a single node, it uses the boot time of the alternate file share witness (AFSW) to determine whether it is allowed to form quorum.  A one-node cluster always has quorum on its own, so this check exists to prevent split brain.  Tim McMichael does a great job of explaining it in his blog post.  Essentially, the boot time is stored in the registry of the Exchange server under:

HKEY_LOCAL_MACHINE\Software\Microsoft\ExchangeServer\v14\Replay\Parameters

If the Exchange server finds that it was rebooted more recently than the AFSW, it will not form quorum.  So how do we fix it?  We can start by rebooting the AFSW to see what behavior changes.
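If you want to eyeball the recorded times yourself, a rough sketch (the key path is the one above; the exact value names holding the boot times vary, so just inspect the output):

# Dump the Replay parameters, which include the recorded boot time entries
Get-ItemProperty 'HKLM:\Software\Microsoft\ExchangeServer\v14\Replay\Parameters'

# Compare against the local OS boot time
(Get-WmiObject Win32_OperatingSystem).LastBootUpTime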

After we do so, we can re-run:

Get-DatabaseAvailabilityGroup -Status | FL

Now, we get the network and stopped-servers info, but some entries are in a broken state, and we get a message that the DAG witness is in a failed state:


Note that the WitnessShareInUse field reports InvalidConfiguration.

We have to re-run our Restore-DatabaseAvailabilityGroup command to resolve this:

Restore-DatabaseAvailabilityGroup SOA-DAG-01 -ActiveDirectorySite SF

Now if we re-run Get-DatabaseAvailabilityGroup -Status | FL, we get the expected output:


Now, we see that the WitnessShareInUse is set to the alternate.

So, are the databases mounted!? If we check, they are no longer failed, but are “Disconnected and Resyncing”


We need to force the server in DR to start, because of the single-node quorum issue.  This can be done with the following command:

Start-DatabaseAvailabilityGroup SOA-DAG-01 -ActiveDirectorySite SF

Now the database is mounted:


So, you can see that how the test is conducted can affect the outcome of a DR test, and that the single-node cluster setup can cause this issue on its own.  The boot time of the alternate file share witness is also extremely important to what the node can do when it restarts.

Hopefully you find the info useful!  Happy Holidays to all!

Exchange 2010 Databases Fail to Replicate or Seed

Exchange 2010, High Availability

 

I recently had a colleague who ran into an issue with an Exchange 2010 migration.  He could fail over the mailbox databases to DR with no issue, but that’s where the trouble started.

The production database would start to report a high copy queue length that increased as more activity occurred on the DB.  The production database, pure and simple, was not receiving the transaction logs from the newly activated database in DR.

The setup was simple: two nodes, one in production, one in DR, with an FSW.  The nodes were all-in-one 2010 boxes, with one NIC for MAPI and one NIC for replication.

My colleague also informed me that he had some trouble initially seeding the database.  All roads pointed to an issue with replication.  We quickly checked his replication network settings and found the following setup:


His DAG network had the replication networks for the two separate sites under one object.  Once we moved them to their own separate networks:


Everything went back to normal!
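For reference, the same split can be done from the shell.  A hedged sketch with illustrative network names and subnets (the cmdlet and parameters are the standard Exchange 2010 DAG network ones):

# One replication network per site, each with its own subnet
New-DatabaseAvailabilityGroupNetwork -DatabaseAvailabilityGroup DAG1 -Name 'Replication-Prod' -Subnets 10.0.1.0/24 -ReplicationEnabled:$true
New-DatabaseAvailabilityGroupNetwork -DatabaseAvailabilityGroup DAG1 -Name 'Replication-DR' -Subnets 10.0.2.0/24 -ReplicationEnabled:$true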

Till the next time!

How to Manage a Datacenter Failure or Disaster Recovery Scenario in Exchange 2010 – Part 3

Exchange 2010, High Availability

In Part 3 of this series, we are going to discuss how to fail back to your primary site after the condition that caused it to go offline in the first place has been resolved.

So when we last left off, we had activated all of our databases in our disaster recovery site. I showed you how we had to stop the mailbox servers in the primary Active Directory site, which then allowed us to activate our disaster recovery site.

So now, let’s say the flood that occurred at the New York site has been dealt with, all the water has been removed, and thankfully, it did not damage our equipment. We are ready to move everything back to NYC. We begin by powering on the servers in NY. Since our DAG is in DAC mode, the NY servers do not try to mount their copies of the databases. Instead, they begin copying over any log files to bring their copies of the databases up to date with the DR site. Wait until all the databases are reporting a status of Healthy:

Now, remember, we told the DAG that the NY servers were down with the Stop-DatabaseAvailabilityGroup command from Part 2:

Notice how NYMB01 and NYMB02 are both listed as Stopped Mailbox Servers. You will get this error in the console if you try to activate the database on one of those two servers:

What we have to do is start these servers in the DAG, making them viable copies for activation again. We do this with the Start-DatabaseAvailabilityGroup command. Again, we could use the -MailboxServer parameter and specify each server, separating them with a comma, or we can specify the whole site with the -ActiveDirectorySite option as below:

Start-DatabaseAvailabilityGroup DAG1 -ActiveDirectorySite NYC

Now if we run the Get-DatabaseAvailabilityGroup command, all servers are listed as started.

Now, Microsoft recommends dismounting the databases that are going to be moved back to the primary datacenter site. This will mean that you will need to get a maintenance window to perform this action, as the databases will be offline. Keep in mind that you do not NEED to dismount the databases; it is just recommended by Microsoft.
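If you do take the dismount route, a minimal sketch (MDB01 is illustrative; do this inside your maintenance window):

Dismount-Database MDB01 -Confirm:$false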

Now, we can activate the database. You can do this either by using the shell with the Move-ActiveMailboxDatabase MDB01 -ActivateOnServer NYMB02 command:

Or through the EMC with the Activate a Database Copy Wizard:

Now, the database should be activated. One issue that I have seen pop up is that the Catalog Index is corrupted. This is the index for the Full Text Index Search. If this is the case, you may have to reseed the Catalog Index with the following command:

Update-MailboxDatabaseCopy MDB02\NYMB02 -SourceServer DRMB01 -CatalogOnly

This command will reseed the content index from the server DRMB01. You should now be able to activate the database copy.

Continue by moving all active database copies to their respective servers. If you dismounted the databases as stated above, now mount these databases.
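The remount is the mirror image of the dismount above (again, MDB01 is illustrative):

Mount-Database MDB01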

Now, our clients are still connected, but they are still connecting through DRHT01.nygiants.com because it is set as the RpcClientAccessServer:


We simply need to run the command Get-MailboxDatabase | Set-MailboxDatabase -RpcClientAccessServer NYHT01.nygiants.com to change this setting over:

If we check the properties of the databases, we see that NYHT01.nygiants.com is again set as the Rpc Client Access server:

Now, our Outlook clients will be connecting through NYHT01 for their access:

Well, that’s it!

In this three-part series, I showed you how to activate the backup datacenter in the event that the primary datacenter became unavailable. We discussed the theory behind the process, mainly the Datacenter Activation Coordinator, the actual work needed to do it, as well as how to fail back when the primary site comes back online.

If you have any questions, feel free to email me at ponzekap2 at gmail dot com.

How to Manage a Datacenter Failure or Disaster Recovery Scenario in Exchange 2010 – Part 2

Exchange 2010, High Availability

 

In the first article of this series, we went over some of the premises of Exchange 2010, Database Availability Groups or DAGs, and the Datacenter Activation Coordinator.  We discussed our test environment, as well as how, in theory, an Exchange 2010 DAG handles the failure of a datacenter.

In this article, I’ll show you how to actually activate your disaster recovery site, should your primary site go down.

One of the first things we need to take into consideration is how the clients, most importantly Outlook, connect to the DAG.  We discussed in the first article how there is a new service on Exchange 2010 Client Access Servers, called the Microsoft Exchange RPC Client Access service.  Outlook clients now connect to the Client Access Server, and the Client Access Server connects to the Mailbox Server.  This means Outlook clients don’t connect directly to the Mailbox Servers anymore.  This becomes significant in a disaster recovery situation.

Let’s look at the output of the command:

Get-MailboxDatabase | Select Name,*rpc*


The RpcClientAccessServer value on a particular mailbox database indicates that connections to this database are passed through the Client Access Server listed.  So, for all the databases above, all Outlook connections have to go through NYHT01.nygiants.com.  (If you have more than one Client Access Server per site, which you should for redundancy and load balancing, you can create a cluster using either Windows Network Load Balancing or a hardware load balancer, and change this value to point to the cluster host name.)  Let’s look at our Outlook client, and we notice that all connections are being passed to NYHT01.nygiants.com:


Alright, now we are ready to start failing over servers!  My test user, Paul Ponzeka, is located on a mailbox database named MDB02, which is running on server NYMB02:


So, to simulate a datacenter failure, we’ll just pull the power on ALL NY servers:

NYDC01

NYHT01

NYMB01

NYMB02

NY-XP1 (client machine)

So now, all of our NY machines are off.  Let’s check the status of the databases on our DR servers located in Boston:


Well, that’s not good, huh?  Notice how the DBs for the two copies in NY are listed as ServiceDown under copy status?  Also note that the copy on DRMB01, the Mailbox Server in Boston, is Disconnected and Healthy.  The reason the DB is not mounted is the DAC mode enabled on the DAG, which we discussed in Part 1 of this series. DRMB01 dismounts ALL its mailbox databases because a majority of the DAG members are not available, so it can’t make quorum. It dismounts to prevent a possible split-brain scenario, but in this case, we REALLY need to get this database activated.  How do we do this?  What we need to do is remove the NY servers as active participating members of the DAG.  We do this through the Exchange Shell.

If we note the current status of the DAG with the command:

Get-DatabaseAvailabilityGroup | select Name,*server*


Notice that the Servers value lists DRMB01, NYMB01 and NYMB02 as servers in the DAG, and lists them again as StartedMailboxServers?  Well, we need to tell the DAG that NYMB01 and NYMB02 are no longer “started” or operational.  We need to use the following command for that:

Stop-DatabaseAvailabilityGroup.

Our command can specify each server that’s down, one by one:

Stop-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer NYMB01,NYMB02

Or, since we lost the entire NY site, we can specify the whole site:

Stop-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite NYC

Now, since our Mailbox Servers in NY are actually unreachable, we want to specify the -ConfigurationOnly option at the end of this command.  Otherwise, the command attempts to actually stop the mailbox services on every mailbox server in the NYC site, which causes the command to take an extremely long time to complete.
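So the full command we run is:

Stop-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite NYC -ConfigurationOnly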


Now, if we re-run the command:

Get-DatabaseAvailabilityGroup | select Name,*server*


We notice that the NY mailbox servers, NYMB01 and NYMB02 are both now listed as Stopped Mailbox Servers.

Next, ensure that the clustering service is stopped on DRMB01:
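From an elevated PowerShell prompt on DRMB01, that’s simply:

Stop-Service ClusSvc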


Now, we want to tell the DR site that it should restart with the new settings (both the NY servers missing):

Restore-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite DR


You will see a progress bar, indicating it’s adjusting quorum and the cluster for the new settings.  Don’t be alarmed if you see an error regarding the command not being able to contact the downed mailbox servers.

If we return to the Exchange Management Console, all our databases in the DAG have been mounted on DRMB01!


Great right?  But our clients are still having trouble connecting.  What’s the problem?


The reason is that NYHT01.nygiants.com is still listed as the RPC Client Access Server for this database:


We have two choices.  The first is to change the DNS record for NYHT01.nygiants.com to be a CNAME for DRHT01.nygiants.com.  The second, and faster, method is changing the RPC Client Access Server to be DRHT01.nygiants.com with the following command:

Set-MailboxDatabase MDB02 -RpcClientAccessServer DRHT01.nygiants.com


Doing it for every DB that you failed over is simple:
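The screenshot showed a one-liner along these lines, the same pattern as the per-database command above:

Get-MailboxDatabase | Set-MailboxDatabase -RpcClientAccessServer DRHT01.nygiants.com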


Now, back to the Outlook client and voila!


Now all your messaging services are back and running in your Disaster Recovery site, with limited downtime for your end users.

In this article we discussed how to fail over a datacenter to your backup or disaster recovery datacenter, should your primary go offline.

In the next and final part of this series, I’ll show you how to fail back to the primary datacenter site, which some admins think is even more terrifying than failing over!

Stay Tuned!

Creating a Database Availability Group in Exchange 2010 – Part 3

Exchange 2010, High Availability

 

In Part 2 of this series, we created the Database Availability Group, and added both NYDAGNODE1 and LNDAGNODE1 to it.

In Part 3 of this series, we are going to configure the networks properly, and create some databases for the DAG.

As we noted last time, by default, all networks for every node in a DAG are configured for replication.  We only want replication to occur over certain networks, namely 172.16.1.0 for NY and 172.17.1.0 for London.  If we navigate to Organization Configuration –> Mailbox, select the Database Availability Group tab and select DAG01, we see all the networks listed.


Just a side note: if you right-click a label, DagNetwork01 for example, you can rename it to something more descriptive.


Now, for the two production networks, when you’re on the properties page, un-check the “Replication Enabled” check box:


Now it states, for both NY and LN Production, that replication is disabled.  This will ensure that all replication occurs over a dedicated network.
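The same change can be made from the shell.  A sketch using the standard cmdlet (the network names here assume you renamed them as in the side note above):

Set-DatabaseAvailabilityGroupNetwork -Identity DAG01\NY-Production -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG01\LN-Production -ReplicationEnabled:$false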


Now, it’s time to add some databases to the DAG!  Move on to the Database Management tab.  You will note two databases here, both of which are the default one for each server.


We can add these existing databases to the DAG by right-clicking on each of them and selecting “Add Mailbox Database Copy”.  You then select any free server to host the copy, in our case:


Select Add, and it will add NYDAGNODE1 as a replica for the database.  Note the preferred list sequence number.  This indicates that NYDAGNODE1’s copy of this database should be the second copy activated, should something happen to preferred list sequence number 1, which is the original copy on LNDAGNODE1.

The PowerShell command we could have run is listed:
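It would have been along these lines (the database name here is illustrative):

Add-MailboxDatabaseCopy -Identity 'Mailbox Database 0123456789' -MailboxServer NYDAGNODE1 -ActivationPreference 2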


Now note, we have one database whose Copy Status is Mounted, and one whose Copy Status is Healthy.  Healthy means it’s not in production and is a replica.


Note how it lists the servers that are hosting the database, as well as the Copy Queue Length, the Replay Queue Length, and the Preferred List Sequence Number.  The copy queue length is how many transaction logs are waiting to be copied to the node, the replay queue length is how many are waiting to be replayed into the database on that node, and the preferred list sequence number indicates which copy of the database Exchange should activate next, if the currently mounted one becomes unavailable.

Adding a new mailbox database is very similar.  You create a new mailbox database, and select a node to host the first copy:


And then just add extra copies as we did above.  All servers in the same DAG should have the same drive letter or mount point configuration, because all copies will have the same path for the EDB file, as well as the transaction log and system files paths.  Also, since mailbox databases are now objects of the organization, you need to ensure that their names are unique throughout the entire Exchange organization.
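As a sketch, creating a brand new database and its copy from the shell might look like this (names and paths are illustrative; note the path must exist identically on every node):

New-MailboxDatabase -Name 'MDB03' -Server NYDAGNODE1 -EdbFilePath 'C:\Databases\MDB03\MDB03.edb' -LogFolderPath 'C:\Databases\MDB03'
Mount-Database MDB03
Add-MailboxDatabaseCopy -Identity MDB03 -MailboxServer LNDAGNODE1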

So, that’s it for the third part of this series.  In this part, we configured the networks for replication, added copies of existing databases, and created new databases and copies in our DAG.

In the next part, I’ll show you how to fail over to different copies of the mailbox databases, and its impact on the end user.

Creating a Database Availability Group in Exchange 2010 – Part 2

Exchange 2010, High Availability

 

In Part 1 of this series, we went over the basic concepts of the Database Availability Group, or DAG, and then went into how to set up the Networking for the DAG.  In this next section, we’ll cover how to create the DAG, and then add servers to that DAG.

The first thing we need to do is actually create the DAG.  In the Exchange Management Console, under Organization Configuration –> Mailbox, navigate to the Database Availability Group tab:


Click on the “New Database Availability Group” Action.  You’ll be presented with the following screen:


  1. Database Availability Group Name – This is just the name of the DAG.
  2. File Share Witness Share – This is the UNC path of the file share witness, most likely on an HT server.  It is used when there is an even number of servers in the DAG, to provide the majority vote.
  3. File Share Witness Directory – This is where the share is located on the server hosting it.  It will be created for you automatically.
  4. Network Encryption and Network Compression we’ll leave at the defaults.

With a Hub Transport server installed on NYDC01, and wanting the share to be from a folder called “DAG01” on the C drive of that server, our screen will look like this:


After hitting next, you’ll see the PowerShell command that could have been run:

New-DatabaseAvailabilityGroup -Name 'DAG01' -FileShareWitnessShare '\\NYDC01\DAG01' -FileShareWitnessDirectory 'C:\DAG01'

Now, you’ll have a DAG created, but with no member servers in it:


Now, let’s add NYDAGNODE1 to it.  A couple of things should be noted.  First, DAGs require the Windows Failover Clustering feature to be installed.  When you go to add a node to the DAG, if this isn’t installed, the command will install it for you; it will just take a little bit longer.  The second issue is that we are using the Beta release of Exchange 2010.  There seems to be an issue with the Exchange Console being able to remotely initiate the installation.  To get around this when using the Beta, just make sure to install the Windows Failover Clustering feature from Server Manager yourself on all the nodes (or from PowerShell, as sketched below).  This will also help to speed things up.
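If you’d rather script the feature install than click through Server Manager, a sketch for Windows Server 2008 R2 (on 2008 RTM the equivalent is ServerManagerCmd.exe -install Failover-Clustering):

Import-Module ServerManager
Add-WindowsFeature Failover-Clustering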


Okay, so on to adding the first node to the DAG.  When you add the first node to a DAG, the DAG gets assigned an IP address.  If you do this through the Exchange Management Console, the DAG will retrieve an address through DHCP.  I’m not a huge fan of this, so I like to use the Exchange Management Shell, because you can statically assign an IP address to the DAG.  I’ll show you both ways though.  For the Exchange Management Console, navigate to Organization Configuration –> Mailbox and select the Database Availability Group tab.  Here, you will see DAG01 listed, the DAG we created before.  Right-click it, and select “Manage Database Availability Group Membership”; you’ll be presented with this screen:


Now, select the green Add button, and then select NYDAGNODE1, and select OK.


You could now select Manage.  This would ensure the server had Failover Clustering installed (installing it if it didn’t), and then add it to the DAG.  It would also retrieve an IP address from a DHCP server.  We won’t finish this here; we’ll do it in the shell.

The command is really simple. 

Add-DatabaseAvailabilityGroupServer -Identity DAG01 -MailboxServer NYDAGNODE1 -DatabaseAvailabilityGroupIpAddresses 10.1.1.3


This will add the server NYDAGNODE1, to the DAG, DAG01 and assign the DAG IP address 10.1.1.3.

We let the command run, and it can take some time.  You’ll see output similar to the below as it creates the cluster and adds the server to the DAG:


Once the command finishes, you’ll see NYDAGNODE1 listed as a member server:


If we now ping that IP, we see that we are getting a successful return:


Now, add the second node, LNDAGNODE1.  It works the same way as above for the Console, or the shell.  If you use the shell, you can now omit the -DatabaseAvailabilityGroupIpAddresses parameter; the exact command is sketched below.  (Remember to log on locally to LNDAGNODE1, as the Beta fails when trying to do it remotely.  Also, it seems that you need to use the Exchange Management Shell (Local) icon to add the second node successfully.)  The end result should look like this:

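For reference, the second node’s shell command is the same as the first, minus the IP parameter:

Add-DatabaseAvailabilityGroupServer -Identity DAG01 -MailboxServer LNDAGNODE1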

If you note, there are now two member servers in the DAG, and the bottom half of the screen shows the networks and their status.  Note that, by default, ALL of the networks are configured for replication.  We’ll configure this differently in the next part.

In this part, we created a DAG and added two members to it.  In the third part of this series, we’ll configure the replication networks, and create some databases and set them up for replication!

Creating a Database Availability Group in Exchange 2010 – Part 1

Exchange 2010, High Availability

 

As you may or may not have heard, Microsoft has announced that the next version of their messaging suite, Microsoft Exchange 2010, will be available later this year!  The new version of Exchange hosts many improvements and feature additions on top of Exchange 2007.  One of the most exciting ones announced was a feature called Database Availability Groups, or DAGs.  In this four-part series, we’ll go over the concepts of DAGs, and how to get one working and test it out.

So, what is a DAG?  A DAG is the evolution of the CCR and SCR functionality that was introduced in Exchange 2007.  CCR allowed you to keep two copies of your databases in a cluster, protecting against both server failure and corruption of the database.  SCR allowed you to add site resiliency to your Exchange design, by replicating data to your disaster recovery site and activating it if needed.  CCR and SCR have now been rolled into the DAG feature.  The best part about it is that most of the legwork, as well as the activation of the data, is automatic!  It still uses the concept of log shipping for replication, although it’s been much improved.  Let’s get into a little bit of how it works.

The first thing you need to know is that in Exchange 2010, the storage architecture is different from that of Exchange 2007.  For a start, there are no more storage groups.  Transaction logs and checkpoint files are all based off the mailbox database now.  Microsoft was moving away from storage groups, especially when you consider that any type of replication in Exchange 2007 required a maximum of one database per storage group.  The next big change is that databases are no longer objects of a server, but objects of the Exchange organization itself.  What exactly does that mean?  Well, take a look at this screenshot of the Exchange 2010 Management Console:

 


If you notice, I am under the Organization Configuration node, not the Server Configuration node.  The picture shows two databases, each hosted on a different server.  If you notice the bottom half of the screen, the console lists the database copies that make up this particular database.  A DAG consists of multiple copies of a set of databases, any of which can be activated as the active copy at any time.  You can have up to 16 servers in a DAG, meaning you can have up to 16 separate copies of one database!  For example, you could have two servers in your main datacenter, each with a copy of one database for high availability in your main site, and then a third copy of the database in your disaster recovery site, in case you lost your main datacenter.  Members of a DAG do not have to be members of the same AD site, like stretched 2008 CCR clusters did.  Each of these copies can be activated at any time, automatically if you have a failure, or manually by running some commands.

The last major point is that no client connects directly to a mailbox server anymore, including Outlook.  Outlook clients connect to a Client Access Server, just like a POP or IMAP client does to connect to its mailbox.  This allows for incredibly quick failovers (30 seconds or less) of the Outlook client to a new copy of the database.

So now that we have an idea of the high-level concept, let’s take a look at actually setting one up.  Here is my lab environment.  I have two separate AD sites.  New York has two subnets, 10.1.1.0/24 and 172.16.1.0/24, and London has two subnets, 192.168.1.0/24 and 172.17.1.0/24.  In NY, production traffic will occur over the 10.1.1.0/24 network, and replication and heartbeat over 172.16.1.0/24.  In London, production is 192.168.1.0/24, and replication and heartbeat 172.17.1.0/24.  Now, since these are two separate sites, both replication networks need to be able to contact each other.  This means both networks need to be routable to each other, which in our case they are.  You could use a stretched VLAN, but that is a much more complicated scenario for no true benefit.  In each site, I have a single domain controller that is also a Client Access Server and Hub Transport server, as well as one machine with just the Mailbox role installed.  It should be noted that one of the coolest features of the DAG is that the Mailbox role does not have to be installed by itself for a server to be part of a DAG.  You can have any combination of roles installed, and it will still work EXACTLY the same.  Below is a Visio of the setup:


Exchange 2010 is installed on all the servers; you do NO customization during the install, as all of it is done after.  This means you do not have to re-install Exchange if you decide down the road to make a server part of a DAG.  Let’s take a look at the network configuration.  First, the NY server.


I have two NICs, one labeled “Client” and one labeled “Replication”.  The Client NIC is configured as normal, with an IP, subnet mask, gateway, all the regular stuff.  The Replication NIC should only be configured with an IP and subnet mask, NO DEFAULT GATEWAY:


Now, select the Advanced button, and select the DNS tab.  At the bottom, un-check the box “Register this connection’s addresses in DNS”:


Next select the WINS tab and select the radio button to disable NetBIOS over TCP/IP:


After this, select OK to save your settings and return to the Network Connections screen.  Select Advanced->Adapters and Bindings.  Make sure your production or “Client” NIC is listed above “Replication”:


Now, you may be wondering about the missing default gateway on the heartbeat network.  If you add a default gateway on two different NICs, Windows presents you with a warning:


Hmm, seems like this most certainly pertains to us.  Also, DAGs still use the Windows Failover Clustering feature of Windows Server 2008, and a configuration with a default gateway on the replication or heartbeat NIC is not supported, as very odd behavior can result.  So then the question is asked: if the networks are routed, how do we tell the replication NIC on one node how to get to the replication networks of the other nodes?  For this, we add static routes to the individual servers’ routing tables.  Tim McMichael had a great article about this, and you can read it here.

So, on the NY node, we want it to contact the LN node’s replication network of 172.17.1.0/24 on its replication network of 172.16.1.0/24.  The gateway on the NY side is 172.16.1.254, so we run the following command:

route add 172.17.1.0 MASK 255.255.255.0 172.16.1.254 -p


The -p makes the route persistent across reboots.  We can check that it was successful with the route print command:


So now, all replication and heartbeat traffic should pass through the specific replication NIC, over the replication network, to the replication NIC of the London node.  Repeat this step for ALL your replication networks, on all nodes.  For the London node, with a gateway on the London replication network of 172.17.1.254, the command back to NY would be:

route add 172.16.1.0 MASK 255.255.255.0 172.17.1.254 -p

Okay, that does it for part 1 of this series.  We went over the basic concepts of the DAG, and how to set up the networking for it.  In the next section, we’ll go over how to create the DAG, and add nodes to it.  Stay tuned.