Saturday, March 29, 2008

Setup a Standby Continuous Replication in Exchange 2007 SP1


Overview:

----------
Standby Continuous Replication (SCR) is a new feature in Exchange 2007 SP1, designed to provide a solution to complete site failure. It does this by using the existing Exchange replication service (asynchronous replication), already used by Cluster Continuous Replication (CCR) and Local Continuous Replication (LCR), to replicate data to a target which is in a different site.

SCR doesn’t rely on clustering, so the configuration of offsite copies has none of the limitations that clustering can enforce. Another important feature of SCR is that it allows multiple copies of the data to be made, because each source storage group can have multiple targets.

A few key changes have been made to the replication technology compared to its use in CCR and LCR. In SCR there is a lag before logs are played into the SCR target database. In fact, 50 logs must have been copied across to the target before the database is even created. On top of this there is a parameter, ReplayLagTime, which defaults to 24 hours and specifies how long SCR will wait before any logs are played into the database. Of these two delays, the ReplayLagTime is configurable but the 50 log delay is hard coded!

Finally, before moving on to look at what will be tested, I will outline the prerequisites for SCR:

· Source and Target data paths must match
· Source and Target installation paths must match
· There must be a 1-1 mapping of Storage Groups and Databases
· Source and Target Operating Systems must match
· Source and Target must be in the same AD domain
What is being tested
In this document, I will test the following scenarios:

· Setup of SCR
· A planned activation
· An emergency activation after a failure of the source
· Re-initialization of the original site
SCR Setup
To begin with I set up a new Storage Group on the source machine and created a database within it. I then moved a few test mailboxes to it.
To enable an SCR copy use the following command (substituting your Storage Group and Server names where appropriate):

Enable-StorageGroupCopy -Identity SCRSourceSG1 -StandbyMachine SiteBSCR1.gaots.co.uk -ReplayLagTime 0.0:10:0

You will notice the ReplayLagTime I set ensures that there is only a ten minute lag. Also note the placement of periods and colons in the value which should be entered as shown.
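The value follows a Days.Hours:Minutes:Seconds layout. As a rough illustration, assuming the value reads like a .NET TimeSpan string (an assumption on my part, not something tested here), this plain PowerShell sketch shows where each part sits. The variable names are made up for the example.

```powershell
# Plain PowerShell, no Exchange cmdlets involved.
# Build two time spans and print them to see where days, hours,
# minutes and seconds sit in the string form.
$tenMinutes = New-TimeSpan -Minutes 10   # the lag used in this test
$oneDay     = New-TimeSpan -Days 1       # the default lag of 24 hours

$tenMinutes.ToString()   # 00:10:00
$oneDay.ToString()       # 1.00:00:00
```

The period separates the day count from the time of day, and the colons separate hours, minutes and seconds.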

Having enabled SCR I needed to make some traffic to generate 50 log files as can be seen in Figure 1! Once I had done that, just as expected, the target database was created.

Figure 1 – The Storage group path showing the created logs

To start with, of course, the database remained small; after 10 minutes, however, the logs started to play in. Note that only some of the logs play in. There is always a buffer of the last 50 logs which will not play in, as can be seen in Figure 2; these final 50 logs would be manually played in as part of activation.
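To make the 50 log buffer concrete, here is a small illustrative model in plain PowerShell. It is not an Exchange cmdlet: the function name and parameters are hypothetical, and it assumes every copied log is already older than the ReplayLagTime, so only the fixed buffer holds logs back.

```powershell
# Illustrative model only; Get-ReplayableLogCount is a made-up helper.
function Get-ReplayableLogCount {
    param(
        [int]$CopiedLogCount,          # logs copied to the SCR target so far
        [int]$HeldBackLogCount = 50    # the fixed buffer described above
    )
    # Only logs beyond the buffer are candidates for replay;
    # never report a negative number.
    [Math]::Max(0, $CopiedLogCount - $HeldBackLogCount)
}

Get-ReplayableLogCount -CopiedLogCount 30     # 0
Get-ReplayableLogCount -CopiedLogCount 120    # 70
```

With 30 logs copied nothing can replay yet, while with 120 copied the oldest 70 can, leaving the newest 50 to be played in at activation.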

Figure 2 shows how to check the health of the SCR copy by running the following command:

Get-StorageGroupCopyStatus -Identity SCRSource1 -StandbyMachine SiteBSCR1.gaots.co.uk

Figure 2 – The output from the Get-StorageGroupCopyStatus command
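If you only care about a few fields, the output can be narrowed down. This is a sketch rather than part of my test: it assumes the Exchange Management Shell on an Exchange 2007 SP1 server, and the property names are quoted from memory of the cmdlet's output, so check them against what you see on screen.

```powershell
# Show the overall health plus the two queue lengths for the SCR copy.
# Server and storage group names are the ones used in this document.
Get-StorageGroupCopyStatus -Identity SCRSource1 -StandbyMachine SiteBSCR1.gaots.co.uk |
    Format-List Name, SummaryCopyStatus, CopyQueueLength, ReplayQueueLength
```

Given the 50 log buffer, a replay queue of around 50 on a healthy SCR target is expected and is not a sign of trouble.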

So, having set up SCR and verified that it is replicating correctly, let’s move on to activation.
SCR Activation
Activation where the source still exists
The first type of activation I will demonstrate is simply activating a single failed database where the original DB and server still exist. This would be either a planned failover or one where corruption has occurred. This procedure makes use of the Exchange database portability feature, which enables the re-homing of a user’s mailbox to a new server, in this case the one holding the target SCR copy.

The first step is to create a Storage Group (SG) and Database (DB) on the target server. This should be created in a new path (not one already used for a target copy). Having created the SG and DB, mount the DB and then dismount it. Once you have dismounted the DB delete all files from the SG and DB paths.

New-StorageGroup -Server 'SITEBSCR1' -Name 'SCRRestore1' -LogFolderPath 'C:\SCRRestore1' -SystemFolderPath 'C:\SCRRestore1'

New-MailboxDatabase -StorageGroup 'SITEBSCR1\SCRRestore1' -Name 'SCRRestoreDB1' -EdbFilePath 'C:\SCRRestore1\DB\SCRRestoreDB1.edb'

Mount-Database -Identity 'SCRRestoreDB1'

Dismount-Database -Identity 'SCRRestoreDB1'

Delete all files in the folders below:

· C:\SCRRestore1
· C:\SCRRestore1\DB (also delete the catalog folder)

Having carried out the preparation steps, you must first dismount the production DB if it is not already dismounted.

Dismount-Database -Identity 'SCRSourceDB1'

Once the source DB is dismounted you must enable the target database for mounting and ensure all log files are copied across. This is done using the following command:

Restore-StorageGroupCopy SiteAMB1\SCRSource1 -StandbyMachine SiteBSCR1

If you get any errors, investigate them carefully: they usually mean that a log cannot be copied across, which means that data loss will occur. For an example see Figure 3.

Figure 3 – Showing possible errors
So long as everything copied correctly, then you will be left at the command prompt with no message. At this point you must perform a few more steps before getting the DB back online.

First verify whether the target copy of the DB is in a clean shutdown state. This is done using ESEUTIL.

ESEUTIL /mh c:\SCRSource1\DB\SCRSourceDB1.edb | findstr State

Notice the pipe to the findstr command, which lets us search the output for the shutdown state information we require. As you can see in Figure 4, the database is in a Dirty Shutdown state.

To rectify the dirty shutdown, we use ESEUTIL again as follows. First change to the SG path location:

Cd c:\SCRSource1

Next run a recovery on the DB:

ESEUTIL /r E02

Note that you would use whatever log prefix that storage group has (E00, E01, E02 and so on)!

Having performed the above steps re-check the database shutdown state and it should now be clean.
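The re-check can also be scripted. The following is a minimal sketch, assuming ESEUTIL is on the path and that the database header reports the state with the words "Clean Shutdown" (as it does in Figure 4); the variable names are mine.

```powershell
# Path taken from the steps above; adjust to your own layout.
$edb = 'C:\SCRSource1\DB\SCRSourceDB1.edb'

# Dump the database header and keep only the line(s) mentioning the state.
$state = eseutil /mh $edb | Select-String 'State:' | Out-String

if ($state -match 'Clean Shutdown') {
    'Database is in Clean Shutdown state; safe to continue.'
} else {
    'Database is not in Clean Shutdown state; run recovery before going further.'
    $state
}
```

Checking for the clean state explicitly, rather than for the absence of "Dirty Shutdown", means a failed or empty ESEUTIL run is treated as not ready.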

Figure 4 – The database is in a dirty shutdown state and is then repaired

Next we must make use of the SG and DB that we created right at the beginning of this process. We will basically point the SG and DB to the path where the SCR copy is held as follows:

Move-StorageGroupPath SiteBSCR1\SCRRestore1 -SystemFolderPath c:\SCRSource1 -LogFolderPath C:\SCRSource1 -ConfigurationOnly

Move-DatabasePath SiteBSCR1\SCRRestore1\SCRRestoreDB1 -EdbFilePath C:\SCRSource1\DB\SCRSourceDB1.edb -ConfigurationOnly

Finally before mounting the DB we must set it to allow it to be overwritten during a restore as follows:

Set-MailboxDatabase SiteBSCR1\SCRRestore1\SCRRestoreDB1 -AllowFileRestore:$True

This done we can mount the database using the command below:

Mount-Database -Identity SiteBSCR1\SCRRestore1\SCRRestoreDB1

The database is now restored and available for use. All we have to do now is ensure that the users know where to access it. This is done by moving the users’ mailboxes, again using the ConfigurationOnly parameter so that only the configuration is updated. The complete command shown below gets all the mailboxes from the source database and moves all but the system mailboxes across to the target database.

Get-Mailbox -Database SiteAMB1\SCRSource1\SCRSourceDB1 | where {$_.ObjectClass -NotMatch '(SystemAttendantMailbox|ExOleDbSystemMailbox)'} | Move-Mailbox -ConfigurationOnly -TargetDatabase SiteBSCR1\SCRRestore1\SCRRestoreDB1
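Before running the move it can be worth listing which mailboxes the filter selects, and afterwards confirming where they ended up. This is a sketch I did not run as part of the test; it assumes the Exchange Management Shell and reuses the database names from this document.

```powershell
# Same filter as the move command, but only listing the mailboxes it selects.
Get-Mailbox -Database SiteAMB1\SCRSource1\SCRSourceDB1 |
    where {$_.ObjectClass -NotMatch '(SystemAttendantMailbox|ExOleDbSystemMailbox)'} |
    Format-Table Name, Database

# After the move, the same mailboxes should be listed against the target database.
Get-Mailbox -Database SiteBSCR1\SCRRestore1\SCRRestoreDB1 |
    Format-Table Name, Database
```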

Having completed the above steps the next stage is to test that users can now access mail again. It should be noted that they will have to exit and re-open Outlook for the setting to take effect.

Activation after complete site/server loss
As I showed above, SCR can be used as a rudimentary failover solution, perhaps if major work is being undertaken at a site or on WAN links. However, it is more likely to be used in the event of a complete site or server failure. This scenario requires a slight adjustment to the process described above, in particular the use of the Force switch.

When you get to the point of running the Restore-StorageGroupCopy command, you will find that it produces the error shown in Figure 5. This is because it cannot contact the source server to check that the source DB is dismounted.

Figure 5 – Restore error because the source is unavailable

At this point, having verified the source really is gone, run the command below to force Exchange to restore:

Restore-StorageGroupCopy -Identity sitebscr1\scrrestore1 -StandbyMachine siteamb1 -Force

Exchange will notify you that data loss is expected, as it was not possible to copy the final logs. At this point you must continue the restore procedure by verifying the shutdown state of the database. Again you will notice the DB is in dirty shutdown state. This time however, to recover it you will need a slightly different ESEUTIL command:

ESEUTIL /r E00 /A

The /A tells ESEUTIL that this is a soft restore where log files may be missing.

After that, follow the remaining steps above to complete the restore.

One other thing to note is that the loss of the original server also has repercussions for updating user profiles. Only those with Outlook 2007 will now be redirected automatically, using the Autodiscover service. Those on Outlook 2003 will no longer be able to contact the original server for redirection, so their profiles will need updating manually.

Note: The above procedure could easily be expanded to cover all stores on a failed server.
Restoring SCR redundancy after failure is recovered
After you have dealt with the failure of a site, server or database you will want to get redundancy back and restore your Exchange organisation to the original configuration. The steps below outline how to undertake this.

The first stage in returning to the original setup is to set up SCR in the opposite direction, in order to get the database synchronised back to the original source server. To do this, first remove any trace of the original SG and DB from your original source server by deleting the SG and DB and then removing their files.

Next follow the procedure above to set up SCR. Having set up SCR and verified that it is functioning, reseed the database to ensure that it is as up to date as possible using the commands below:

Suspend-StorageGroupCopy -Identity sitebscr1\scrrestore1 -StandbyMachine siteamb1

Note: the next command must be run on the target machine.

Update-StorageGroupCopy -Identity sitebscr1\scrrestore1 -StandbyMachine siteamb1 -DeleteExistingFiles

Resume-StorageGroupCopy -Identity sitebscr1\scrrestore1 -StandbyMachine siteamb1

Finally check that SCR is running correctly using the command below:

Get-StorageGroupCopyStatus -Identity sitebscr1\scrrestore1 -StandbyMachine siteamb1 | fl

At this point your target copy (on the original source machine) will be almost up to date, which means that you are now ready to activate following the procedure laid out above.

Having completed activation, users will now be using the DB on the original source server and you should set up SCR again to replicate to the original target server.
Summary
I hope this document gives a clear idea of how to set up and use SCR. For more information about some of the other commands available, see the following link:
http://technet.microsoft.com/en-us/library/bb676502.aspx