IBM DS4700 Model 70, VMware ESX 3.5 and Site Recovery Manager
I just finished a project for a customer using the above products, and it was definitely an interesting (frustrating) experience. I’ve put together some quick and dirty notes on implementing this in the hope they will help someone else out there who has to set this up. The documentation is seriously lacking, and it took a lot of trial and error to get everything working properly.
Although the end result is not ideal (you should use a Model 72 at minimum), it does work. The only documentation is a small readme that comes with the IBM SRA itself.
Software pre-requisites
- Full fabric activation key for FC switches (we were using the IBM SAN16B-2 Brocade switches).
- FlashCopy license for the DR side DS4700.
- Enhanced Remote Mirroring (ERM) license for both DS4700s.
- Ensure you’re running the latest firmware on the DS4700s (it must be version 7.x).
- Ensure you have the latest SRM and SRA from vmware.com. Every version of the IBM SRA up to the very latest had typos in the script. Unbelievable.
Configuration Notes
- The DS4700 Model 70 has only 2 FC ports per controller. One port on each controller is required for ERM (we used port 2), which means you have to run a single meshed fabric across your FC switches (we had 2 switches per side) to provide any level of redundancy, rather than the best practice of two separate fabrics. You end up with one port per controller carrying host traffic and one carrying ERM replication. The Model 72 has 4 FC ports per controller, which allows separate fabrics, so it is recommended over the Model 70.
- You will need to set up your SAN fabric so that all your Production side ESX hosts can see each other, as well as one port from each controller on the SAN (port 1 in our case, since port 2 is used for ERM). Repeat the process for the DR side. You can have a separate zone for each individual FC port on the ESX hosts to see the port on the SAN, but group the hosts together. A typical zoning setup of 1 ESX host port to 1 SAN port gave us pathing issues on the SAN; for whatever reason, the hosts need to see each other.
- Set up your VMFS LUNs and configure ERM to replicate these to the DR side. You can use different RAID levels on each end to maximize disk space if required. For example, we had a mix of RAID 10 and RAID 5 on our production side, replicating to a RAID 5 array on the DR side.
- On the DR side SAN, leave as many disks unconfigured (i.e. do not include them in an array) as you have SRM protected LUNs. The IBM SRA uses an unallocated physical disk to configure FlashCopy for SRM protection plan testing. Having all disks allocated to arrays will not work, even with free space available, even though that is sufficient for a normal FlashCopy run through Storage Manager.
- After installing the IBM SRA, ensure the path to the Perl binaries is in your PATH system variable, otherwise you will get errors when SRM tries to call the IBM SRA scripts.
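The last two notes are easy to sanity-check before you run your first SRM test. Below is a minimal sketch in Python, which is my own choice and nothing to do with the SRA itself (that is Perl). The function names and the disk/LUN counts are made up for illustration; read the real numbers off Storage Manager. It only checks that a `perl` executable resolves through PATH and that you have at least one unconfigured disk per protected LUN.

```python
import shutil


def perl_on_path(executable="perl"):
    """Return the full path PATH resolves for the Perl binary, or None if it is not found."""
    return shutil.which(executable)


def enough_unconfigured_disks(unconfigured_disks, protected_luns):
    """Rule of thumb from the notes above: one unconfigured disk per SRM protected LUN."""
    return unconfigured_disks >= protected_luns


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    unconfigured, protected = 4, 4

    perl = perl_on_path()
    if perl:
        print("perl found at:", perl)
    else:
        print("perl NOT found, add the Perl bin directory to PATH")

    print("enough unconfigured disks:", enough_unconfigured_disks(unconfigured, protected))
```

Run it on the SRM server under the same account the SRM service uses, since a PATH change made for your own user will not necessarily be visible to the service.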
I believe those were the main roadblocks we ran into. Please leave a comment if you find this useful or have anything to add!
Thanks!
3 comments so far
Hi,
You state that you have a mix of RAID10/RAID5 on the production site, and that you could replicate to all RAID5 for storage optimization. But beware! If your replication is synchronous, you will lose performance (each write to RAID10 has to be committed to the RAID5 on the DR site as well before the write completes). Caching on the DR site will help somewhat, but not once the cache fills up.
Asynchronous replication… I would stay away from it (I have experience with the Sun x6140, which I think is basically the same unit). Asynchronous replication leaves you nowhere when the WAN “breaks” all of a sudden (some blocks replicated, others not). Not what you want when you implement for DR. Just my 2 cents.
Hi, I should have read this before I got stuck with the same problems. Can I ask what you did about problem number 1? I have the same situation, with 1 port available per controller connected to separate switches. Do I have to merge the fabrics?
Sorry for the late response to the above comments.
Erik, the customer’s requirements were to optimise the available space in the DR SAN for some other backup purposes. Agreed on your points: ideally, given the right environment, the best protection would be provided by using sync replication and matching RAID sets. Performance on the Prod side was the top priority, so the decision was to use async replication.
Imran, yes, we had to break away from best practice and merge the fabrics. An unfortunate outcome, and I would definitely not recommend using a Model 70 to do SRM.
Thank you both for the comments!