Major Systems Upgrade

We have been making major changes at work over the last six weeks. We replaced both of our old Dell PowerEdge 4600 servers with three new Dell PowerEdge 2900 servers. I went from having half a terabyte of storage to having two terabytes. Two of the servers are running VMware ESX 3.5 and the other is my Microsoft SQL server. When we purchased the new servers we did not get any tape drives on them because we wanted to change our backup method as part of the upgrade.

The basic server layout is:
Physical Server 1 (PowerEdge 2900)

  • VM 1: Domain Controller
  • VM 2: File and Print Server
  • VM 3: Application Server

Physical Server 2 (PowerEdge 2900)

  • VM 4: Intranet Server
  • VM 5: Domain Controller (Primary)
  • VM 6: Application Server

Physical Server 3 (PowerEdge 2900)

  • ERP and MS SQL Server

Physical Server 4 (PowerEdge 4600)

  • Backup Server

For backups we are using Acronis TrueImage Echo Server to back up VM 2, VM 4, VM 5 and Physical Server 3. We create full images on Sunday, when the network is slowest, and push them to the backup server. This process takes about one hour. Monday through Friday we do differential backups on the same four machines and push them to the backup server. On the SQL server we also do log file backups every 30 minutes. These files are created locally and then copied to the backup server. The differential backup takes only about five minutes total for the four servers.

On Saturday we use TrueImage's ability to merge the full image from the previous Sunday and the last differential image from Friday into one file, and then remove all the pieces from the week. At this point we have one current image for each of the four servers. All of the nightly processing done on these four servers is handled by a VBScript. (Note: The script has been saved as a text file for security on the web site.) Each night the backup server backs itself up to tape, which includes all the image files from the other servers.

We are not backing up VM 1 because it is a mirror of the other domain controller, and if we ever had to restore everything, I have never had much luck getting two domain controllers to resync afterward. VM 3 only changes when one of the applications on it is updated, so after any application change we make a snapshot of the VM and burn it to DVD to store off site. VM 6 has one data directory that changes, so I pull that directory each night when the backup server backs up; otherwise it is treated the same as VM 3.
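
As a rough illustration of the weekly rotation (full image on Sunday, differentials Monday through Friday, merge on Saturday), here is a minimal Python sketch. It is not the VBScript mentioned above and does not call Acronis at all; the function name `nightly_action` and the labels it returns are made up for the example.

```python
from datetime import date, timedelta

# Machines that get imaged, as listed in the post.
IMAGED_MACHINES = ["VM 2", "VM 4", "VM 5", "Physical Server 3"]


def nightly_action(day: date) -> str:
    """Return the backup step for a given day (hypothetical labels)."""
    weekday = day.weekday()  # Monday is 0, Sunday is 6
    if weekday == 6:
        return "full"          # Sunday: full image pushed to the backup server
    if weekday == 5:
        return "merge"         # Saturday: merge Sunday's full with Friday's differential
    return "differential"      # Monday through Friday


if __name__ == "__main__":
    # Print one week of the rotation, starting from an arbitrary Sunday.
    start = date(2024, 1, 7)
    for offset in range(7):
        day = start + timedelta(days=offset)
        print(day.strftime("%A"), "->", nightly_action(day))
```

A real script would replace the returned label with whatever command actually starts the job for each machine in `IMAGED_MACHINES`.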

These changes have made a world of difference to our nightly processing. When everything was going to tape it was taking almost four hours total for all the servers to back up, and the one time we did have a major hardware failure it took sixteen hours to get everything up and running. We did a test restore once the new setup was in place, and the full restore took under two hours to get things running.
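
To put rough numbers on the improvement, here is a small Python calculation using only the figures quoted in this post. It treats "almost four hours" and "under two hours" as round upper bounds, so the ratios are approximate, and it compares only the time the production servers spend backing up, not the backup server's own nightly tape run.

```python
# Figures from the post, rounded to their stated bounds (an assumption).
old_nightly_minutes = 4 * 60      # tape backup of all servers, "almost four hours"
new_differential_minutes = 5      # weeknight differential for the four machines
old_restore_hours = 16            # restore after the one major hardware failure
new_restore_hours = 2             # test restore, "under two hours"

nightly_speedup = old_nightly_minutes / new_differential_minutes
restore_speedup = old_restore_hours / new_restore_hours

print(f"Weeknight backup window: about {nightly_speedup:.0f}x shorter")
print(f"Full restore: at least {restore_speedup:.0f}x faster")
```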

I also finally got an answer to my question about our corporate pain threshold for data loss in a major failure. I have always worked to keep my exposure to less than four hours. Well, it turns out the corporate standard is one WEEK. I was shocked to hear that. I am pretty sure that I will have NO problem meeting that requirement.