Downtime simply refers to any time that a system is considered unavailable.
The main causes of downtime tend to fall into the following categories: issues with power and temperature, hardware failure, network failure, service/application failure and events such as environmental disasters. While not technically “downtime”, nagging performance issues or not having the right equipment for the job also contribute to the overall effects of downtime: lost productivity and potential loss of data.
When it comes to addressing these areas, there are lots of things that can be done to minimise the risk to your business and staff productivity.
Power and temperature – two things really matter here: a UPS system with enough capacity to gracefully shut down all the servers, and reliable air conditioning to control the temperature of the server environment.
One thing to consider is the amount of run time versus the cost of the UPS equipment. A lot will depend on the number of staff and the load on the equipment; these will determine the type of UPS system that is suitable. It is a good idea to get proper advice up front and then reassess your UPS every 3 – 5 years. Batteries don’t last forever and your needs will no doubt change over that sort of timeframe.
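As a rough illustration of the run time versus load trade-off, the sketch below estimates how long a UPS might carry a given load and whether that leaves enough time for a graceful shutdown. The function names, the 90% inverter efficiency, the 80% usable battery fraction and the 1.5x safety margin are all assumptions chosen for illustration; real figures should come from the manufacturer's runtime charts and proper sizing advice.

```python
def ups_runtime_minutes(battery_wh, load_w,
                        inverter_efficiency=0.9, usable_fraction=0.8):
    """Estimate UPS run time in minutes (simplified, illustrative model).

    battery_wh          -- nominal battery energy in watt-hours
    load_w              -- total load of the connected equipment in watts
    inverter_efficiency -- assumed fraction of battery energy reaching the load
    usable_fraction     -- assumed fraction of capacity usable on an ageing battery
    """
    if load_w <= 0:
        raise ValueError("load_w must be positive")
    usable_wh = battery_wh * usable_fraction * inverter_efficiency
    return usable_wh / load_w * 60


def covers_shutdown(runtime_min, shutdown_min, safety_margin=1.5):
    """True if the run time covers the shutdown time with a safety margin."""
    return runtime_min >= shutdown_min * safety_margin


# Hypothetical example: a 1000 Wh battery carrying 600 W of servers
# that need about 15 minutes to shut down cleanly.
runtime = ups_runtime_minutes(1000, 600)
print(f"Estimated run time: {runtime:.0f} minutes")
print("Enough for a graceful shutdown:", covers_shutdown(runtime, 15))
```

Re-running the same estimate with a lower usable fraction gives a feel for why an older battery may no longer cover the shutdown window, which is one reason to reassess the UPS periodically.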
Hardware failure – designing a system that incorporates High Availability (HA) and clustering technology, together with RAID 6 storage (which can survive two drives failing at the same time), can go a long way to minimising hardware downtime. A good warranty is also essential. We recommend a minimum Service Level Agreement of 24-hour call-to-repair with a 24 x 7 onsite manufacturer’s warranty on server equipment.
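To see why redundancy helps, here is a minimal sketch of the standard availability arithmetic for a cluster in which any one node can carry the service. It assumes that nodes fail independently and that failover is instant, which real clusters only approximate; the 99% per-node figure and the function names are purely illustrative.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(node_availability, nodes):
    """Availability of a cluster that is up while at least one node is up.

    Assumes independent node failures and instant failover.
    """
    if not 0 <= node_availability <= 1:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return 1 - (1 - node_availability) ** nodes


def downtime_hours_per_year(availability):
    """Expected hours of downtime per year at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


# Hypothetical example: compare one server with a two-node cluster,
# each node being available 99% of the time.
for n in (1, 2):
    a = combined_availability(0.99, n)
    print(f"{n} node(s): {a:.4%} available, "
          f"about {downtime_hours_per_year(a):.1f} hours down per year")
```

Under these assumptions a second node cuts expected downtime from tens of hours a year to under an hour, although shared components such as power, storage or the network can still take the whole cluster down.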
Network failure – this typically means “Internet” failure, although it has more to do with the connection to the Internet than the Internet itself. A residential-grade connection simply doesn’t compare with a good-quality business-grade connection. These days most companies rely fairly heavily on their Internet connection to function, and with more and more IT services being placed in the Cloud, it makes sense to spend extra dollars on reliability of service.
Service/application failure – the risk can be greatly reduced by using HA and clustering technology. Making sure that your systems and Line of Business applications (QuickBooks, MYOB, Visipay, CRM systems etc.) are properly and consistently backed up means they can be restored to a running state if need be.
Building burns down? You probably have no option other than to carry out a Disaster Recovery (DR). How long that takes really depends on what backup technologies are in place and what the disaster recovery plan looks like. Technology is so advanced these days that, with the right backup strategy in place, your server could be fully restored in less than 10 minutes!
In summary, a lot can be said when it comes to minimising downtime. The most important strategies to contemplate involve two things: