Users expect technology to be fast and easy, which means safeguards have to be put in place to provide a seamless customer experience. In a perfect world, users are unaware of the complexity behind the scenes that delivers a secure and flawless experience. So how do large IT operations disguise the complex mesh of clouds, data and applications to create this illusion of simplicity? It all starts with streamlining your resources and making smart partnerships and investments.
I am the head of IT Operations at Liberty, a South African-based financial services company. We put a great deal of focus on the stability, performance and availability of the production systems our customers rely on. As IT operations go, we are not a massive team, but Liberty as an organisation has about 5,000 office-based staff. We also have brokers, agents and franchises that sell our products across the country. In IT, we have a couple of hundred people in the infrastructure domain.
IT Operations is split into five domains, one of which is enterprise monitoring. Enterprise monitoring encompasses configuration information, infrastructure monitoring, network performance monitoring, application performance monitoring and deep-dive diagnostics at the code level.
In 2007, we started a journey to centralise the monitoring function. Our monitoring discipline at the time required heavy investment in products, and those tools in turn required people to manage and maintain them. Needless to say, getting coverage and visibility from a monitoring point of view was quite difficult. We did not have a huge budget, so we had to spread the investment over several years, and it took about six or seven years to reach a level of maturity that was, at best, mediocre.
Our existing monitoring function eventually became unsustainable, with maintenance costs rising every year. These costs included licensing fees to get the support we needed from multiple providers, and we had to keep investing in training to grow our capability. Going further would have meant investing significantly in expanding both our license footprint and our skill set.
At that point, we knew we desperately needed a 'managed services' provider that could give us everything we required, so we started looking at the South African market. Our biggest drivers were cost and capability. About a year after we started evaluating what was out there, we came across AppCentrix. Based in South Africa, they were able to supply everything we needed, including addressing our concerns about cost and sustainability.
In 2014, we officially partnered with AppCentrix and bought the capability instead of buying tools. We now have a strong business case that will save us a great deal of money over the coming years. It also strengthens our value proposition: we get the capability up front, so we see an immediate benefit in what that capability costs us. Plus, the skills needed to deliver the capability come as part of the managed service.
At the center of all this was AppCentrix's ability to match our monitoring model with the tool they provide, ScienceLogic. In a short span of time, we have caught up to where our monitoring capability already was and then surpassed it. ScienceLogic has been a solid step in the right direction for our infrastructure monitoring capability. It also delivers beyond expectations in our service space, with an event management discipline that now empowers us to be proactive in our monitoring.
In addition to the cost savings, the overall number of incidents has declined by 98 percent over the past four years. While monitoring is not the sole reason for this improvement, it is a major factor.
The model that we now use is tiered and hits on all levels of enterprise monitoring, which includes the following six levels:
We have also set up a National Operations Center that operates around the clock and conducts all of our monitoring. The center is equipped with a video wall of 23 screens powered by ScienceLogic's dashboard, and it is an enormous part of our event management process. The entire room is tied into our systems and changes color based on triggered events. The operations center serves as our central technology dashboard, which is part of what ScienceLogic has enabled.
The following success metrics are split by severity rating (Severity 1 and Severity 2) and cover 2013 through 2016. I think you'll agree that they are hugely impressive and more or less speak for themselves.
Severity 2
Severity 1
Taken together, that's a massive 97.6 days of processing time that went back into the business. We even set a record, going 290 days without a Severity 1 incident. Those are valuable hours that we retained rather than lost to IT-related issues, and I feel comfortable saying that the monitoring ability ScienceLogic gave us accounts for part of that.
Monitoring is a discipline and should be acknowledged as such. The skills required to drive it are those of a specialist generalist: the people working in this space need to understand all of the platforms we run and be multi-skilled. I have been extremely selective in how I built my team, hiring people who understand the monitoring space. Generalists with in-depth knowledge of multiple disciplines are hard to come by and become invaluable to organisations. Those are the individuals who make up our monitoring team, and they are a significant part of why we are so successful. Armed with the monitoring capability from ScienceLogic, our team can act the instant an event is triggered, saving us valuable time and precious resources.
For example, in South Africa resources such as power and water (which we use for cooling) are scarce and somewhat unreliable. We continuously take proactive steps, one of which was to move our data center into micro pods, which are self-contained ecosystems. Given the environment we work in, visibility through monitoring is absolutely essential. Without coverage in our facilities and visibility of our power and water supply, we would be flying blind. Even though everything is automated, the clock starts ticking once we lose a resource. And we have to do all of this under significant cost pressure.
Liberty's Strategy 2020 target in the IT Operations space is to double customer satisfaction at two-thirds of the cost. Partnering with a managed services provider that uses tools such as ScienceLogic has decreased our costs significantly and increased our capability and visibility.