You have deployed a plethora of monitoring and service management solutions. Yet do you have a good handle on the following?
- What are the hot-button issues at any given time?
- What do you need to prioritize?
- What is the best way to react to a situation?
Better yet, what if you could know about outages even before they happen?
The next generation predictive IT Operations Analytics (ITOA) solutions are aiming to do just that.
predicts that in the coming years, Global 2000 companies will deploy IT Operations Analytics Platforms as a central component of their architecture for monitoring critical applications and IT services.
Here are some interesting use cases that next-generation ITOA addresses by leveraging Data Science and Big Data.
Multiple sources can create a flood of noisy alarms. ITOA intelligently clusters the IT alarms/alerts into high-level incidents. This clustering reduces the noise and helps spot critical issues faster.
The goal is to correlate, contextualize and create clusters of related alerts known as “situations”. Managing one or two situations is better than trying to manage thousands of disparate alerts. Once situations are created, all the relevant stakeholders (Dev, Ops, DBA, Sys Admin, etc.) can be invited into a virtual war room to collaborate and resolve the incident.
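As a rough illustration of the clustering idea, here is a minimal sketch that groups alerts into situations when they concern the same service and arrive close together in time. The `Alert` fields, the `cluster_alerts` function and the five-minute window are assumptions made for this example; a real ITOA product would correlate on far richer context (topology, text similarity, history).

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float  # seconds, e.g. since epoch (hypothetical field)
    service: str      # component that raised the alert (hypothetical field)
    message: str


def cluster_alerts(alerts, window_seconds=300):
    """Group alerts for the same service that arrive within
    window_seconds of the previous alert for that service."""
    situations = []
    latest_by_service = {}  # service -> most recent situation for it
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = latest_by_service.get(alert.service)
        if current and alert.timestamp - current[-1].timestamp <= window_seconds:
            current.append(alert)
        else:
            current = [alert]
            situations.append(current)
            latest_by_service[alert.service] = current
    return situations


alerts = [
    Alert(0, "db", "connection pool exhausted"),
    Alert(30, "db", "slow query"),
    Alert(45, "web", "HTTP 500 rate high"),
    Alert(60, "db", "replication lag"),
    Alert(1000, "db", "disk usage 90%"),
]
situations = cluster_alerts(alerts)
for situation in situations:
    print(situation[0].service, len(situation))
```

Here five raw alerts collapse into three situations: one burst of three database alerts, one web alert, and a later, unrelated database alert.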
Once you have correlated your alerts into a few “situations”, how do you prioritize what your team should focus on? Manual prioritization is subjective and often devolves into first-in, first-out (FIFO) handling, which is not optimal.
Enter Data Science.
Incidents are automatically prioritized by analyzing a comprehensive history of:
- Severity rank for incidents
- Level of disruption/impact a situation has caused in the past
- End user experience
- Client importance
- Past escalations
- Incident duration
- And much more
This helps your IT Operations team solve the problems that really matter.
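One simple way to picture automatic prioritization is a weighted score over the factors listed above. The sketch below is only an assumption about how such a score could be combined: the factor names, the weights and the requirement that each factor is already normalized to the range 0 to 1 are all hypothetical, and a real system would learn the weights from incident history rather than hard-code them.

```python
# Hypothetical weights; they sum to 1.0 so the score stays in [0, 1].
WEIGHTS = {
    "severity": 0.30,
    "past_impact": 0.25,
    "user_experience": 0.15,
    "client_importance": 0.15,
    "past_escalations": 0.10,
    "duration": 0.05,
}


def priority_score(factors):
    """Weighted sum of factors, each normalized to [0, 1].
    Factors missing from the dict count as 0."""
    return sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)


def prioritize(situations):
    """Return situations ordered from highest to lowest score."""
    return sorted(
        situations, key=lambda s: priority_score(s["factors"]), reverse=True
    )


situations = [
    {"name": "batch-job-retry",
     "factors": {"severity": 0.3, "past_impact": 0.2}},
    {"name": "checkout-latency",
     "factors": {"severity": 0.9, "past_impact": 0.8, "user_experience": 0.9,
                 "client_importance": 1.0, "past_escalations": 0.5,
                 "duration": 0.4}},
    {"name": "db-replication-lag",
     "factors": {"severity": 0.7, "past_impact": 0.6,
                 "client_importance": 0.5}},
]
ranked = prioritize(situations)
for s in ranked:
    print(s["name"], round(priority_score(s["factors"]), 3))
```

Unlike FIFO, the ordering here depends only on the scored factors, so a late-arriving but high-impact situation moves to the top of the queue.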
To resolve an incident in a timely and cost-effective manner, the right resources, such as the people involved in relevant past resolutions and related KB articles, need to be identified efficiently.
Once the incidents are correlated and a score is assigned, IT Operations Analytics applies ML (Machine Learning) techniques to:
- Automate routing of tickets: Predict which group or individual the incident should be assigned to
- Recommend response: Suggest the best template response to use based on the incident/situation
- Automate reply: Automatically respond to the incident
- Automate root cause analysis
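To make the ticket-routing item concrete, here is a minimal sketch of a multinomial naive Bayes text classifier trained on past tickets. It is one common technique for this kind of prediction, not necessarily what any given ITOA product uses; the function names, group names and sample tickets are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(tickets):
    """tickets: iterable of (text, group) pairs from resolved incidents.
    Returns per-group word counts and per-group ticket counts."""
    word_counts = defaultdict(Counter)
    group_counts = Counter()
    for text, group in tickets:
        group_counts[group] += 1
        word_counts[group].update(text.lower().split())
    return word_counts, group_counts


def predict_group(text, word_counts, group_counts):
    """Pick the group with the highest log-posterior, using
    add-one (Laplace) smoothing for unseen words."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(group_counts.values())
    best_group, best_score = None, -math.inf
    for group, n in group_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[group].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[group][word] + 1) / denom)
        if score > best_score:
            best_group, best_score = group, score
    return best_group


# Hypothetical ticket history.
history = [
    ("database connection timeout on orders db", "dba"),
    ("slow query on reporting database", "dba"),
    ("deploy failed for checkout service", "dev"),
    ("null pointer exception in checkout service", "dev"),
    ("disk full on web server", "sysadmin"),
    ("server unreachable after reboot", "sysadmin"),
]
word_counts, group_counts = train(history)
print(predict_group("database query timeout", word_counts, group_counts))
```

With so little training data the result is only a toy, but the same shape (learn from resolved tickets, score each candidate group, assign to the best) carries over to larger histories.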
Machine learning algorithms are also used to generate Bayesian networks for predicting incident duration.
Root Cause Analysis:
ITOA automatically analyzes all changes that occurred since the system was last working fine, applying pattern- and statistics-based algorithms to identify the incident's root cause. More often than not, 80% of troubleshooting time is spent on the 20% of the problem that revolves around finding the root cause.
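The first step described above, narrowing down to the changes made since the last known-good state, can be sketched as follows. This is a deliberately simple heuristic and not the pattern- or statistics-based analysis itself: the `Change` record, the `candidate_causes` function and the ranking rule (changes to affected components first, then most recent first) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Change:
    timestamp: float  # when the change was applied (hypothetical field)
    component: str    # what was changed (hypothetical field)
    description: str


def candidate_causes(changes, last_known_good, incident_time, affected):
    """Return changes applied after last_known_good and up to
    incident_time. Changes to affected components come first,
    and within each group the most recent change comes first."""
    in_window = [
        c for c in changes if last_known_good < c.timestamp <= incident_time
    ]
    return sorted(
        in_window, key=lambda c: (c.component not in affected, -c.timestamp)
    )


changes = [
    Change(100, "db", "index rebuild"),
    Change(500, "web", "config update"),
    Change(800, "db", "schema migration"),
    Change(900, "cache", "version bump"),
    Change(1200, "db", "restart"),
]
suspects = candidate_causes(
    changes, last_known_good=400, incident_time=1000, affected={"db"}
)
for c in suspects:
    print(c.timestamp, c.component, c.description)
```

Changes made before the known-good point or after the incident started are excluded, which already removes much of the search space before any statistical analysis is applied.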
IT Operations Analytics applies machine learning to automatically sort through the massive volumes of log messages. It quickly and efficiently finds and identifies messages that are truly relevant, applies powerful analysis algorithms that self-learn over time, and leverages the knowledge of experts, enabling it to provide fresh insights to find the root cause of the problem every time. These insights can be applied to accelerate problem resolution and help prevent future issues.
The above can be applied to:
- Incident Management
- Problem Management
- Change Management
- Configuration Management
The demands on modern IT infrastructure are such that reacting to an incident is already too late. The emerging need is to predict incidents before they happen, and machine learning techniques help greatly with incident prediction.
Machine learning can be applied in real-time from multiple event sources to analyze and detect anomalies before they become systemic and are reported by end users. This can lead to a 75% reduction in MTTD (Mean Time To Detect).
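As a minimal sketch of anomaly detection on a single metric stream, the example below flags a value when it deviates strongly from the recent rolling window (a rolling z-score). This is one basic statistical technique chosen for illustration; the function name, window size, threshold and sample latencies are assumptions, and production systems typically combine many signals and more robust models.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=10, threshold=3.0):
    """Return (index, value) pairs whose distance from the mean of the
    preceding `window` values exceeds `threshold` standard deviations."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:  # wait until the window is full
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append((i, value))
        history.append(value)
    return anomalies


# Hypothetical response times in milliseconds.
latencies = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 101, 250, 99, 100]
anomalies = detect_anomalies(latencies)
print(anomalies)
```

Note that an anomalous value enters the window after it is flagged and temporarily inflates the standard deviation, so a sketch like this can miss follow-on anomalies; excluding flagged points from the history is one possible refinement.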
So should you care about what Big Data and Data Science can do for you? Our conclusion is yes. Given ITOA's maturity level, we strongly suggest you invest in ITOA, if you have not done so already, to keep your IT operations running efficiently.