AIOps in Action
SUMMARY Gain a deeper understanding of AI-native operations by exploring examples of proactive and reactive troubleshooting with the Marvis Actions dashboard, Service Level Expectations, and the Marvis conversational assistant.
Let’s see how Oscar, an operations lead, uses the Juniper Mist portal to anticipate and respond to issues during a typical day.
As you read about Oscar’s experiences, you’ll get a high-level introduction to many features in the Juniper Mist portal. You’ll get more in-depth information later in this guide.
Starting the Day with the Marvis Actions Dashboard
Oscar always starts his day by looking at the Marvis Actions dashboard. On this dashboard, Marvis identifies actions that can improve the user experience. By following through on these recommendations, Oscar can address issues before users report an impact.
To find the Actions dashboard, select Marvis > Marvis Actions from the left menu of the Juniper Mist portal.
Today Oscar notices eight issues with APs. With one click, he sees a high-level root cause analysis: five are offline, one failed its health check, and one has a coverage hole.
He clicks the Coverage Hole item. At the bottom of the page, Marvis shows him where and when the issue occurred. Marvis also provides a recommendation to resolve the issue.
Oscar clicks to view more information. For this type of issue, Marvis displays the floorplan. Oscar sees exactly where this AP is located. With this information, he understands the issue and the impact and can follow through to ensure adequate coverage.
Marvis Actions Video Demo
In this video demo, Marvis recommends actions for bad signal strength.
So, what else can Marvis do for us? Meet Marvis Actions, the proactive side of Marvis. Marvis identifies actions that users can take to improve their user experience. If there is an action that can be taken to improve the network, it will be brought to the forefront here.
For our WAN, we see that Marvis has identified a persisting LTE signal quality issue. From here we can drill into the details of the issue and get a better sense of the impacts. Looks like I should take some action and have the antenna adjusted. This is a great example of Marvis helpfully suggesting actions we can take to make the user experience better.
Troubleshooting Low Service Levels
Next, Oscar turns to the Service Level dashboards. These dashboards show successes and failures for critical factors (SLEs) that can impact user experiences.
On the Wireless dashboard, color coding draws Oscar’s attention to a low SLE for coverage. On the left side of the page, he sees the overall success rate for each service level. Coverage has only a 67 percent success rate. On the right side, Oscar sees a high-level root cause analysis. Of the unsuccessful user experiences, 90 percent are due to weak signal.
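As a rough worked example, the two percentages quoted above can be combined to estimate how much of the overall experience weak signal accounts for. This is only a sketch of the arithmetic: it assumes both figures are measured over the same set of user experiences, and the variable names are illustrative, not part of the Juniper Mist portal.

```python
# Figures quoted in the text above.
coverage_success_rate = 0.67  # overall success rate for the Coverage SLE
weak_signal_share = 0.90      # share of unsuccessful experiences due to weak signal

# Assumption: both percentages are measured over the same set of experiences.
failure_rate = 1.0 - coverage_success_rate
weak_signal_overall = failure_rate * weak_signal_share

print(f"Unsuccessful experiences: {failure_rate:.1%}")
print(f"Weak signal, as a share of all experiences: {weak_signal_overall:.1%}")
```

Under that assumption, roughly 30 percent of all measured experiences were affected by weak signal, which is why Oscar looks at this root cause first.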
Oscar clicks to take a closer look. On the Root Cause Analysis page, he clicks Weak Signal to view more information. He can see that 77 percent of users and 88 percent of APs are having signal issues.
By using the tabs in the lower half of the screen, Oscar can get a complete view of the scope of impact:
- Timeline: When did the issues occur?
- Distribution: Where in the network did the issues occur?
- Affected Items: Which users, devices, and applications were involved?
- Location: Where on the floorplan did the issues occur?
SLEs Video Demo
This video demo shows how to troubleshoot low SLEs for WAN issues.
Looking at our recently deployed Cupertino site, we can see that it is not meeting Service Levels. Clicking into the site, we get a closer look at the SLEs. They are broken down into three important health categories that play a role in user experience: the WAN Edge device health, the health of WAN links and paths, and the health of applications themselves. Each SLE is broken down into a simple unit of measure for the user experience called a User Minute.
Simply put, this is telling us what our user experiences on the WAN are per user, per minute. Behind these seemingly simple measurements are the complex and powerful AI models of the Mist Cloud, fed by rich telemetry from the Session Smart Network. For each SLE, we get a breakdown of the root cause of the issues identified. Whenever user experience is poor on the WAN, Mist not only tells us the root cause, but also tells us what was affected, such as the impacted applications, users, links, paths and devices.
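To make the idea of a User Minute more concrete, here is a minimal sketch of how a success rate and a root cause breakdown could be derived from per-user, per-minute samples. It is an illustration only and does not represent how the Mist Cloud computes SLEs. The `UserMinute` record, the `sle_summary` function, and the root cause labels are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserMinute:
    """One minute of one user's experience (hypothetical record layout)."""
    user: str
    minute: int
    met_expectation: bool
    root_cause: Optional[str] = None  # only meaningful when the minute failed


def sle_summary(samples):
    """Return (success_rate, breakdown) for a list of UserMinute samples.

    breakdown maps each root cause to its share of the failed minutes.
    """
    total = len(samples)
    if total == 0:
        return None, {}
    failed = [s for s in samples if not s.met_expectation]
    success_rate = (total - len(failed)) / total
    causes = Counter(s.root_cause or "unknown" for s in failed)
    breakdown = {cause: count / len(failed) for cause, count in causes.items()}
    return success_rate, breakdown


# Made-up sample data for illustration.
samples = [
    UserMinute("alice", 0, True),
    UserMinute("alice", 1, False, "jitter"),
    UserMinute("bob", 0, True),
    UserMinute("bob", 1, False, "jitter"),
    UserMinute("bob", 2, False, "latency"),
    UserMinute("carol", 0, True),
    UserMinute("carol", 1, True),
    UserMinute("carol", 2, True),
]

rate, breakdown = sle_summary(samples)
print(f"Success rate: {rate:.1%}")
for cause, share in sorted(breakdown.items()):
    print(f"  {cause}: {share:.1%} of failed user minutes")
```

In this made-up data set, five of the eight user minutes meet expectations, and two of the three failed minutes are attributed to jitter.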
Getting Help from the Marvis Conversational Assistant
After lunch, Oscar bumps into a colleague, Roberta, who mentions that she had a bad Microsoft Teams call that morning. Oscar decides to get help from Marvis by using the chat feature. He clicks the Marvis icon in the bottom-left corner of the screen.
In the pop-up window, he enters: troubleshoot teams.
As Marvis asks guiding questions and Oscar replies, Marvis provides a list of recent Teams calls.
Oscar clicks a call to view additional information.
Marvis shows that there was an issue on the Wired network. From here, Oscar can click to view additional information.
Marvis Conversational Assistant Video Demo
In this video demo, Marvis helps to troubleshoot an issue with Microsoft Teams.
Marvis is also ever present in the forefront of the Mist experience. You can ask Marvis questions about the network at any time. You can ask it to help you do things like troubleshoot a device or access documentation. At our Cupertino site, we know Teams is an important collaboration application.
A particular user at the site has noticed periodic issues with poor Teams calls. Let's ask Marvis to help us out. Marvis quickly responds with a handful of Teams sessions that it determined were calls from our user yesterday. Great.
Let's ask Marvis to troubleshoot one of them. Marvis returns the end-to-end path of the session from client-to-cloud app server. We can see that Marvis points out the WAN as a source of issues that impacted the experience. Going one step further, it shows us the WAN Edges that the session traversed, and it pinpoints high network jitter between the edge devices that impacted the experience.
Think about that for a moment. A simple question. Why was my Teams call bad? A question that would historically need to be answered by top technical operators across different disciplines of expertise.
Going device to device, poring over logs and packet captures, mountains of monitoring information just to answer where the session went and where it went wrong. A simple question, simply answered by asking Marvis.
You can also enter structured queries by using the Marvis Query Language. For more information, see Marvis Query Language.