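Wired Actions

Use the Actions dashboard to resolve issues affecting your switches.

When you click the Wired button on the Actions dashboard, you see a list of all available actions. You can then click an action to investigate further. The available actions are described later in this topic.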

Switch Button on the Actions Dashboard

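Note:

Your subscriptions determine which actions you can see on the Actions dashboard. For more information, see Subscription Requirements for Marvis Actions.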

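Missing VLAN

The Missing VLAN action indicates that a VLAN is configured on an access point (AP) but not on the switch port. As a result, clients cannot communicate on that VLAN and cannot obtain an IP address from the DHCP server. Marvis compares the VLANs seen in the AP traffic and the client traffic with the VLANs seen in the switch port traffic, and determines which device is missing the VLAN configuration.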
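The comparison described above can be pictured as a set difference. The following is a minimal sketch only, not how Marvis implements detection (Marvis works from observed traffic). The function name and the sample VLAN IDs are hypothetical.

```python
def find_missing_vlans(ap_vlans, switch_port_vlans):
    """Return VLAN IDs present on the AP side but absent from the switch port."""
    return sorted(set(ap_vlans) - set(switch_port_vlans))


# Hypothetical data: VLANs used by the AP vs. VLANs seen on its switch port.
ap_vlans = {10, 20, 30}
switch_port_vlans = {10, 20}
print(find_missing_vlans(ap_vlans, switch_port_vlans))  # [30]
```

Any VLAN returned here is a candidate for the switch port configuration. A real check would also need to account for VLANs that intentionally carry little or no traffic.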

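The switch can be a Juniper Networks EX Series or QFX Series switch, or a third-party switch.

In the following example, Marvis identifies an AP that sees no incoming traffic because of a missing VLAN configuration. Marvis also identifies the specific switch that is missing the VLAN configuration and provides the port information, so that you can easily mitigate the issue.

When you see a Missing VLAN action, you can go to the Client Events section on the Client Insights page and check for failures related to the reported VLAN. You can verify whether all clients connecting on that VLAN are experiencing DHCP failures.

If Marvis Minis is enabled, validation starts automatically when Marvis detects a potential missing VLAN issue in the network. Marvis Minis validates connectivity along the suspected VLAN path to confirm whether the VLAN is missing and to pinpoint where the problem might be. The View More link provides details of the Marvis Minis validation, as shown in the following example.

When you click the Ask Marvis option, the Marvis conversational interface opens automatically and provides a detailed analysis of the issue. The analysis includes a summary of the issue, the probable root cause, the current status, and the Minis validation results. You can also ask follow-up questions to explore specific aspects in more depth.

Note:

If you need more information, you can also go to the Switches page from the left menu. There, click a switch to view information for each port, including VLANs.

After you fix the issue in your network, Mist AI monitors the switch for a period of time and ensures that the missing VLAN issue is indeed resolved.

For more information about the Missing VLAN action, watch the following video: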

Missing VLANs is a two-decade-old networking problem. It sounds so simple, but in a large enterprise it can become the ghost in the machine, as users complain their calls always drop in a certain area and conventional wisdom is, well, there must be interference or Wi-Fi issues over there. In many cases when Mist support helped troubleshoot, we found a user VLAN was indeed not provisioned on the network switch.

Hence, the user had no place to roam and the call dropped. For customers with tens of thousands of APs, this truly becomes the needle in the haystack problem. At Mist, we wanted to use AI to solve this problem, but first let's take a look at how you might start out today.

You can manually take a look, but I only have two VLANs. Or you can programmatically take a look, but this makes my brain hurt. If an AP is connected to a switch port, but the user can't get an IP address or pass any traffic, then the VLAN probably isn't configured on the port or it's black holed.

The traditional way to measure a missing VLAN is to monitor traffic on the VLAN and if one VLAN continuously lacks traffic, then there's a high chance that the VLAN is missing on the switch port. The problem of this approach is false positives. Here you can see during a 24-hour window, we detected more than 33,000 APs missing one or more VLANs because they had little or no traffic, but this was not accurate as we learned that every VLAN is not created equal.

There are at least two types of special purpose VLANs that can cause detection problems. One is the black hole VLAN. Folks can create a black hole VLAN on all unconfigured ports or as a quarantine VLAN for users until they are fully authorized. This VLAN is supposed to be provisioned on the switch in case a quarantined user shows up on the AP. The second example is the over-provisioned VLAN. Larger customers use special VLANs for special sites.

For example, legacy devices might only be present at certain sites, so special VLAN should only be applicable to those sites, but because people do use automation, they want to keep their configurations consistent so they provision that VLAN across all the sites. In this case, you would expect low traffic or no traffic. Those VLANs shouldn't be flagged as missing because they were intentionally over-provisioned.

So the key for reducing false positives is to really identify the purpose of each VLAN. We could ask the customer for their own internal list, perhaps in the form of a spreadsheet, but that's very error prone. Mist developed an unsupervised machine learning model to automatically discover the purpose of each VLAN by learning from the traffic patterns on the VLANs.

In this graph, the dots represent all of the VLANs across the Mist customer base. So for each VLAN, we collect several features. How many APs lack traffic on that VLAN? How many sites lack traffic? How busy is that VLAN minute by minute from all the APs? Then we use another technique called principal component analysis to combine all of these features and map them into this two-dimensional space.

The interesting thing here is the different VLAN types, high traffic, low traffic, black hole, and over-provisioned are separated really well, even across different customers, because it turns out VLAN behavior is very similar across different customers. The beauty of this is instead of developing per customer anomaly detection tools, we actually built one model for everybody. So for any new customers, we don't have to ask them anything.

We can determine the purpose of their VLANs very quickly after they deploy. This is really the power of this multi-tenant infrastructure design. Every customer can benefit from the knowledge learned from our extended customer base.

By precisely identifying each VLAN's purpose, we reduced our initial detection rate from 33,000 plus to specifically 607 VLANs, which we believed were actually missing from the AP switch ports. For Mist, this was the moment of truth. When we were confident in the model, we contacted the customers with these 607 detected missing VLANs, and when we finally heard back, we had an astonishing 100% hit rate, no false positives.

For Mist, this was simply awesome, as there are so many mundane problems we can apply this technique to going forward. So right now, this is shown in Marvis Actions, and with a supported Juniper switch, we can provide the user specific CLI commands that we suggest they add to their config to get these missing VLANs going, with a goal of automatically doing this from the cloud as we gain their trust. And for non-Juniper switches, we give detailed info like which switch, which port, and which VLAN ID to guide them how to solve the problem that they probably didn't even know they had.

This is all built on open protocols like OpenConfig and NETCONF. And a lesson learned by the Mist data science team: AI solutions should first start by solving real problems, rather than deploying models and hoping for the best. Some AI vendors treat AI as a hammer in search of a nail, and this isn't going to work.

The Marvis AI engine was designed starting with human expertise and then learning over time. At Mist, each support ticket is first run through Marvis to both measure its efficacy and continue to train the model to solve the most important customer issues.
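The video mentions three per-VLAN features (APs lacking traffic, sites lacking traffic, and how busy the VLAN is minute by minute) that are later combined with principal component analysis. The following is a minimal sketch of the feature-collection step only. It assumes a hypothetical sample schema of (vlan, site, ap, minute, bytes) and hypothetical feature definitions. It is not the Mist model, and the PCA step is omitted.

```python
from collections import defaultdict


def vlan_features(samples):
    """samples: iterable of (vlan, site, ap, minute, byte_count) tuples.

    Returns {vlan: (silent_aps, silent_sites, busy_ratio)} where busy_ratio
    is the fraction of observed minutes that carried any traffic.
    """
    ap_bytes = defaultdict(int)        # (vlan, site, ap) -> total bytes
    seen_minutes = defaultdict(set)    # vlan -> all minutes observed
    busy_minutes = defaultdict(set)    # vlan -> minutes with traffic
    for vlan, site, ap, minute, nbytes in samples:
        ap_bytes[(vlan, site, ap)] += nbytes
        seen_minutes[vlan].add(minute)
        if nbytes > 0:
            busy_minutes[vlan].add(minute)

    features = {}
    for vlan, minutes in seen_minutes.items():
        site_bytes = defaultdict(int)
        silent_aps = 0
        for (v, site, _ap), total in ap_bytes.items():
            if v != vlan:
                continue
            site_bytes[site] += total
            if total == 0:
                silent_aps += 1
        silent_sites = sum(1 for total in site_bytes.values() if total == 0)
        busy_ratio = len(busy_minutes[vlan]) / len(minutes)
        features[vlan] = (silent_aps, silent_sites, busy_ratio)
    return features


# Hypothetical samples: VLAN 10 is busy, VLAN 99 is mostly silent.
samples = [
    (10, "hq", "ap1", 0, 500), (10, "hq", "ap1", 1, 700), (10, "hq", "ap2", 0, 300),
    (99, "hq", "ap1", 0, 0), (99, "hq", "ap1", 1, 0),
    (99, "branch", "ap3", 0, 0), (99, "branch", "ap3", 1, 40),
]
print(vlan_features(samples))
```

Feature tuples like these could then be fed to a dimensionality-reduction or clustering step to separate high-traffic, low-traffic, black hole, and over-provisioned VLANs, as the video describes.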

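Negotiation Incomplete

The Negotiation Incomplete action detects instances of autonegotiation failure on switch ports. This issue can occur when Marvis detects a duplex mismatch between devices because autonegotiation failed to set the correct duplex mode. Marvis provides details about the affected ports. To resolve the issue, check the configuration on the port and on the connected device.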
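The check on both ends of a link can be sketched as follows. This is an illustration under stated assumptions, not Marvis logic: the PortState fields and the port names are hypothetical, and a real check would read these values from the switch and the connected device.

```python
from dataclasses import dataclass


@dataclass
class PortState:
    name: str
    duplex: str             # "full" or "half"
    autoneg_complete: bool  # whether autonegotiation finished on this end


def negotiation_issues(local, peer):
    """List the negotiation problems visible when comparing two link ends."""
    issues = []
    if not (local.autoneg_complete and peer.autoneg_complete):
        issues.append("autonegotiation incomplete")
    if local.duplex != peer.duplex:
        issues.append(f"duplex mismatch: {local.duplex} vs {peer.duplex}")
    return issues


# Hypothetical link: the switch port is full duplex, the peer fell back to half.
switch_port = PortState("ge-0/0/1", "full", autoneg_complete=False)
device_port = PortState("eth0", "half", autoneg_complete=True)
print(negotiation_issues(switch_port, device_port))
```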

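The following example shows the details of a Negotiation Incomplete action. Notice that Marvis lists the switch and the port on which autonegotiation failed.

After you fix the issue in your network, the Negotiation Incomplete action resolves automatically within an hour.

MTU Mismatch

Marvis detects MTU mismatches between a port on the switch and the port on the device that is directly connected to that switch port. All devices on the same Layer 2 (L2) network must have the same MTU size. When an MTU mismatch occurs, devices might fragment packets, which adds network overhead.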
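To make the mismatch check and the fragmentation overhead concrete, here is a minimal sketch. The link data and function names are hypothetical. The fragment count assumes IPv4 with a 20-byte header and no options, and treats both values as IP-layer sizes.

```python
import math

IPV4_HEADER = 20  # bytes; assumes no IP options


def mtu_mismatches(links):
    """links: iterable of ((device, port, mtu), (device, port, mtu)) pairs."""
    return [(a, b) for a, b in links if a[2] != b[2]]


def fragments_needed(packet_len, mtu):
    """Number of IPv4 fragments needed to carry one packet across a link."""
    if packet_len <= mtu:
        return 1
    payload = packet_len - IPV4_HEADER
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8  # fragment offsets use 8-byte units
    return math.ceil(payload / per_fragment)


# Hypothetical links: the first one has different MTUs on its two ends.
links = [
    (("sw1", "xe-0/0/0", 9216), ("sw2", "xe-0/0/1", 1500)),
    (("sw1", "ge-0/0/2", 1500), ("ap1", "eth0", 1500)),
]
print(mtu_mismatches(links))
print(fragments_needed(9000, 1500))  # 7
```

In this example, a single 9000-byte packet becomes seven fragments on the 1500-byte side, each carrying its own header, which is the overhead the text refers to.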

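To resolve the issue, check the port configuration on the switch and on the connected device. The following is an example of an MTU mismatch that Marvis identified. The Details column lists the ports on which the mismatch occurs.

Loop Detected

The Loop Detected action indicates a loop in the network that causes a switch to receive the same packets that it sent. A loop can occur when multiple links exist between devices. Redundant links are a common cause of L2 loops. A redundant link serves as a backup for the primary link. If both links are active at the same time and a protocol such as Spanning Tree Protocol (STP) is not deployed correctly, a switching loop occurs.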
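The relationship between redundant links and loops can be illustrated with a small union-find pass over a topology. This is a sketch under assumptions: the link list and function name are hypothetical, and it only flags links that close a cycle. It is not how Marvis detects live traffic loops. Whether a flagged link actually loops depends on whether STP or a similar protocol is blocking it.

```python
def find_redundant_links(links):
    """Return the links that close a cycle in an undirected L2 topology."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append((a, b))  # both ends already connected: a cycle
        else:
            parent[root_a] = root_b
    return redundant


# Hypothetical triangle of switches: the third link closes the cycle.
links = [("sw1", "sw2"), ("sw2", "sw3"), ("sw3", "sw1")]
print(find_redundant_links(links))  # [('sw3', 'sw1')]
```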

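Marvis identifies exactly where in the site the traffic loop occurs and shows you the affected switches. Here is an example:

Switching loops are listed under Switch Events on the Switch Insights page. In the following example, you can see the STP topology changes listed.

Network Port Flap

The Network Port Flap action identifies trunk ports that bounce persistently for at least an hour, for example, a port that flaps three times per minute for an hour. Ports configured as trunk ports can connect to other switches, gateways, or APs, either as individual trunk ports or as part of a port channel. Port flaps can occur because of unidirectional traffic or LACPDU exchange caused by a damaged cable or transceiver, or because the end device connected to the port keeps restarting. The following example shows the details that Marvis Actions provides for a Network Port Flap action:

You can view the port up and port down events under Switch Events on the Switch Insights page. Marvis does not list slow port flaps as an action unless the flap frequency increases. Marvis continues to monitor slow port flaps to determine the severity of the issue. If the flapping becomes excessive, Marvis lists it as an action after considering the frequency and severity. You can use the conversational assistant to view details about slow port flaps.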
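The difference between persistent and slow flapping can be sketched with a per-minute event count. The thresholds below come from the example in the text (three flaps per minute for an hour). Requiring every minute of the window to meet the rate is an assumption made for illustration, and the function name and timestamps are hypothetical. Marvis also weighs severity, which this sketch ignores.

```python
from collections import Counter


def is_persistently_flapping(down_times, window_start, window_s=3600, min_per_minute=3):
    """True if every minute of the window saw at least min_per_minute link-down events.

    down_times: link-down event timestamps in seconds.
    """
    per_minute = Counter(
        int((t - window_start) // 60)
        for t in down_times
        if window_start <= t < window_start + window_s
    )
    return all(per_minute[m] >= min_per_minute for m in range(window_s // 60))


# Hypothetical event streams over one hour starting at t=0.
fast = [m * 60 + k * 20 for m in range(60) for k in range(3)]  # 3 flaps per minute
slow = [m * 60 for m in range(60)]                             # 1 flap per minute
print(is_persistently_flapping(fast, 0), is_persistently_flapping(slow, 0))
```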

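For more information about access port flaps, see Access Port Flap.

You can disable a persistently flapping port directly from the Marvis Actions page. In the Network Port Flap action section, select the switch on which you want to disable the port, and then click the Disable Port button.

The Disable Port page appears, listing the ports that you can disable. You cannot select a port that is already disabled (either previously through the Actions page or manually from the Switch Details page).

When you disable a port, the port configuration on the selected port changes to disabled and the port goes down. After you resolve the issue, you can re-enable the port by editing the port configuration on the Switch Details page. After you re-enable the port, you can reconnect the device to it.

After you fix the issue in your network, the Port Flap action resolves automatically within an hour.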

Looking at the switch, in this case, specifically the Juniper switch, we've introduced the action of a port flapping continuously. In this case, we do take into account a simple port down and up, which usually happens when a device connects, and this is currently reflecting a case where the port is continuously flapping, thereby not only causing a poor experience for the device which is connected on the other end, but also having high resource consumption for the switch which can be detrimental to other devices connected on the switch. Here too, we show all the required information in terms of the port, the client which is connected, and the VLAN, if in case it did communicate and we know the VLAN ID.

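High CPU

Marvis detects switches with persistently high CPU utilization (above 90 percent). Various factors can cause high CPU utilization, including multicast traffic, network loops, hardware issues, and device temperature. The High CPU action lists the switch, the processes running on the switch along with their CPU utilization, and the reason for the high utilization. In the following example, you see that the fxpc process has high CPU utilization and that the cause is the use of uncertified optics on the switch:

If you see a High CPU action, you can go to the Insights page for the switch and analyze the CPU Utilization chart under Switch Charts. Here is an example: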

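Port Stuck

The Port Stuck action detects variations in traffic patterns on a switch access port, such as no packets being transmitted or received, that indicate that the client connected to the port is not functioning properly. In the following example, you see that Marvis Actions recommends that you bounce the port and verify that the client starts functioning normally. Notice that in addition to the port number, Marvis lists the client connected to the port (a camera in this case) and the associated VLAN.

When Marvis detects a stuck port, it initiates an automatic port bounce to resolve the issue. If the automatic port bounce does not resolve the issue, Marvis lists it as an action. You can view the automatic bounce action under Switch Events on the Switch Insights page, as shown in the following example. The image on the right shows the traffic before and after the port bounce. Before the port bounce, only Tx traffic (shown in green) is seen. After the port bounce, Rx traffic is seen as well.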
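The Tx-only pattern described above can be approximated by comparing packet counter deltas. This is a minimal sketch under assumptions: the counter samples, the function name, and the 100-packet floor are hypothetical, and Marvis uses its own traffic-pattern analysis rather than this rule.

```python
def looks_stuck(samples, min_tx_packets=100):
    """samples: list of (tx_packets, rx_packets) cumulative counters, oldest first.

    Flags a port that kept transmitting while receiving nothing at all.
    """
    if len(samples) < 2:
        return False
    tx_delta = samples[-1][0] - samples[0][0]
    rx_delta = samples[-1][1] - samples[0][1]
    return tx_delta >= min_tx_packets and rx_delta == 0


# Hypothetical counters: Tx keeps growing, Rx never moves.
print(looks_stuck([(1000, 500), (1500, 500), (2100, 500)]))  # True
```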

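Note:

The self-driving capability is enabled by default for the Port Stuck action. For information about self-driving capabilities, see Self-Driving Marvis Actions.

Traffic Anomaly

Marvis detects an unusual drop or increase in broadcast and multicast traffic on a switch. It also detects unusually high transmit or receive errors. As with the Anomaly Detection view for connectivity failures, the Details view shows a timeline, a description of the anomaly, and details of the affected ports. If the issue affects an entire site, Marvis displays details of the affected switches along with the port details for each affected switch.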
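As a simple stand-in for anomaly detection on traffic counters, the sketch below flags a sample that deviates strongly from recent history using a z-score. This is explicitly not the model Marvis uses. The function name, the threshold of 3.0, and the sample values are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Flag a sample that deviates from history by more than threshold std devs.

    Catches both unusual increases and unusual drops.
    """
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical broadcast packets-per-minute history for one port.
history = [100, 110, 90, 105, 95]
print(is_anomalous(history, 5000), is_anomalous(history, 102))
```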

Marvis, our AI-powered virtual network assistant, employs an actions framework to automatically identify network problems and anomalies that are likely impacting user experience. This helps you to significantly reduce mean time to resolution. Marvis can detect switched traffic anomalies, such as traffic storms or abnormally high Tx/Rx counts, with respect to broadcast, unknown unicast, or multicast traffic.

It uses our third generation of algorithms, including long short-term memory, or LSTM for short, to boost efficacy and eliminate false positives. Visit the link below to learn more.

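Misconfigured Port

When a switch is connected to another switch, the ports must share common attributes for communication to work. To detect misconfigurations, Marvis compares the following attributes on the uplink ports:

Speed
Duplex
Native VLAN
Allowed VLANs
MTU
Port mode (both ports "access" or both ports "trunk")
STP mode (both ports "forwarding")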
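The attribute comparison listed above can be sketched as a field-by-field diff of the two uplink ports. The dictionary keys, values, and function name below are hypothetical, and this is an illustration rather than the Marvis implementation.

```python
# Attributes that must be identical on both ends of the link (hypothetical keys).
MUST_MATCH = ("speed", "duplex", "native_vlan", "allowed_vlans", "mtu", "port_mode")


def config_mismatches(local, peer):
    """Return {attribute: (local_value, peer_value)} for every disagreement."""
    diffs = {
        key: (local.get(key), peer.get(key))
        for key in MUST_MATCH
        if local.get(key) != peer.get(key)
    }
    # STP is checked separately: both ports are expected to be forwarding.
    stp = (local.get("stp_state"), peer.get("stp_state"))
    if stp != ("forwarding", "forwarding"):
        diffs["stp_state"] = stp
    return diffs


# Hypothetical uplink pair that disagrees on allowed VLANs and MTU.
local = {
    "speed": "10G", "duplex": "full", "native_vlan": 1, "allowed_vlans": {10, 20},
    "mtu": 9216, "port_mode": "trunk", "stp_state": "forwarding",
}
peer = dict(local, allowed_vlans={10}, mtu=1514)
print(config_mismatches(local, peer))
```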

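On the Actions dashboard, click Switch > Misconfigured Port to view the issue and the recommended action in the lower part of the screen.

Click the View More link to view the MAC address and port.

Switch Offline

Marvis detects switches that are disconnected from the Juniper Mist cloud. A switch can go offline for a number of reasons, including:

Power issues
Faulty cables
Required firewall ports not open
Incorrect configuration

When a switch goes offline, Marvis monitors the switch to check how long it stays offline. If the switch is offline for more than three minutes, Marvis generates a Switch Offline action. Note that the Switch Offline infrastructure alert and the corresponding event on the Switch Insights page appear as soon as the switch goes offline.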
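The three-minute rule can be expressed as a simple duration check. This is a sketch only: the function name and timestamps are hypothetical, and it does not reflect how Marvis tracks cloud connectivity internally.

```python
from datetime import datetime, timedelta

OFFLINE_THRESHOLD = timedelta(minutes=3)  # from the text: more than three minutes


def offline_action_due(last_seen, now):
    """True once the switch has been unreachable for longer than the threshold."""
    return now - last_seen > OFFLINE_THRESHOLD


# Hypothetical timestamps: last contact at 12:00:00, checked at 12:03:01.
last_seen = datetime(2024, 1, 1, 12, 0, 0)
print(offline_action_due(last_seen, datetime(2024, 1, 1, 12, 3, 1)))  # True
```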

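The following is an example of a Switch Offline Marvis action. Click the View More link to view details of the offline switch. If you click the switch name, you can view the Insights page, where you can see the events listed under Switch Events.

To troubleshoot an offline switch, see Troubleshoot Switch Connectivity.

Rogue DHCP Server Detected

The Rogue DHCP Server Detected action is triggered when Marvis identifies an unauthorized DHCP server on an EX Series switch that has DHCP snooping enabled. Detecting a rogue DHCP server early is critical because it can disrupt normal network operations by:

Assigning IP addresses from the wrong subnet, so that devices end up on incorrect or unroutable networks
Causing random or intermittent connectivity issues, such as being unable to reach internal resources or the Internet

Marvis generates this action when the following conditions are met:

DHCP offers are observed from an unknown server
The events are mapped to a specific switch, port, VLAN, or site
Rogue DHCP activity is observed repeatedly, confirming that it is not a one-time anomaly but a persistent issue that needs to be addressed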
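The three conditions above can be sketched as a filter over observed DHCP offers. Everything here is assumed for illustration: the tuple layout, the trusted-server set, the function name, and the repeat threshold of three events. Marvis relies on DHCP snooping data from the switch and its own criteria for what counts as repeated activity.

```python
from collections import Counter


def rogue_dhcp_sources(offers, trusted_servers, min_events=3):
    """offers: iterable of (server_mac, switch, port, vlan) seen in DHCP offers.

    Returns {(server_mac, switch, port, vlan): count} for untrusted servers
    that were observed at least min_events times.
    """
    counts = Counter(offer for offer in offers if offer[0] not in trusted_servers)
    return {source: n for source, n in counts.items() if n >= min_events}


# Hypothetical observations: one trusted server, one repeat offender, one one-off.
offers = (
    [("aa:aa", "sw1", "ge-0/0/5", 20)] * 3
    + [("bb:bb", "sw1", "ge-0/0/1", 20)] * 5
    + [("cc:cc", "sw2", "ge-0/0/9", 30)]
)
print(rogue_dhcp_sources(offers, trusted_servers={"bb:bb"}))
```

Keying the count on the full (server, switch, port, VLAN) tuple keeps the location attached to each finding, which is what makes a port-level response such as disabling the port possible.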

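The Rogue DHCP Server Detected Marvis action is a self-driving action. By default, the self-driving capability for this action is disabled.

If you enable the self-driving capability for this action, Marvis automatically disables the switch port that the rogue DHCP server uses. Note that this applies only to access ports.

If self-driving is disabled, you can manually disable the port by clicking the Disable Port button.

For information about self-driving capabilities and how to enable them, see Self-Driving Marvis Actions.

Click the View More link to see details such as the MAC address of the rogue server, the number of events, and a chart showing the timeline of rogue server events. Events from the Switch Insights page are also displayed.

When you click the Ask Marvis option, the Marvis conversational interface opens automatically and provides a detailed analysis of the issue. The analysis includes a summary of the issue, the probable root cause, recommendations, a timeline of the rogue server events, and related events from the Switch Insights page. You can also ask follow-up questions to explore specific aspects in more depth.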