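Switch Actions

You can use the Actions dashboard to troubleshoot issues that affect your switches.

On the Actions dashboard, click the Switch button to display a list of all available actions. You can then click an action to investigate it further. The available actions are described later in this topic.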

Switch Button on the Actions Dashboard

Note:

Your subscriptions determine which actions you can see on the Actions dashboard. For more information, see Subscription Requirements for Marvis Actions.

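Missing VLAN

The Missing VLAN action indicates that a VLAN is configured on an AP but not on the switch port that the AP connects to. As a result, clients can't communicate on that VLAN, and they can't obtain an IP address from the DHCP server either. Marvis compares the VLANs in the AP traffic with the VLANs in the switch port traffic to determine which device is missing the VLAN configuration.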
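The comparison described above comes down to a set difference for each link. The following minimal Python sketch is not Marvis's implementation. It assumes that for each AP you already know the VLAN IDs seen in its traffic and the VLAN IDs carried by the switch port it is cabled to. The `ApUplink` type, the `find_missing_vlans` function, and the sample device names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class ApUplink:
    """Hypothetical record pairing an AP with the switch port it connects to."""
    ap_name: str
    switch_name: str
    port_id: str
    ap_vlans: FrozenSet[int]    # VLAN IDs seen in the AP's traffic
    port_vlans: FrozenSet[int]  # VLAN IDs carried by the switch port


def find_missing_vlans(uplinks: List[ApUplink]) -> Dict[Tuple[str, str, str], Set[int]]:
    """Map (AP, switch, port) to the VLANs the AP uses but the port does not carry."""
    missing: Dict[Tuple[str, str, str], Set[int]] = {}
    for link in uplinks:
        gap = set(link.ap_vlans - link.port_vlans)
        if gap:
            missing[(link.ap_name, link.switch_name, link.port_id)] = gap
    return missing


uplinks = [
    ApUplink("ap-lobby", "sw-floor1", "ge-0/0/1", frozenset({10, 20, 30}), frozenset({10, 20})),
    ApUplink("ap-hall", "sw-floor1", "ge-0/0/2", frozenset({10}), frozenset({10, 20})),
]
print(find_missing_vlans(uplinks))  # {('ap-lobby', 'sw-floor1', 'ge-0/0/1'): {30}}
```

A comparison this naive flags VLANs that are quiet on purpose. The video transcript later in this section explains how Mist avoids those false positives.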

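The switch can be a Juniper EX Series switch, a Juniper QFX Series switch, or a third-party switch.

In the following example, Marvis identifies two APs that don't see incoming traffic because of a missing VLAN configuration. Marvis also identifies the specific switch on which the VLAN configuration is missing and provides the port information, which makes the issue easy to mitigate.

When you see a Missing VLAN action, you can go to the Client Events section of the AP Insights page to look for failures on the VLAN reported in the action. There you can check whether all clients connected to that VLAN experienced DHCP failures.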

Note:

If you need more details, you can also use the left menu to go to the Switches page. From there, click a switch to view information about each port, including its VLANs.

Switches Front Panel Information

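After you fix the issue in your network, Mist AI monitors the switch for a period of time to confirm that the missing VLAN issue is actually resolved. As a result, the Missing VLAN action can take up to 30 minutes to resolve automatically.

Watch the following video to learn more about the Missing VLAN action.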

Missing VLANs is a two-decade-old networking problem. It sounds so simple, but in a large enterprise it can become the ghost in the machine, as users complain their calls always drop in a certain area and conventional wisdom is, well, there must be interference or Wi-Fi issues over there. In many cases when Mist support helped troubleshoot, we found a user VLAN was indeed not provisioned on the network switch.

Hence, the user had no place to roam and the call dropped. For customers with tens of thousands of APs, this truly becomes the needle in the haystack problem. At Mist, we wanted to use AI to solve this problem, but first let's take a look at how you might start out today.

You can take a look manually, but I only have two VLANs. Or you can take a look programmatically, but this makes my brain hurt. If an AP is connected to a switch port but the user can't get an IP address or pass any traffic, then the VLAN probably isn't configured on the port, or it's black-holed.

The traditional way to detect a missing VLAN is to monitor traffic on each VLAN. If a VLAN continuously lacks traffic, there's a high chance that the VLAN is missing on the switch port. The problem with this approach is false positives. Here you can see that during a 24-hour window, we detected more than 33,000 APs missing one or more VLANs because they had little or no traffic. But this was not accurate, as we learned that not every VLAN is created equal.
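The following minimal Python sketch makes the false-positive problem concrete by implementing the traditional, traffic-only approach described above. It illustrates the idea under assumed inputs (per-minute packet counts for each AP and VLAN) and is not Mist's code. `naive_quiet_vlans` and the sample data are hypothetical. A VLAN that is quiet by design, such as the black hole VLAN discussed next, gets flagged in exactly the same way as a VLAN that is genuinely missing.

```python
from typing import Dict, List, Set

# Hypothetical input: {ap_name: {vlan_id: [packets seen in each minute]}}
Traffic = Dict[str, Dict[int, List[int]]]


def naive_quiet_vlans(traffic: Traffic, min_packets: int = 1) -> Dict[str, Set[int]]:
    """Flag every (AP, VLAN) pair whose traffic never reaches min_packets in any minute."""
    flagged: Dict[str, Set[int]] = {}
    for ap, per_vlan in traffic.items():
        quiet = {vlan for vlan, counts in per_vlan.items()
                 if max(counts, default=0) < min_packets}
        if quiet:
            flagged[ap] = quiet
    return flagged


traffic = {
    "ap-1": {10: [120, 80, 95], 999: [0, 0, 0]},  # VLAN 999: an intentional black hole VLAN
    "ap-2": {10: [5, 0, 3]},
}
print(naive_quiet_vlans(traffic))  # {'ap-1': {999}}, which is a false positive
```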

There are at least two types of special purpose VLANs that can cause detection problems. One is the black hole VLAN. Folks can create a black hole VLAN on all unconfigured ports or as a quarantine VLAN for users until they are fully authorized. This VLAN is supposed to be provisioned on the switch in case a quarantined user shows up on the AP. The second example is the over-provisioned VLAN. Larger customers use special VLANs for special sites.

For example, legacy devices might be present only at certain sites, so a special VLAN should apply only to those sites. But because people use automation and want to keep their configurations consistent, they provision that VLAN across all sites. In this case, you would expect low traffic or no traffic. Those VLANs shouldn't be flagged as missing because they were intentionally over-provisioned.

So the key to reducing false positives is to identify the purpose of each VLAN. We could ask customers for their own internal list, perhaps in the form of a spreadsheet, but that's very error-prone. Instead, Mist developed an unsupervised machine learning model that automatically discovers the purpose of each VLAN by learning from the traffic patterns on the VLANs.

In this graph, each dot represents one VLAN, and together the dots cover all of the VLANs across the Mist customer base. For each VLAN, we collect several features: How many APs lack traffic on that VLAN? How many sites lack traffic? How busy is that VLAN minute by minute across all the APs? We then use a technique called principal component analysis (PCA) to combine all of these features and map them into this two-dimensional space.
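The three per-VLAN features listed above could be computed along the following lines. This is a hypothetical Python sketch with an assumed input layout. In particular, it treats a site as lacking traffic only when every AP at that site is quiet, which is an assumption and is not specified in the video. The resulting feature vectors would then go through PCA (for example, scikit-learn's `PCA(n_components=2)`) to produce the two-dimensional map described above.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical input for one VLAN: {(site, ap): [packets per minute, ...]}
VlanObservations = Dict[Tuple[str, str], List[int]]


def vlan_features(per_ap: VlanObservations) -> Tuple[float, float, float]:
    """Return (share of APs lacking traffic, share of sites lacking traffic,
    mean packets per minute across all APs) for a single VLAN.
    Expects observations from at least one AP."""
    quiet_aps = {key for key, counts in per_ap.items() if sum(counts) == 0}
    sites = {site for site, _ in per_ap}
    # Assumption: a site lacks traffic only if every AP at that site is quiet.
    quiet_sites = {
        site for site in sites
        if all(key in quiet_aps for key in per_ap if key[0] == site)
    }
    all_counts = [c for counts in per_ap.values() for c in counts]
    return (
        len(quiet_aps) / len(per_ap),
        len(quiet_sites) / len(sites),
        mean(all_counts) if all_counts else 0.0,
    )


observations = {
    ("site-a", "ap-1"): [10, 20],
    ("site-a", "ap-2"): [0, 0],
    ("site-b", "ap-3"): [0, 0],
}
print(vlan_features(observations))
```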

The interesting thing here is the different VLAN types, high traffic, low traffic, black hole, and over-provisioned are separated really well, even across different customers, because it turns out VLAN behavior is very similar across different customers. The beauty of this is instead of developing per customer anomaly detection tools, we actually built one model for everybody. So for any new customers, we don't have to ask them anything.

We can determine the purpose of their VLANs very quickly after they deploy. This is really the power of this multi-tenant infrastructure design. Every customer can benefit from the knowledge learned from our extended customer base.

By precisely identifying each VLAN's purpose, we reduced our initial detections from more than 33,000 to just 607 VLANs, which we believed were actually missing from the AP switch ports. For Mist, this was the moment of truth. Once we were confident in the model, we contacted the customers with these 607 detected missing VLANs, and when we finally heard back, we had an astonishing 100% hit rate: no false positives.

For Mist, this was simply awesome, as there are so many mundane problems we can apply this technique to going forward. Right now, this is shown in Marvis Actions. With a supported Juniper switch, we can give the user specific CLI commands that we suggest adding to the configuration to bring up these missing VLANs. Our goal is to do this automatically from the cloud as we gain customers' trust. For non-Juniper switches, we give detailed information, such as which switch, which port, and which VLAN ID, to guide users in solving a problem they probably didn't even know they had.
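On a supported Juniper switch, the suggested fix comes down to a couple of configuration statements. The Python sketch below is hypothetical and does not reproduce the exact commands Marvis suggests. `suggest_junos_fix` is an invented helper, and the `vlan<ID>` naming convention is an assumption. The statements use Junos ELS (Enhanced Layer 2 Software) syntax and assume the port is already configured as a trunk on unit 0.

```python
from typing import List, Optional


def suggest_junos_fix(interface: str, vlan_id: int, vlan_name: Optional[str] = None) -> List[str]:
    """Build Junos ELS 'set' statements that define a VLAN and add it to a trunk port.

    Hypothetical helper: assumes the interface already uses
    'family ethernet-switching interface-mode trunk' on unit 0.
    """
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    name = vlan_name or f"vlan{vlan_id}"  # assumed naming convention
    return [
        f"set vlans {name} vlan-id {vlan_id}",
        f"set interfaces {interface} unit 0 family ethernet-switching vlan members {name}",
    ]


for line in suggest_junos_fix("ge-0/0/12", 30):
    print(line)
```

Generating the commands from structured data keeps the VLAN name consistent between the `vlans` definition and the interface membership. In practice, you would review the output before committing it.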

This is all built on open protocols such as OpenConfig and NETCONF. One lesson the Mist data science team learned is that AI solutions should start by solving real problems, rather than deploying models and hoping for the best. Some AI vendors treat AI as a hammer in search of a nail, and that isn't going to work.

The Marvis AI engine was designed to start with human expertise and then learn over time. At Mist, each support ticket is first run through Marvis, both to measure its efficacy and to keep training the model to solve the most important customer issues.

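Negotiation Incomplete

The Negotiation Incomplete action detects switch ports on which autonegotiation has failed. This issue can occur when autonegotiation fails to set the correct duplex mode, so that Marvis detects a duplex mismatch between the devices. Marvis provides details about the affected ports. You can resolve the issue by checking the configuration of the port and of the connected device.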
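A common way to end up with a duplex mismatch is to hard-code speed and duplex on one end of a link while the other end autonegotiates. The autonegotiating side cannot learn its partner's duplex setting, so it falls back to half duplex. The Python sketch below shows the kind of per-link check that catches this. The `PortState` record and the `check_link` function are hypothetical and are not part of any Marvis API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PortState:
    """Hypothetical snapshot of one end of a link."""
    name: str
    autoneg: bool
    duplex: str  # "full" or "half", as reported by the device


def check_link(a: PortState, b: PortState) -> List[str]:
    """Return human-readable findings for a point-to-point link."""
    findings = []
    if a.autoneg != b.autoneg:
        findings.append(f"autonegotiation enabled on only one end ({a.name} vs {b.name})")
    if a.duplex != b.duplex:
        findings.append(f"duplex mismatch: {a.name}={a.duplex}, {b.name}={b.duplex}")
    return findings


switch_port = PortState("sw-1 ge-0/0/5", autoneg=True, duplex="half")  # fell back to half
server_nic = PortState("server eth0", autoneg=False, duplex="full")    # hard-coded
for finding in check_link(switch_port, server_nic):
    print(finding)
```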

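The following example shows the details of a Negotiation Incomplete action. Marvis lists the switches and ports on which autonegotiation failed.

After you fix the issue in your network, the Negotiation Incomplete action resolves automatically within an hour.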

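MTU Mismatch

Marvis detects MTU mismatches between a switch port and the port of a device connected directly to that switch port. All devices on the same Layer 2 (L2) network must use the same maximum transmission unit (MTU) size. When an MTU mismatch occurs, devices might fragment packets, which adds network overhead.

To resolve the issue, review the port configuration on the switch and on the connected device. The following is an example of an MTU mismatch that Marvis identified. The Details column lists the ports on which the mismatch occurs.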
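To put a number on that overhead, the following Python sketch counts the IPv4 fragments needed when an IP device (a host or router) must fragment a packet that is larger than the MTU. Ethernet switches themselves typically drop oversized frames rather than fragment them. The sketch assumes a 20-byte IPv4 header with no options, and the function name is hypothetical. Every fragment except the last must carry a multiple of 8 payload bytes, because the fragment offset field counts 8-byte units.

```python
import math

IPV4_HEADER_BYTES = 20  # assumption: no IP options


def ipv4_fragment_count(packet_len: int, mtu: int) -> int:
    """Number of IPv4 fragments needed to carry a packet of packet_len bytes over mtu."""
    if packet_len <= mtu:
        return 1
    payload = packet_len - IPV4_HEADER_BYTES
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - IPV4_HEADER_BYTES) // 8 * 8
    return math.ceil(payload / per_fragment)


n = ipv4_fragment_count(9000, 1500)
print(n, "fragments,", (n - 1) * IPV4_HEADER_BYTES, "extra header bytes")
# 7 fragments, 120 extra header bytes
```

A 9000-byte jumbo packet crossing a 1500-byte MTU hop therefore becomes 7 packets, and losing any one fragment makes the whole original packet undeliverable.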

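Loop Detected

The Loop Detected action indicates a network loop, in which a switch receives the same packets that it transmitted. Loops occur when multiple links exist between devices. Redundant links, which serve as backups for a primary link, are a common cause of L2 loops. If both links are active at the same time and a protocol such as Spanning Tree Protocol (STP) is not properly deployed, a switching loop results.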
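You can see why redundant links create loops from the topology alone. Any link that joins two switches that are already connected by another path closes a cycle, and STP must block that link. The Python sketch below finds such links with a union-find structure. It illustrates the topology reasoning only and is not how Marvis detects loops, since Marvis observes a switch receiving its own packets. `find_redundant_links` and the switch names are hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def find_redundant_links(links: List[Link]) -> List[Link]:
    """Return the links that close a cycle, in the order they are encountered."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    redundant: List[Link] = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append((a, b))  # a and b already connected: this link forms a loop
        else:
            parent[root_a] = root_b   # merge the two connected components
    return redundant


topology = [("sw1", "sw2"), ("sw2", "sw3"), ("sw3", "sw1"), ("sw1", "sw2")]
print(find_redundant_links(topology))  # [('sw3', 'sw1'), ('sw1', 'sw2')]
```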

Marvis identifies the exact location of the traffic loop in the site and shows the affected switches. For example:

Switching loops are listed under Switch Events on the Switch Insights page. In the following example, you can see the STP topology changes listed there.

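Network Port Flap

The Network Port Flap action identifies trunk ports that bounce continuously for at least an hour, for example, three flaps per minute for a full hour. Ports configured as trunk ports connect to other switches, gateways, or APs, either as individual trunk ports or as members of a port channel. Port flapping can be caused by a bad cable or transceiver that produces unidirectional traffic or disrupts LACPDU exchanges. It can also be caused by continuous reboots of the end device connected to the port. The following example shows the details that Marvis Actions provides for a Network Port Flap action.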
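The example threshold above (three flaps per minute, sustained for an hour) can be expressed as a sliding-window check. The Python sketch below is hypothetical. `is_persistent_flap` and its default parameters mirror the example in the text, not Marvis's actual tuning, and flap timestamps are assumed to be in seconds.

```python
from collections import Counter
from typing import Iterable


def is_persistent_flap(flap_times: Iterable[float], now: float,
                       window_minutes: int = 60, min_flaps_per_minute: int = 3) -> bool:
    """True if every minute of the trailing window saw at least min_flaps_per_minute flaps."""
    start = now - window_minutes * 60
    per_minute = Counter(
        int((t - start) // 60) for t in flap_times if start <= t < now
    )
    return all(per_minute[m] >= min_flaps_per_minute for m in range(window_minutes))


# Three flaps in every minute of the last hour: persistent.
steady = [m * 60 + s for m in range(60) for s in (5, 25, 45)]
print(is_persistent_flap(steady, now=3600))  # True
```

Requiring every minute to meet the threshold, rather than a total count, keeps a single short burst of flaps from being reported as a persistent problem.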

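You can view port up and port down events under Switch Events on the Switch Insights page. Marvis doesn't list slow port flaps as actions unless the flapping frequency increases. Instead, Marvis keeps monitoring slow port flapping to gauge the severity of the issue. If the flapping becomes excessive, Marvis considers its frequency and severity and then lists it as an action. You can use the conversational assistant to view details about slow port flaps.

For more information about access port flaps, see Access Port Flap.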

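You can disable persistently flapping ports directly from the Marvis Actions page. In the Network Port Flap actions section, select the switch whose ports you want to disable, and click the DISABLE PORT button.

The Disable Port page appears and lists the ports that you can disable. You can't select a port that is already disabled, whether it was disabled earlier through the Actions page or manually from the Switch Details page.

Disabling a port changes the configuration of the selected port to disabled and shuts the port down. After you fix the issue, you can re-enable these ports by editing the port configuration on the Switch Details page. After you re-enable a port, you can reconnect the device to it.

After you fix the issue in your network, the Network Port Flap action resolves automatically within an hour.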

Looking at the switch, in this case specifically a Juniper switch, we've introduced an action for a port that flaps continuously. We do account for a simple port down and up, which usually happens when a device connects. This action reflects a case where the port is flapping continuously. That not only causes a poor experience for the device connected on the other end, but also consumes a lot of switch resources, which can be detrimental to other devices connected to the switch. Here too, we show all the required information: the port, the connected client, and the VLAN ID if the client did communicate and we know it.

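High CPU

Marvis detects switches with persistently high CPU utilization (above 90%). Various factors can cause high CPU utilization, such as multicast traffic, network loops, hardware issues, and device temperature. The High CPU action lists the switch, the process running on the switch, the CPU utilization, and the reason for the high utilization. In the following example, the fxpc process shows high CPU utilization, and the cause is the use of uncertified optics on the switch.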
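The text does not specify how "persistently" is measured. The hypothetical Python sketch below treats CPU utilization as persistently high once a run of consecutive samples all exceed 90%. The run length of 10 samples and the function name are assumptions.

```python
from typing import Iterable


def persistently_high_cpu(samples: Iterable[float], threshold: float = 90.0,
                          min_consecutive: int = 10) -> bool:
    """True once min_consecutive consecutive samples are above threshold (percent)."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


print(persistently_high_cpu([95.0] * 12))               # True
print(persistently_high_cpu([95.0] * 9 + [40.0] * 3))   # False: only a short spike
```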

When you see a High CPU action, you can go to the Insights page for the switch and analyze the CPU utilization chart under Switch Charts. For example:

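Port Stuck

The Port Stuck action detects anomalies in the traffic pattern of a switch access port, such as no packets being transmitted or received. These anomalies indicate that the client connected to the port isn't working properly. In the following example, Marvis Actions recommends bouncing the port and checking whether the client starts working normally. In addition to the port number, Marvis lists the client connected to the port (in this case, a camera) and the associated VLAN.

When Marvis detects a Port Stuck issue, it starts an automatic port bounce to resolve the issue. If the automatic port bounce doesn't resolve it, Marvis lists it as an action. You can view the auto-bounce activity under Switch Events on the Switch Insights page, as shown in the following example. The graph on the right shows the traffic before and after the port bounce. Before the bounce, only Tx traffic appears (shown in green). After the bounce, Rx traffic appears as well.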
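The detect, bounce, and escalate flow described above can be sketched as follows. This Python sketch is hypothetical. It treats a port as stuck when it transmits but receives nothing, which is the Tx-only pattern in the graph. The `bounce` callback stands in for whatever actually bounces the port and collects fresh counters.

```python
from typing import Callable, List, Tuple

# (Tx packets per interval, Rx packets per interval) for one access port
Counters = Tuple[List[int], List[int]]


def looks_stuck(counters: Counters) -> bool:
    """Tx-only traffic: the switch sends, but nothing comes back from the client."""
    tx, rx = counters
    return sum(tx) > 0 and sum(rx) == 0


def handle_port(counters: Counters, bounce: Callable[[], Counters]) -> str:
    """Auto-bounce a stuck port once; escalate only if it is still stuck afterward."""
    if not looks_stuck(counters):
        return "ok"
    after = bounce()  # hypothetical: bounce the port, then re-sample its counters
    if looks_stuck(after):
        return "raise Port Stuck action"
    return "resolved by auto-bounce"


print(handle_port(([12, 9, 15], [0, 0, 0]), bounce=lambda: ([11, 10], [7, 8])))
```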

Note:

The self-driving capability for the Port Stuck action is enabled by default. For more information about self-driving capabilities, see Self-Driving Marvis Actions.

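Traffic Anomaly

Marvis detects abnormal decreases or increases in broadcast and multicast traffic on switches. It also detects abnormally high transmit or receive errors. As in the Anomaly Detection view for connection failures, the Details view shows a timeline, a description of the anomaly, and details of the affected ports. If the issue affects the entire site, Marvis shows details of the affected switches and the port details for each affected switch.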
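Marvis uses LSTM-based models for this detection, as the following paragraphs describe. The Python sketch below is a much simpler stand-in that shows the basic idea of flagging deviations in both directions. It scores recent broadcast or multicast packet rates against a historical baseline using a z-score. The technique, the function name, and the 3-sigma threshold are assumptions for illustration and are not Marvis's method.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_anomalies(baseline: Sequence[float], recent: Sequence[float],
                     threshold: float = 3.0) -> List[int]:
    """Indexes of recent samples more than `threshold` standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = pstdev(baseline) or 1.0  # avoid dividing by zero on a perfectly flat baseline
    return [i for i, x in enumerate(recent) if abs(x - mu) / sigma > threshold]


broadcast_pps = [100, 102, 98, 101, 99, 100]  # hypothetical baseline, packets per second
print(zscore_anomalies(broadcast_pps, [101, 500, 0]))  # [1, 2]: a storm and a sudden drop
```

A fixed baseline like this ignores daily and weekly traffic cycles. Handling those cycles is one reason sequence models such as LSTMs are used in practice.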

Marvis, our AI-powered virtual network assistant, uses an actions framework to automatically identify network problems and anomalies that are likely to affect user experience. This helps you significantly reduce mean time to resolution. Marvis can detect switched traffic anomalies, such as traffic storms or abnormally high Tx/Rx counts, for broadcast, unknown unicast, or multicast traffic.

It uses our third generation of algorithms, including long short-term memory (LSTM), to boost efficacy and eliminate false positives.

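Misconfigured Port

When a switch is connected to another switch, the two ports must share common attributes to communicate. To detect misconfigurations, Marvis compares the following attributes on the uplink ports:

Speed
Duplex
Native VLAN
Allowed VLANs
Maximum transmission unit (MTU)
Port mode (both ports "access" or both ports "trunk")
STP mode (both ports "forwarding")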
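The comparison above is essentially an attribute-by-attribute equality check across the two ends of an uplink, plus a check that both ends are in the STP forwarding state. The following hypothetical Python sketch uses an invented `UplinkPort` record and `compare_uplinks` function, neither of which is a Mist data structure.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class UplinkPort:
    """Hypothetical snapshot of one end of a switch-to-switch uplink."""
    speed_mbps: int
    duplex: str
    native_vlan: Optional[int]
    allowed_vlans: FrozenSet[int]
    mtu: int
    mode: str       # "access" or "trunk"
    stp_state: str  # e.g. "forwarding", "blocking"


# Attributes that must be identical on both ends of the link.
MUST_MATCH = ("speed_mbps", "duplex", "native_vlan", "allowed_vlans", "mtu", "mode")


def compare_uplinks(local: UplinkPort, remote: UplinkPort) -> List[str]:
    """List the attributes that differ, plus any end that is not STP forwarding."""
    issues = [f"{attr} mismatch: {getattr(local, attr)!r} vs {getattr(remote, attr)!r}"
              for attr in MUST_MATCH if getattr(local, attr) != getattr(remote, attr)]
    for side, port in (("local", local), ("remote", remote)):
        if port.stp_state != "forwarding":
            issues.append(f"{side} port STP state is {port.stp_state!r}, expected 'forwarding'")
    return issues


a = UplinkPort(1000, "full", 1, frozenset({1, 10, 20}), 9216, "trunk", "forwarding")
b = UplinkPort(1000, "full", 1, frozenset({1, 10}), 1514, "trunk", "forwarding")
for issue in compare_uplinks(a, b):
    print(issue)
```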

On the Actions dashboard, click Switch > Misconfigured Port to view the issue and the recommended action at the bottom of the screen.

Click the View More link to see the MAC addresses and ports.

Switch Offline

Marvis detects switches that have disconnected from the Juniper Mist cloud. A switch can go offline for several reasons, including:

Power issues
Faulty cables
Required firewall ports not open
Incorrect configuration

When a switch goes offline, Marvis monitors it to determine how long it stays offline. If the switch is offline for 3 minutes or longer, Marvis generates a Switch Offline action. The Switch Offline infrastructure alert and the corresponding event on the Switch Insights page appear as soon as the switch goes offline.
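The alert fires immediately, but the action waits for the 3-minute threshold. The hypothetical Python sketch below expresses that difference in timing as a simple rule. The function name and the use of a `last_seen` timestamp are assumptions; only the 3-minute threshold comes from the text above.

```python
from typing import Tuple

ACTION_AFTER_SECONDS = 3 * 60  # Switch Offline action after 3 minutes offline


def offline_notifications(connected: bool, last_seen: float, now: float) -> Tuple[bool, bool]:
    """Return (raise_alert, raise_action) for one switch; times are in seconds."""
    if connected:
        return (False, False)
    offline_for = now - last_seen
    # The alert and event fire immediately; the Marvis action waits for the threshold.
    return (True, offline_for >= ACTION_AFTER_SECONDS)


print(offline_notifications(False, last_seen=0, now=60))   # (True, False)
print(offline_notifications(False, last_seen=0, now=200))  # (True, True)
```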

The following example shows a Switch Offline Marvis action. Click the View More link to see details about the offline switch. Click the switch name to open its Insights page, where you can view the events listed under Switch Events.

To troubleshoot switches that are offline, see Troubleshoot Switch Connectivity.