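Error Handling for Fabric OAM
Fabric Operation, Administration, and Maintenance (OAM) helps detect failures in fabric paths. Whenever a new fabric path is brought up for a Packet Forwarding Engine (PFE), fabric OAM validates fabric connectivity before traffic is sent on the fabric plane. If a failure is detected, the software reports the fault and avoids using that fabric plane for that PFE. The feature works by sending self-destined OAM traffic at a very low packets-per-second (PPS) rate over each available fabric plane and detecting any traffic loss at the endpoint (a fabric self-ping check).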
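The self-ping idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the names `PlaneResult`, `self_ping_check`, `usable_planes`, and the `send_probe` callback are hypothetical, and the rule "any lost probe marks the plane as faulty" is an assumption, not the documented behavior of the software.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class PlaneResult:
    plane: int
    sent: int
    received: int

    @property
    def healthy(self) -> bool:
        # Assumption for this sketch: any lost probe counts as a fault.
        return self.received == self.sent


def self_ping_check(
    planes: Iterable[int],
    send_probe: Callable[[int], bool],
    probes_per_plane: int = 3,
) -> List[PlaneResult]:
    """Send a few self-destined probes over each plane and count the replies.

    send_probe(plane) is a hypothetical callback that returns True when the
    probe sent on that plane comes back to the sender.
    """
    results = []
    for plane in planes:
        received = sum(1 for _ in range(probes_per_plane) if send_probe(plane))
        results.append(PlaneResult(plane, probes_per_plane, received))
    return results


def usable_planes(results: List[PlaneResult]) -> List[int]:
    """Return the planes that lost no probes and can carry traffic."""
    return [r.plane for r in results if r.healthy]
```

For example, with a callback that drops every probe on plane 0, `usable_planes(self_ping_check(range(4), lambda plane: plane != 0))` leaves planes 1 through 3 in service and excludes plane 0.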
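- In Junos OS Evolved Release 20.4R1, the fabric OAM feature is enabled by default. You can disable the feature by using the CLI command set chassis fabric oam detection-disable.
- In Junos OS Evolved Releases 20.4R2 and 21.1R1, the fabric OAM feature is disabled by default.
- In Junos OS Evolved Release 22.1R1, the runtime fabric OAM feature is enabled by default. You can disable the feature by using the CLI command edit chassis fabric oam runtime-disable. The runtime fabric OAM feature is supported on PTX10004, PTX10008, and PTX10016 routers.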
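The fabric OAM check is performed at startup. Failed paths are disabled. The system takes no recovery action. However, you can try to recover the affected fabric planes by restarting the SIB. The recovery steps depend on the nature of the failure.
A fabric plane represents an independent bidirectional path between a PFE and a fabric ASIC. Runtime fabric OAM periodically checks fabric connectivity and helps detect and report failures in fabric planes while the system is running. Runtime fabric OAM detects fabric reachability for each PFE.
When the same fabric plane fails on one or more FPCs, restart the SIB that contains the failed plane by using the following commands: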
user@host> request chassis sib slot slot-number offline
user@host> request chassis sib slot slot-number online
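When random fabric planes fail on multiple FPCs, the fault cannot be isolated to a specific FPC or SIB. However, you can try to recover the planes by restarting the SIBs that contain the affected planes, one after another.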
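The two recovery cases above can be expressed as a small planning helper. This is a minimal sketch, not an official tool: the function names and the `plane_to_sib` mapping are hypothetical, and the mapping must be supplied by the operator for the actual chassis. Only the two `request chassis sib slot` command strings come from this document.

```python
from typing import Dict, Iterable, List, Tuple

Failure = Tuple[int, int]  # (fpc_slot, plane_number)


def classify_failures(failures: Iterable[Failure]) -> str:
    """Return 'same-plane' when every failure hits one plane, else 'random-planes'."""
    planes = {plane for _, plane in failures}
    return "same-plane" if len(planes) == 1 else "random-planes"


def plan_sib_restarts(
    failures: Iterable[Failure], plane_to_sib: Dict[int, int]
) -> List[int]:
    """List the SIB slots that contain failed planes, in ascending order."""
    return sorted({plane_to_sib[plane] for _, plane in failures})


def sib_restart_commands(sib_slots: Iterable[int]) -> List[str]:
    """Build the offline/online command pair for each SIB, one SIB at a time."""
    commands = []
    for slot in sib_slots:
        commands.append(f"request chassis sib slot {slot} offline")
        commands.append(f"request chassis sib slot {slot} online")
    return commands
```

The helper only produces command strings for review. It does not contact a router, and whether a restart actually clears the fault still depends on the nature of the failure.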
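For each error that the fabric OAM feature detects, the system generates a system log message, which gives operators quick access to the information they need.
To view details of the system log messages for this feature by Junos release, see the Syslog Explorer. See Fabric OAM System Log Messages for a list of the logs, system log messages, and other diagnostic messages related to fabric link failures detected by fabric OAM.
The following are examples of the error and system log messages: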
Oct 29 23:02:46 router-dvi resiliencyd[12921]: Error: /fpc/0/fabspoked-pfe/0/cm/0/pfe/0/fabric_link_foam_fault (0x410009), scope: board, category: internal, severity: major, module: fab-pfe@0, type: fabric link foam fault
The following system log message indicates that the fabric OAM error has been cleared:
Oct 29 23:25:14 router-dvi resiliencyd[12921]: Performing action clear-cmalarm for error /fpc/0/fabspoked-pfe/0/cm/0/pfe/0/fabric_link_foam_fault (0x410009) in module: fab-pfe@0 with scope: board category: internal level: major
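Messages of this shape can be matched with regular expressions when logs are post-processed off the router. The sketch below is based only on the two sample lines in this document. The patterns and the `parse_foam_syslog` helper are hypothetical, and other releases may format these messages differently.

```python
import re

# Shared pattern for the error path, e.g.
# /fpc/0/fabspoked-pfe/0/cm/0/pfe/0/fabric_link_foam_fault (0x410009)
_PATH = (
    r"(?P<path>/fpc/(?P<fpc>\d+)/\S*/pfe/(?P<pfe>\d+)/(?P<name>\w+)) "
    r"\((?P<code>0x[0-9a-fA-F]+)\)"
)

ERROR_RE = re.compile(r"resiliencyd\[\d+\]: Error: " + _PATH)
CLEAR_RE = re.compile(
    r"resiliencyd\[\d+\]: Performing action clear-cmalarm for error " + _PATH
)


def parse_foam_syslog(line):
    """Return a dict describing a raised or cleared error, or None if no match."""
    for event, pattern in (("raised", ERROR_RE), ("cleared", CLEAR_RE)):
        match = pattern.search(line)
        if match:
            return {
                "event": event,
                "fpc": int(match["fpc"]),
                "pfe": int(match["pfe"]),
                "name": match["name"],
                "code": match["code"],
            }
    return None
```

Pairing a "raised" record with a later "cleared" record for the same FPC, PFE, and error name gives a rough measure of how long a fault was active.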
In addition, you can use the CLI commands show system errors active detail and show system alarms to view fabric OAM-related errors.
user@router> show system alarms
20 alarms currently active
Alarm time Class Description
2020-08-20 10:32:02 UTC Major FPC 0 Ideeprom read failure
2020-08-20 10:58:07 UTC Major FPC 0 Self_FOAM fault detected
[...Output truncated...]
user@router> show system alarms
14 alarms currently active
Alarm time Class Description
2022-02-15 23:45:28 PST Minor FPC 1 Volt Sensor Fail
2022-02-16 00:02:03 PST Major FPC 1 Self_Fabric OAM Runtime fault detected
2022-02-15 23:43:04 PST Minor FPC 1 Secure boot disabled or not enforced
2022-02-15 23:55:50 PST Minor FPC 3 Secure boot disabled or not enforced
[...Output truncated...]
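When the alarm list is long, the fabric OAM entries can be filtered out of captured output. The following sketch assumes the single-space layout shown in the samples above and recognizes only the two alarm descriptions that appear in this document. `ALARM_RE`, `FABRIC_OAM_MARKERS`, and `fabric_oam_alarms` are hypothetical names.

```python
import re

ALARM_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<tz>\S+)\s+"
    r"(?P<cls>Major|Minor)\s+(?P<desc>.+)$"
)

# Alarm descriptions taken from the sample outputs in this document.
FABRIC_OAM_MARKERS = ("Self_FOAM fault", "Self_Fabric OAM Runtime fault")


def fabric_oam_alarms(output):
    """Return (class, description) pairs for fabric OAM alarms in the output."""
    alarms = []
    for line in output.splitlines():
        match = ALARM_RE.match(line.strip())
        if match and any(marker in match["desc"] for marker in FABRIC_OAM_MARKERS):
            alarms.append((match["cls"], match["desc"]))
    return alarms
```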
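The following output shows details of a single fabric plane failure (on Packet Forwarding Engine 0) and of a failure of all fabric planes (on Packet Forwarding Engine 1).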
user@router> show system errors active detail
System Active Errors Detail Information
FPC 0
----------------------------------------------------------------
Error Name : fabric_down_condition_on_pfe
Identifier : /fpc/0/fabricHub/0/cm/0/fabrichub/1/fabric_down_condition_on_pfe
Description : fabric_down_condition_on_pfe
State : enabled
Scope : pfe
Category : functional
Level : major
Threshold : 1
Error limit : 0
Occur count : 3
Clear count : 2
Last occurred(ms ago) : 103158
System Active Errors Detail Information
FPC 0
----------------------------------------------------------------
Error Name : fabric_link_foam_fault
Identifier : /fpc/0/fabspoked-pfe/0/cm/0/pfe/0/fabric_link_foam_fault
Description : fabric link foam fault
State : enabled
Scope : board
Category : internal
Level : major
Threshold : 1
Error limit : 100
Occur count : 2
Clear count : 0
Last occurred(ms ago) : 113277
System Active Errors Detail Information
FPC 0
----------------------------------------------------------------
Error Name : fabric_link_foam_fault
Identifier : /fpc/0/fabspoked-pfe/0/cm/0/pfe/1/fabric_link_foam_fault
Description : fabric link foam fault
State : enabled
Scope : board
Category : internal
Level : major
Threshold : 1
Error limit : 100
Occur count : 12
Clear count : 0
Last occurred(ms ago) : 103267
System Active Errors Detail Information
RE 0
----------------------------------------------------------------
Error Name : fpga_min_supported_fw_ver_mismatch
Identifier : /re/0/hwdre/0/cm/0/fpga_fw_events/UBAM FPGA/fpga_min_supported_fw_ver_mismatch
Description : firmware_version_lower_than_minimum_expected
State : enabled
Scope : board
Category : functional
Level : minor
Threshold : 10
Error limit : 1
Occur count : 1
Clear count : 0
Last occurred(ms ago) : 68886367
FPC 1
----------------------------------------------------------------
Error Name : fabric_link_self_fabric_oam_runtime_fault
Identifier : /fpc/1/fabspoked-pfe/0/cm/0/pfe/0/fabric_link_self_fabric_oam_runtime_fault
Description : fabric link self fabric oam runtime fault
State : enabled
Scope : board
Category : internal
Level : major
Threshold : 1
Error limit : 36
Occur count : 1
Clear count : 0
Last occurred(ms ago) : 2022-02-16 00:02:03 PST (448108 ms ago)
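The detail output is a sequence of "field : value" lines grouped under a component header, so it can be turned into records for filtering. This is a sketch under the assumption that every record starts with an "Error Name" line and that component headers look like "FPC 0" or "RE 0", as in the samples above. The function names and the added "Component" key are hypothetical.

```python
import re


def parse_error_records(output):
    """Split 'show system errors active detail' text into one dict per error."""
    records = []
    component = None
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if re.fullmatch(r"(FPC|RE) \d+", line):
            component = line  # remember which component the next errors belong to
            continue
        if ":" not in line:
            continue  # headers, separators, and the command prompt
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Error Name":
            current = {"Component": component}
            records.append(current)
        if current is not None:
            current[key] = value
    return records


def fabric_oam_faults(records):
    """Keep only the fabric OAM fault records seen in this document's samples."""
    return [
        r for r in records
        if "foam_fault" in r["Error Name"] or "fabric_oam" in r["Error Name"]
    ]
```

All values are kept as strings, so a field such as "Occur count" needs an explicit `int()` conversion before it is compared numerically.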
You can use the CLI command show chassis fabric fpcs to view the fabric OAM self-ping status for each fabric plane.
user@router> show chassis fabric fpcs
Fabric management FPC state:
FPC #0
PFE #0
SIB0_Asic0_Fcore0 (plane 0) Plane Disabled, Links ok Fabric OAM failed
SIB0_Asic0_Fcore0 (plane 1) Plane Enabled, Links ok Fabric OAM success
SIB0_Asic0_Fcore0 (plane 2) Plane Enabled, Links ok Fabric OAM success
SIB0_Asic0_Fcore0 (plane 3) Plane Enabled, Links ok Fabric OAM success
SIB0_Asic0_Fcore0 (plane 4) Plane Enabled, Links ok Fabric OAM success
SIB0_Asic0_Fcore0 (plane 5) Plane Enabled, Links ok Fabric OAM success
SIB1_Asic0_Fcore0 (plane 6) Plane Enabled, Links ok Fabric OAM success
SIB1_Asic0_Fcore0 (plane 7) Plane Enabled, Links ok Fabric OAM success
SIB1_Asic0_Fcore0 (plane 8) Plane Enabled, Links ok Fabric OAM success
SIB1_Asic0_Fcore0 (plane 9) Plane Enabled, Links ok Fabric OAM success
SIB1_Asic0_Fcore0 (plane 10) Plane Enabled, Links ok Fabric OAM success
SIB1_Asic0_Fcore0 (plane 11) Plane Enabled, Links ok Fabric OAM success
PFE #1
SIB0_Asic0_Fcore0 (plane 0) Plane Enabled, Links ok Fabric OAM success
SIB0_Asic0_Fcore0 (plane 1) Plane Enabled, Links ok Fabric OAM success
user@router> show chassis fabric fpcs
Fabric management FPC state:
FPC #1
PFE #0
SIB0_Asic0_Fcore0 (plane 0) Plane Enabled, Links ok Fabric OAM Runtime success
SIB0_Asic0_Fcore0 (plane 1) Plane Disabled, Links ok Fabric OAM Runtime failed
SIB0_Asic1_Fcore0 (plane 2) Plane Enabled, Links ok Fabric OAM Runtime success
SIB0_Asic1_Fcore0 (plane 3) Plane Enabled, Links ok Fabric OAM Runtime success
SIB0_Asic2_Fcore0 (plane 4) Plane Enabled, Links ok Fabric OAM Runtime success
SIB0_Asic2_Fcore0 (plane 5) Plane Enabled, Links ok Fabric OAM Runtime success
SIB1_Asic0_Fcore0 (plane 6) Plane Enabled, Links ok Fabric OAM Runtime success
SIB1_Asic0_Fcore0 (plane 7) Plane Enabled, Links ok Fabric OAM Runtime success
SIB1_Asic1_Fcore0 (plane 8) Plane Enabled, Links ok Fabric OAM Runtime success
SIB1_Asic1_Fcore0 (plane 9) Plane Enabled, Links ok Fabric OAM Runtime success
SIB1_Asic2_Fcore0 (plane 10) Plane Enabled, Links ok Fabric OAM Runtime success
SIB1_Asic2_Fcore0 (plane 11) Plane Enabled, Links ok Fabric OAM Runtime success
SIB2_Asic0_Fcore0 (plane 12) Plane Enabled, Links ok Fabric OAM Runtime success
SIB2_Asic0_Fcore0 (plane 13) Plane Enabled, Links ok Fabric OAM Runtime success
SIB2_Asic1_Fcore0 (plane 14) Plane Enabled, Links ok Fabric OAM Runtime success
SIB2_Asic1_Fcore0 (plane 15) Plane Enabled, Links ok Fabric OAM Runtime success
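To find the planes whose self-ping failed without reading every line, the captured output can be scanned per FPC and PFE. The sketch below assumes the line layout shown in the samples above ("FPC #n", "PFE #n", then one line per plane). `PLANE_RE` and `failed_oam_planes` are hypothetical names, and lines without a fabric OAM status (as when the feature is disabled) are simply ignored.

```python
import re

PLANE_RE = re.compile(
    r"^SIB(?P<sib>\d+)_\S+\s+\(plane (?P<plane>\d+)\)\s+"
    r"Plane (?P<state>Enabled|Disabled), Links (?P<links>\w+)"
    r"(?:\s+Fabric OAM (?P<oam>.+))?$"
)


def failed_oam_planes(output):
    """Return (fpc, pfe, sib, plane) tuples for planes whose fabric OAM check failed."""
    failed = []
    fpc = pfe = None
    for raw in output.splitlines():
        line = raw.strip()
        if m := re.fullmatch(r"FPC #(\d+)", line):
            fpc, pfe = int(m[1]), None
        elif m := re.fullmatch(r"PFE #(\d+)", line):
            pfe = int(m[1])
        elif m := PLANE_RE.match(line):
            oam = m["oam"]  # None when no fabric OAM status is printed
            if oam and oam.endswith("failed"):
                failed.append((fpc, pfe, int(m["sib"]), int(m["plane"])))
    return failed
```

The SIB number in each tuple identifies which SIB to consider restarting for that plane.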
When the fabric OAM feature is disabled, the command show chassis fabric fpcs displays the following output:
user@router> show chassis fabric fpcs
Fabric management FPC state:
FPC #0
PFE #0
SIB0_Asic0_Fcore0 (plane 0) Plane Enabled, Links ok
SIB0_Asic0_Fcore0 (plane 1) Plane Enabled, Links ok
SIB0_Asic0_Fcore0 (plane 2) Plane Enabled, Links ok
SIB0_Asic0_Fcore0 (plane 3) Plane Enabled, Links ok
SIB0_Asic0_Fcore0 (plane 4) Plane Enabled, Links ok
SIB0_Asic0_Fcore0 (plane 5) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 6) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 7) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 8) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 9) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 10) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 11) Plane Enabled, Links ok
PFE #1
SIB0_Asic0_Fcore0 (plane 0) Plane Enabled, Links ok
SIB0_Asic0_Fcore0 (plane 1) Plane Enabled, Links ok
SIB0_Asic0_Fcore0 (plane 2) Plane Enabled, Links ok
SIB0_Asic0_Fcore0 (plane 3) Plane Enabled, Links ok
SIB0_Asic0_Fcore0 (plane 4) Plane Enabled, Links ok
SIB0_Asic0_Fcore0 (plane 5) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 6) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 7) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 8) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 9) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 10) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 11) Plane Enabled, Links ok
PFE #2
SIB0_Asic0_Fcore0 (plane 0) Plane Enabled, Links ok
SIB0_Asic0_Fcore0 (plane 1) Plane Enabled, Links ok
SIB0_Asic0_Fcore0 (plane 2) Plane Enabled, Links ok
SIB0_Asic0_Fcore0 (plane 3) Plane Enabled, Links ok
SIB0_Asic0_Fcore0 (plane 4) Plane Enabled, Links ok
SIB0_Asic0_Fcore0 (plane 5) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 6) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 7) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 8) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 9) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 10) Plane Enabled, Links ok
SIB1_Asic0_Fcore0 (plane 11) Plane Enabled, Links ok
PFE #3