Troubleshooting QFX3100 Director Device Isolation
Description: Both connections between the QFX3100 Director devices are broken so that one of the Director devices in a Director group becomes isolated from the group.
The redundant patch cables interconnecting the Director devices are critical links required for the operation of the Director group. The two inter-Director device links must remain connected when the Director devices are online. After the Director devices are installed and the Director group is active, if a single inter-Director device link loses and regains its connection, the operation of the Director group remains intact. However, the loss of both inter-Director device links causes one Director device to isolate itself from the Director group.
Do not reconnect the inter-Director patch cables until you have identified and powered off the isolated Director device. Restarting the active Director device instead of the isolated Director device can cause both Director devices to reboot, with subsequent data loss.
Environment: This problem occurs between the two QFX3100 Director devices found in QFabric systems.
Symptoms: One of the Director devices reboots unexpectedly.
Determine Which Director Device Is Isolated
Before restoring the inter-Director device links, determine which one of the Director devices is in isolation.
To locate an isolated Director device, use one of the following methods:
Review logs or management tools for standard SNMP traps issued from the Director group before the Director device became isolated.
If eth-2/6 links are down, the Director group cannot communicate. Normally, one of the devices reboots.
If both eth-2/6 and eth-7/8/9 links are down, the Director device is isolated from the control plane and is not providing fabric services.
Use the CLI to determine the serial number of the active Director device, as follows:
Issue the show fabric session-host command.
root@qfabric> show fabric session-host
Identifier: 0281042010000013
Issue the show fabric administration inventory director-group status | grep "dg0|dg1" command.
root@qfabric> show fabric administration inventory director-group status | grep "dg0|dg1"
dg0 online master 10.94.214.80 0% 13597976k 4 4 days, 22:36 hrs
dg1 online master 10.94.214.81 0% 18677380k 3 4 days, 22:25 hrs
dg0 0281042010000013 online master
dg1 0281042010000018 online backup
When the Director devices cannot communicate, the show fabric administration inventory director-group command only displays the Director device that is online.
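The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a Juniper tool: it assumes you have saved the text output of the two commands, that serial-number rows have the form "dgN serial status role" as in the sample output, and that serial numbers are 16 digits. The function names (parse_session_host, parse_director_rows, summarize) are hypothetical.

```python
import re

# Assumed row shape, taken from the sample output: "dg0 0281042010000013 online master"
SERIAL_ROW = re.compile(r"^(dg\d+)\s+(\d{16})\s+(\S+)\s+(\S+)$")


def parse_session_host(output):
    """Return the Identifier reported by 'show fabric session-host', or None."""
    match = re.search(r"Identifier:\s*(\S+)", output)
    return match.group(1) if match else None


def parse_director_rows(output):
    """Map each Director group member to its serial number, status, and role."""
    members = {}
    for line in output.splitlines():
        match = SERIAL_ROW.match(line.strip())
        if match:
            name, serial, status, role = match.groups()
            members[name] = {"serial": serial, "status": status, "role": role}
    return members


def summarize(session_output, inventory_output, expected=("dg0", "dg1")):
    """Report which member hosts the CLI session and which members are not listed."""
    host = parse_session_host(session_output)
    members = parse_director_rows(inventory_output)
    session_member = next(
        (name for name, info in members.items() if info["serial"] == host), None
    )
    # A member absent from the output is a candidate for the isolated device.
    missing = [name for name in expected if name not in members]
    return {"session_host": host, "session_member": session_member, "missing": missing}
```

If only one member appears in the inventory output, the other name shows up in the "missing" list. Treat that as a hint to confirm against SNMP traps and link status, not as proof of isolation.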
Power Off the Isolated Director Device and Restore the Inter-Director Device Links
Be sure you know which Director device is active and which is isolated. If you power off the active Director device, both Director devices reboot, which can cause data loss on the system.
To restore communication within the Director group:
- Power off the isolated Director device.
- Restore the inter-Director device links (port 3) by firmly inserting the redundant patch cables.
- Power on the previously isolated Director device. The Director device reboots.
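
The ordering of these steps, and the check that must come before them, can be written as a short guard. The Python sketch below is illustrative only: it does not talk to any device, the function name plan_recovery is hypothetical, and the serial numbers you pass in are assumed to come from your own identification of the active and isolated devices.

```python
def plan_recovery(active_serial, isolated_serial, target_serial):
    """Return the recovery steps in order, refusing to target the active device."""
    if target_serial == active_serial:
        # Powering off the active device can reboot both Director devices.
        raise ValueError("Refusing: the target is the active Director device")
    if target_serial != isolated_serial:
        raise ValueError("Target does not match the isolated Director device")
    return [
        f"Power off the isolated Director device ({target_serial})",
        "Restore the inter-Director device links",
        f"Power on the previously isolated Director device ({target_serial})",
    ]
```

Calling plan_recovery with the active device's serial number as the target raises an error instead of returning steps, which mirrors the warning above.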