Troubleshooting Bare Metal Servers
This topic provides the steps to troubleshoot bare metal servers (BMS).
Follow these steps to troubleshoot some of the common issues:
Verify that the following objects are created:
When the BMS is in the provisioning state (that is, when the BMS is booting for the first time), there should be two neutron ports: one on the provisioning network and another on the tenant network. Run the openstack port list/show command to view the list of ports.
The port connected to the provisioning network should have local_link_information displaying the name of the QFX or TOR and the port to which the bare metal server is connected.
After the network flip, only one port should be present. The port connected to the provisioning network should have been deleted.
Verify that the logical interfaces are created. Run the curl http://localhost:8082/logical-interfaces command to view the logical interfaces. Each logical interface should point to the correct physical interface.
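The port expectations above can be sketched as a small check. This is only an illustration under stated assumptions: the function name, the phase labels, and the shape of the port records (a network name plus a local_link_information entry) are hypothetical and are not the exact output of the openstack port list/show command. Adapt the keys to what your client version returns.

```python
def check_bms_ports(ports, phase, provisioning_network):
    """Return a list of problems found for one bare metal server's ports.

    ports: list of dicts with the hypothetical keys "network" and
           "local_link_information" (adapt to your client's output).
    phase: "provisioning" (first boot) or "tenant" (after the network flip).
    """
    problems = []
    prov = [p for p in ports if p.get("network") == provisioning_network]
    if phase == "provisioning":
        # First boot: one provisioning port plus one tenant port.
        if len(ports) != 2:
            problems.append(f"expected 2 ports, found {len(ports)}")
        if len(prov) != 1:
            problems.append("expected exactly 1 port on the provisioning network")
        for p in prov:
            if not p.get("local_link_information"):
                problems.append("provisioning port has no local_link_information")
    elif phase == "tenant":
        # After the network flip: only the tenant port should remain.
        if len(ports) != 1:
            problems.append(f"expected 1 port, found {len(ports)}")
        if prov:
            problems.append("provisioning port still present after network flip")
    else:
        raise ValueError(f"unknown phase: {phase}")
    return problems
```

An empty result means the ports match what this topic describes for the given phase.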
Follow these steps to troubleshoot LAG interfaces (AE interfaces):
Ensure that an aggregated Ethernet physical interface is created. Run the curl http://localhost:8082/physical-interfaces command to verify. The AE interface name starts with ae.
Ensure that a logical interface is created. Run the curl http://localhost:8082/logical-interfaces command to verify.
The logical interface should have a parent reference pointing to the ae physical interface.
Ensure that a link aggregation group (LAG) is created. Run the curl http://localhost:8082/link-aggregation-groups command to verify.
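The LAG checks above can be partly automated once the two curl responses are saved as JSON. The sketch below rests on assumptions this topic does not confirm: each listing is taken to be an object keyed by the collection name, every entry is taken to carry an fq_name list, and a logical interface's fq_name is taken to be its parent's fq_name plus its own name. The function name is hypothetical.

```python
def lag_interface_report(physical_listing, logical_listing):
    """Map each ae physical interface to the logical interfaces under it.

    Assumed listing shape (verify against your API server):
      {"physical-interfaces": [{"fq_name": [..., "qfx1", "ae0"]}, ...]}
      {"logical-interfaces":  [{"fq_name": [..., "qfx1", "ae0", "ae0.0"]}, ...]}
    """
    report = {}
    for item in physical_listing.get("physical-interfaces", []):
        fq_name = tuple(item["fq_name"])
        # The AE interface name starts with "ae".
        if fq_name[-1].startswith("ae"):
            report[fq_name] = []
    for item in logical_listing.get("logical-interfaces", []):
        fq_name = tuple(item["fq_name"])
        parent = fq_name[:-1]  # assumed parent reference
        if parent in report:
            report[parent].append(fq_name[-1])
    return report
```

An ae interface that maps to an empty list has no logical interface pointing at it, which is the condition the steps above ask you to rule out.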
Follow these steps to troubleshoot multihomed interfaces:
Ensure that two logical interfaces are created. Run the curl http://localhost:8082/logical-interfaces command to verify.
Each logical interface should have a parent reference pointing to its physical interface. The Ethernet segment identifier (ESI) should be set to the same value for both physical interfaces.
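The ESI comparison can be sketched as follows. The property name ethernet_segment_identifier is an assumption about how the ESI appears in the physical interface JSON, so it is exposed as a parameter; the function name is hypothetical.

```python
def check_esi(physical_interfaces, esi_key="ethernet_segment_identifier"):
    """Return problems for the physical interfaces of one multihomed server.

    physical_interfaces: list of dicts, one per parent physical interface.
    esi_key: assumed name of the ESI property; change it if yours differs.
    """
    problems = []
    if len(physical_interfaces) != 2:
        problems.append(
            f"expected 2 physical interfaces, found {len(physical_interfaces)}"
        )
    esis = [pi.get(esi_key) for pi in physical_interfaces]
    if any(not esi for esi in esis):
        problems.append("ESI is not set on every physical interface")
    elif len({esi.lower() for esi in esis}) > 1:
        # Compare case-insensitively so that aa:bb and AA:BB count as equal.
        problems.append("ESI values differ: " + ", ".join(esis))
    return problems
```

An empty result means both physical interfaces carry the same ESI.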
Follow these steps if you get the error message No Valid Host Found when you launch a BMS:
Run the openstack baremetal node list/show command to verify that the nodes are registered on Ironic and are not in error state.
Run the openstack baremetal port list/show command to verify that ports for the nodes are registered.
Run the openstack baremetal port group list/show command to verify that the port groups are registered (for LAG or multihomed deployments).
Run the openstack flavor list/show command to review the BMS flavor details and ensure that the flavor matches the node specification.
Review the api-server logs for errors. The log contains errors if there is a duplicate MAC address or if the physical interface is not configured.
Review the ironic-conductor logs for errors. For example, PXE_ENABLED port is not found.
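The flavor comparison in the steps above can be sketched as a small function. It assumes the flavor exposes ram (MB), vcpus, and disk (GB) and that the Ironic node properties use memory_mb, cpus, and local_gb; the function name is hypothetical. It only flags a flavor that asks for more than the node reports, so a stricter equality check may suit your deployment better.

```python
def flavor_mismatches(flavor, node_properties):
    """Compare a flavor with an Ironic node's reported properties.

    Assumed keys: flavor "ram", "vcpus", "disk"; node "memory_mb",
    "cpus", "local_gb". Values may be strings or integers.
    """
    pairs = [("ram", "memory_mb"), ("vcpus", "cpus"), ("disk", "local_gb")]
    mismatches = []
    for flavor_key, node_key in pairs:
        wanted = int(flavor[flavor_key])
        # A missing node property is treated as 0 so that it is reported.
        available = int(node_properties.get(node_key, 0))
        if wanted > available:
            mismatches.append(
                f"{flavor_key}={wanted} exceeds node {node_key}={available}"
            )
    return mismatches
```

If every registered node produces at least one mismatch for the chosen flavor, that is one plausible reason for the scheduler to find no valid host.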
Follow these steps if the server does not boot or if the server remains in the boot state:
Verify whether the server is assigned an IP address on the provisioning network.
If an IP address is not assigned, verify whether the TSN node is reachable.
If an IP address is assigned, check whether the TFTP boot server is reachable.
In either case, you can use the tcpdump tool to capture and review the packets to check whether the bare metal server can reach these servers.
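As a starting point for the capture, the sketch below builds a tcpdump argument list. It assumes that the provisioning network hands out addresses over DHCP (UDP ports 67 and 68) and relies on TFTP requests going to UDP port 69. The function name, the interface name, and the optional MAC filter are illustrative.

```python
def provisioning_capture_command(interface, server_mac=None):
    """Build a tcpdump argument list for DHCP and TFTP traffic.

    interface:  capture interface on the host you run tcpdump from.
    server_mac: optional MAC address of the bare metal server's NIC,
                used to narrow the capture to that server.
    """
    capture_filter = "udp and (port 67 or port 68 or port 69)"
    if server_mac:
        capture_filter = f"ether host {server_mac} and {capture_filter}"
    # -n: do not resolve names, -e: print link-level (MAC) headers.
    return ["tcpdump", "-n", "-e", "-i", interface, capture_filter]
```

The list can be passed to subprocess.run or joined into a shell command. Only the initial TFTP request is sent to port 69; the transfer itself continues on other UDP ports, so widen the filter if you need to see the whole download.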
Follow these steps if the server was assigned an IP address and booted on the provisioning network but remains in the same state, that is, the network flip does not happen:
Review the ironic-conductor logs to check whether the Ironic Python Agent (IPA) on the bare metal server is able to communicate with Ironic Conductor.
Check whether the image was built correctly with the correct IPA.
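When the ironic-conductor log is large, it can help to narrow it down to error lines that mention the affected node. The sketch below is a generic text filter, not an Ironic API: it assumes that error lines carry the level name ERROR or CRITICAL and mention the node UUID, and the function name is hypothetical.

```python
def error_lines_for_node(log_lines, node_uuid):
    """Return log lines that mention the node and look like errors.

    log_lines: any iterable of strings, for example an open log file.
    node_uuid: UUID of the bare metal node as registered on Ironic.
    """
    markers = ("ERROR", "CRITICAL")  # assumed log level names
    return [
        line.rstrip("\n")
        for line in log_lines
        if node_uuid in line and any(marker in line for marker in markers)
    ]
```

Pass it an open file object for wherever ironic-conductor writes its logs in your deployment. If no lines come back, look at the surrounding entries as well, because IPA communication problems may show up as missing messages instead of explicit errors.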