Determining Why Mastership Switched
Purpose
To determine why mastership switched from one Routing Engine to the other.
Action
Mastership can switch between the master Routing Engine and the backup Routing Engine for the following reasons:
- Hardware problems.
- The master Routing Engine is pulled.
- Software issues, such as a Routing Engine kernel crash.
View the redundancy log in the file /var/log/mastership. This file records hardware and software transitions and helps you debug automatic redundancy issues.
Table 1 lists the event codes that can be displayed in the mastership log.
Table 1: Logging Events
Event Code | Description |
---|---|
E_NULL = 0 | The event is a null event. |
E_CFG_M | The Routing Engine is configured as master. |
E_CFG_B | The Routing Engine is configured as backup. |
E_CFG_D | The Routing Engine is configured as disabled. |
E_MAXTRY | The maximum number of tries to acquire or release mastership was exceeded. |
E_REQ_C | A claim mastership request was sent. |
E_ACK_C | A claim mastership acknowledgement was received. |
E_NAK_C | A claim mastership request was not acknowledged. |
E_REQ_Y | Confirmation of mastership is requested. |
E_ACK_Y | Mastership is acknowledged. |
E_NAK_Y | Mastership is not acknowledged. |
E_REQ_G | A giveup mastership request was sent by a Routing Engine. |
E_ACK_G | The Routing Engine acknowledges giveup of mastership. |
E_CMD_A | The command request chassis routing-engine master acquire was issued from the backup Routing Engine. |
E_CMD_F | Force switchover command was issued. |
E_CMD_R | The command request chassis routing-engine master release was issued from the master Routing Engine. |
E_CMD_S | The command request chassis routing-engine master switch was issued from a Routing Engine. |
E_NO_ORE | No other Routing Engine is detected. |
E_TMOUT | A request timed out. |
E_NO_IPC | Routing Engine connection was lost. |
E_ORE_M | Other Routing Engine state was changed to master. |
E_ORE_B | Other Routing Engine state was changed to backup. |
E_ORE_D | Other Routing Engine state was changed to disabled. |
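When scanning a long mastership log, it can help to group the event codes by prefix. The following Python sketch is only a reading aid: the family names and the `event_family` helper are assumptions inferred from the prefixes and descriptions in Table 1, not an official Junos classification.

```python
# Event-code families inferred from the prefixes in Table 1.
# The family labels are hypothetical names chosen for this example.
EVENT_FAMILIES = {
    "E_CFG_": "configuration",
    "E_CMD_": "operator command",
    "E_REQ_": "mastership request",
    "E_ACK_": "acknowledgement",
    "E_NAK_": "negative acknowledgement",
    "E_ORE_": "other Routing Engine state change",
}


def event_family(code: str) -> str:
    """Return the family for an event code, or "other" if no prefix matches."""
    code = code.strip()  # log lines sometimes pad the code with spaces
    for prefix, family in EVENT_FAMILIES.items():
        if code.startswith(prefix):
            return family
    # Codes such as E_NULL, E_MAXTRY, E_TMOUT, E_NO_ORE, and E_NO_IPC
    # do not share a prefix family and land here.
    return "other"


if __name__ == "__main__":
    for code in ("E_CMD_R", "E_TMOUT", "E_ORE_M"):
        print(code, "->", event_family(code))
```

An E_CMD_ event in the log points to a switchover requested from the CLI, whereas codes such as E_TMOUT or E_NO_IPC point to communication problems between the Routing Engines.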
Sample Output
user@host> show log mastership
Jan 12 21:50:05 clear-log[865]: logfile cleared
Jan 12 21:50:18 failed to receive keepalives from other RE for the last 60 sec
Jan 12 21:50:23 failed to send RE info/keepalive: errno=22, total=6 in the last 20 sec
Jan 12 21:50:23 failed to send RE info/keepalive: errno=22, total=6 in the last 20 sec
Jan 12 21:50:34 event = E_CMD_R, state = master , param = 0x0
Jan 12 21:50:34 send "you are the master" request
Jan 12 21:50:34 Failed to send RE mastership cmd . err = 65
Jan 12 21:50:34 Currentstate: master NextState:giveup reason_code: 1
Jan 12 21:50:34 timestamp: Wed Jan 12 21:50:34 2000
Jan 12 21:50:34 new state = giveup
Jan 12 21:50:36 event = E_TMOUT , state = giveup, param = 0x0
Jan 12 21:50:36 send "you are the master" request
Jan 12 21:50:36 Failed to send RE mastership cmd. err = 65
Jan 12 21:50:36 Currentstate: giveup NextState:giveup reason_code: 1
Jan 12 21:50:36 new state = giveup
Jan 12 21:50:38 event = E_TMOUT, state = giveup, param = 0x0
Jan 12 21:50:38 send "you are the master" request
Jan 12 21:50:38 Failed to send RE mastership cmd. err = 65
Jan 12 21:50:38 Currentstate: giveup NextState:giveup reason_code: 1
Jan 12 21:50:38 new state = giveup
Jan 12 21:50:40 failed to receive keepalives from other RE for the last 80 sec
Jan 12 21:50:41 event = E_TMOUT, state = giveup, param = 0x0
Jan 12 21:50:41 send "you are the master" request
Jan 12 21:50:41 Failed to send RE mastership cmd. err = 65
Jan 12 21:50:41 Currentstate: giveup NextState:giveup reason_code: 1
Jan 12 21:50:41 new state = giveup
Jan 12 21:50:43 event = E_TMOUT, state = giveup, param = 0x0
Jan 12 21:50:43 send "you are the master" request
Jan 12 21:50:43 Failed to send RE mastership cmd. err = 65
Jan 12 21:50:43 Currentstate: giveup NextState:giveup reason_code: 1
Jan 12 21:50:43 new state = giveup
Jan 12 21:50:46 failed to send RE info/keepalive: errno=35, total=7 in the last 20 sec
Jan 12 21:50:46 failed to send RE info/keepalive: errno=35, total=7 in the last 20 sec
Jan 12 21:50:46 event = E_TMOUT, state = giveup, param = 0x0
Jan 12 21:50:46 send "you are the master" request
Jan 12 21:50:46 Failed to send RE mastership cmd. err = 65
Jan 12 21:50:46 Currentstate: giveup NextState:giveup reason_code: 1
Jan 12 21:50:46 new state = giveup
Jan 12 21:50:48 event = E_TMOUT, state = giveup, param = 0x0
Jan 12 21:50:48 send "you are the master" request
Jan 12 21:50:48 Failed to send RE mastership cmd. err = 65
Jan 12 21:50:48 Currentstate: giveup NextState:giveup reason_code: 1
Jan 12 21:50:48 new state = giveup
Jan 12 21:50:50 event = E_TMOUT, state = giveup, param = 0x0
Jan 12 21:50:50 send "you are the master" request
Jan 12 21:50:50 Failed to send RE mastership cmd. err = 65
Jan 12 21:50:50 Currentstate: giveup NextState:giveup reason_code: 1
Jan 12 21:50:50 new state = giveup
Jan 12 21:50:53 event = E_MAXTRY , state = giveup, param = 0x0
Jan 12 21:50:53 Currentstate: giveup NextState:master reason_code: 1
Jan 12 21:50:53 timestamp: Wed Jan 12 21:50:53 2000
Jan 12 21:50:53 new state = master
Jan 12 21:51:01 failed to receive keepalives from other RE for the last 100 sec
Jan 12 21:51:06 failed to send RE info/keepalive: errno=65, total=7 in the last 20 sec
Jan 12 21:51:06 failed to send RE info/keepalive: errno=65, total=7 in the last 20 sec
Jan 12 21:51:21 failed to receive keepalives from other RE for the last 120 sec
Jan 12 21:51:26 failed to send RE info/keepalive: errno=22, total=6 in the last 20 sec
Jan 12 21:51:26 failed to send RE info/keepalive: errno=22, total=6 in the last 20 sec
Meaning
The beginning of the log shows that the Routing Engine is not receiving keepalives from the other Routing Engine. After the request chassis routing-engine master release command is issued (E_CMD_R), the state of the Routing Engine changes from master to giveup. However, the other Routing Engine does not take over mastership because it is unreachable. The request times out repeatedly (E_TMOUT) until the Routing Engine reaches the maximum number of attempts permitted (E_MAXTRY). The output then shows the Routing Engine state changing from giveup back to master.
The output doesn’t indicate why the mastership switchover did not work. However, it is clear that the backup Routing Engine is unreachable.
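The reading described above can be partly automated once the log is copied off the router. The following Python sketch is a minimal example that assumes the line layout shown in the sample output. The regular expressions and the `summarize_mastership_log` helper are hypothetical and not part of Junos OS. The sketch counts events and reports only the state changes, which makes the master, giveup, master sequence easy to spot among the repeated timeouts.

```python
import re
from collections import Counter

# Patterns assume the layout in the sample output, including the
# optional spaces that appear before some commas.
EVENT_RE = re.compile(r"event\s*=\s*(E_\w+)\s*,\s*state\s*=\s*(\w+)")
NEW_STATE_RE = re.compile(r"new state\s*=\s*(\w+)")


def summarize_mastership_log(lines):
    """Return (event counts, list of (event, old_state, new_state) changes)."""
    events = Counter()
    transitions = []
    last_event = None
    state_at_event = None
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            last_event, state_at_event = match.group(1), match.group(2)
            events[last_event] += 1
            continue
        match = NEW_STATE_RE.search(line)
        if match and last_event is not None:
            # Record only real changes; giveup -> giveup is skipped.
            if match.group(1) != state_at_event:
                transitions.append((last_event, state_at_event, match.group(1)))
            last_event = None
    return events, transitions


# A few lines taken from the sample output above.
sample = [
    'Jan 12 21:50:34 event = E_CMD_R, state = master , param = 0x0',
    'Jan 12 21:50:34 new state = giveup',
    'Jan 12 21:50:36 event = E_TMOUT , state = giveup, param = 0x0',
    'Jan 12 21:50:36 new state = giveup',
    'Jan 12 21:50:53 event = E_MAXTRY , state = giveup, param = 0x0',
    'Jan 12 21:50:53 new state = master',
]

events, transitions = summarize_mastership_log(sample)
for event, old_state, new_state in transitions:
    print(f"{event}: {old_state} -> {new_state}")
```

For these sample lines, the sketch reports two changes: E_CMD_R moves the state from master to giveup, and E_MAXTRY moves it from giveup back to master.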