Selecting the Best Path

BGP selects only one route to a destination as the best path. When multiple routes to a given destination exist, BGP must determine which of these routes is the best. BGP puts the best path in its routing table and advertises that path to its BGP neighbors.

If only one route exists to a particular destination, BGP installs that route. If multiple routes exist for a destination, BGP uses tie-breaking rules to decide which one of the routes to install in the BGP routing table.

Note: During best path selection on the E series router, BGP takes into account BGP-learned routes as well as routes learned by other protocols to determine whether a route should be marked as best and made active. Whether BGP examines routes learned by other protocols is driven by administrative distance. If a route is learned by a protocol that has a lower administrative distance than BGP, that route preempts the one learned by BGP. At that point, BGP marks its route as inactive and does not install or advertise it. You can use the bgp advertise-inactive command in router configuration mode to change this default behavior so that BGP advertises such inactive routes. For more information about advertising inactive routes, see Advertising Inactive Routes.

BGP Path Decision Algorithm

BGP determines the best path to each destination for a BGP speaker by comparing path attributes according to the following selection sequence:

  1. Select a path with a reachable next hop.
  2. Select the path with the highest weight.
  3. If path weights are the same, select the path with the highest local preference value.
  4. Prefer locally originated routes (network routes, redistributed routes, or aggregated routes) over received routes.
  5. Select the route with the shortest AS-path length.
  6. If all paths have the same AS-path length, select the path based on origin: IGP is preferred over EGP; EGP is preferred over Incomplete.
  7. If the origins are the same, select the path with lowest MED value.
  8. If the paths have the same MED values, select the path learned by means of EBGP over one learned by means of IBGP.
  9. Select the path with the lowest IGP cost to the next hop.
  10. Select the path with the shortest route reflection cluster list. Routes without a cluster list are treated as having a cluster list of length 0.
  11. Select the path received from the peer with the lowest BGP router ID.
  12. Select the path that was learned from the neighbor with the lowest peer remote address.
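The decision sequence can be sketched in Python as a sort key, which makes the strict ordering of the tie-breakers explicit. This is a simplified, hypothetical model rather than the router's implementation: the `Path` fields and function names are invented for illustration, origin codes are ranked as integers, and MED is compared across all candidates (as if bgp always-compare-med were configured) rather than only between paths from the same neighbor AS. The router IDs in the usage lines are also hypothetical.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address
from typing import List, Optional

# Assumed ranking of origin codes for step 6 (lower rank is preferred).
ORIGIN_RANK = {"IGP": 0, "EGP": 1, "Incomplete": 2}


@dataclass
class Path:
    """Hypothetical model of one candidate path to a destination."""
    next_hop_reachable: bool
    weight: int
    local_pref: int
    locally_originated: bool
    as_path: List[int]
    origin: str
    med: int
    ebgp: bool
    igp_cost: int
    cluster_list: List[str] = field(default_factory=list)
    router_id: str = "0.0.0.0"
    peer_address: str = "0.0.0.0"


def preference_key(p: Path) -> tuple:
    """Smaller tuple = more preferred path; elements follow steps 2 through 12."""
    return (
        -p.weight,                    # 2. highest weight
        -p.local_pref,                # 3. highest local preference
        not p.locally_originated,     # 4. locally originated routes first
        len(p.as_path),               # 5. shortest AS path
        ORIGIN_RANK[p.origin],        # 6. IGP, then EGP, then Incomplete
        p.med,                        # 7. lowest MED (compared across all paths here)
        not p.ebgp,                   # 8. EBGP before IBGP
        p.igp_cost,                   # 9. lowest IGP cost to the next hop
        len(p.cluster_list),          # 10. shortest cluster list
        ip_address(p.router_id),      # 11. lowest router ID
        ip_address(p.peer_address),   # 12. lowest peer address
    )


def select_best(paths: List[Path]) -> Optional[Path]:
    """Step 1 drops paths with unreachable next hops; min() applies steps 2-12."""
    candidates = [p for p in paths if p.next_hop_reachable]
    return min(candidates, key=preference_key, default=None)


# Router LA (Figure 31): the same route via Boston (weight 1000) and via NY (weight 500).
via_boston = Path(True, 1000, 100, False, [100, 200], "IGP", 0, True, 10,
                  router_id="10.5.5.1", peer_address="10.5.5.1")
via_ny = Path(True, 500, 100, False, [300, 200], "IGP", 0, True, 10,
              router_id="10.72.4.2", peer_address="10.72.4.2")
best = select_best([via_boston, via_ny])
print(best is via_boston)  # True
```

Encoding each step as one tuple element means a later step is consulted only when every earlier step ties, which mirrors the "if the previous attributes are the same" wording of the list. Converting router IDs and peer addresses with `ip_address` compares them numerically rather than as strings.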

The following sections discuss the attributes evaluated in the path decision process. Examples show how you might configure these attributes to influence routing decisions.

Configuring Next-Hop Processing

Routes sent by BGP speakers include the next-hop attribute. The next hop is the IP address of a node on the network that is closer to the advertised prefix. Routers that have traffic destined for the advertised prefix send the traffic to the next hop. The next hop can be the address of the BGP speaker sending the update or of a third-party node. The third-party node does not have to be a BGP speaker.

The next-hop attributes conform to the following rules:

Next Hops

If you use the neighbor remote-as command to configure the BGP neighbors, the next hop is passed according to the rules provided above when networks are advertised. Consider the network configuration shown in Figure 28. Router Jackson advertises 192.168.22.0/23 internally to router Memphis with a next hop of 10.2.2.1. Router Jackson advertises the same network externally to router Topeka with a next hop of 10.1.13.1.

Figure 28: Configuring Next-Hop Processing

Router Memphis advertises 172.24.160.0/19 with a next hop of 10.2.2.2 to router Jackson. Router Jackson advertises this same network externally to router Topeka with a next hop of 10.1.13.1.

Router Topeka advertises network 192.168.32.0/19 with a next hop of 10.1.13.2 to router Jackson. Because this network originates outside AS 604, router Jackson then internally advertises this network (192.168.32.0/19) to router Memphis with the same next hop, 10.1.13.2 (the IP address of the external BGP speaker that advertised the route).

When router Memphis has traffic destined for 192.168.32.0/19, it must be able to reach the next hop by means of an IGP, because it has no direct connection to 10.1.13.2. Otherwise, router Memphis will drop packets destined for 192.168.32.0/19 because the next-hop address is not accessible. Router Memphis does a lookup in its IP routing table to determine how to reach 10.1.13.2:

Destination       Next Hop
10.1.13.0/24      10.2.2.1

The next hop is reachable through router Jackson, and the traffic can be forwarded.

The following commands configure the routers as shown in Figure 28:

To configure router Jackson:

host1(config)#router bgp 604
host1(config-router)#neighbor 10.1.13.2 remote-as 25
host1(config-router)#neighbor 10.2.2.2 remote-as 604
host1(config-router)#network 192.168.22.0 mask 255.255.254.0

To configure router Memphis:

host2(config)#router bgp 604
host2(config-router)#neighbor 10.2.2.1 remote-as 604
host2(config-router)#network 172.24.160.0 mask 255.255.224.0

To configure router Topeka:

host3(config)#router bgp 25
host3(config-router)#neighbor 10.1.13.1 remote-as 604
host3(config-router)#network 172.31.64.0 mask 255.255.192.0

Additional configuration is required for routers Biloxi, Memphis, and Jackson; the details depend on the IGP running in AS 604.

neighbor remote-as

Next-Hop-Self

In some circumstances, using a third-party next hop causes routing problems. These configurations typically involve nonbroadcast multiaccess (NBMA) media. To better understand this situation, first consider a broadcast multiaccess (BMA) media network, as shown in Figure 29.

Figure 29: Next-Hop Behavior for Broadcast Multiaccess Media

Routers Toledo, Madrid, and Barcelona are all on the same Ethernet network, which has a prefix of 10.19.7.0/24. When router Toledo advertises prefix 192.168.22.0/23 to router Madrid, it sets the next-hop attribute to 10.19.7.5. Before router Madrid advertises this prefix to router Barcelona, it sees that its own IP address, 10.19.7.7, is on the same subnet as the next hop for the advertised prefix. If router Barcelona can reach router Madrid, then it should be able to reach router Toledo. Router Madrid therefore advertises 192.168.22.0/23 to router Barcelona with a next-hop attribute of 10.19.7.5.

Now consider Figure 30, which shows the same routers on a Frame Relay—NBMA—network.

Figure 30: Next-Hop Behavior for Nonbroadcast Multiaccess Media

Routers Toledo and Madrid are EBGP peers, as are routers Madrid and Barcelona. When router Toledo advertises prefix 192.168.22.0/23 to router Madrid, router Madrid makes the same comparison as in the BMA example, and leaves the next-hop attribute intact when it advertises the prefix to router Barcelona. However, router Barcelona will not be able to forward traffic to 192.168.22.0/23, because it does not have a direct PVC connection to router Toledo and cannot reach the next hop of 10.19.7.5.

You can use the neighbor next-hop-self command to correct this routing problem. If you use this command to configure router Madrid, the third-party next hop advertised by router Toledo is not advertised to router Barcelona. Instead, router Madrid advertises 192.168.22.0/23 with the next-hop attribute set to its own IP address, 10.19.7.7. Router Barcelona now forwards traffic destined for 192.168.22.0/23 to the next hop, 10.19.7.7. Router Madrid then passes the traffic along to router Toledo.

To disable third-party next-hop processing, configure router Madrid as follows:

host1(config)#router bgp 319
host1(config-router)#neighbor 10.19.7.8 remote-as 211
host1(config-router)#neighbor 10.19.7.8 next-hop-self

neighbor next-hop-self
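The comparison router Madrid makes, and the effect of neighbor next-hop-self, can be sketched in Python. This is a simplified, hypothetical model of the EBGP case described above: the function name is invented, the advertising router's address on the link toward the receiving peer is given in address/prefix-length form, and IBGP next-hop handling is not modeled.

```python
from ipaddress import ip_address, ip_interface


def advertised_next_hop(received_next_hop: str, local_interface: str,
                        next_hop_self: bool) -> str:
    """Next hop placed in an EBGP advertisement (simplified, hypothetical model).

    local_interface is the advertising router's address on the link toward
    the receiving peer, for example "10.19.7.7/24".
    """
    iface = ip_interface(local_interface)
    if next_hop_self:
        # neighbor next-hop-self: always advertise our own link address.
        return str(iface.ip)
    if ip_address(received_next_hop) in iface.network:
        # Third-party next hop: it shares our subnet, so pass it along unchanged.
        return received_next_hop
    # Otherwise substitute our own address.
    return str(iface.ip)


# Router Madrid relaying Toledo's 192.168.22.0/23 to Barcelona on 10.19.7.0/24.
print(advertised_next_hop("10.19.7.5", "10.19.7.7/24", next_hop_self=False))  # 10.19.7.5
print(advertised_next_hop("10.19.7.5", "10.19.7.7/24", next_hop_self=True))   # 10.19.7.7
```

The subnet test alone cannot detect the Frame Relay problem in Figure 30: on the NBMA network, 10.19.7.5 is still in Madrid's subnet, so only the next-hop-self setting produces a next hop that router Barcelona can actually reach.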

Assigning a Weight to a Route

You can assign a weight to a route when more than one route exists to the same destination. A weight indicates a preference for that particular route over the other routes to that destination. The higher the assigned weight, the more preferred the route. By default, the route weight is 32768 for paths originated by the router, and 0 for other paths.

In the configuration shown in Figure 31, routers Boston and NY both learn about network 192.68.5.0/24 from AS 200. Routers Boston and NY both propagate the route to router LA. Router LA now has two routes for reaching 192.68.5.0/24 and must decide which one to use. If you prefer that router LA direct traffic through router Boston, you can configure router LA so that routes coming from router Boston have a higher (more preferred) weight than routes coming from router NY. Router LA subsequently prefers routes received from router Boston and therefore uses router Boston as the next hop to reach network 192.68.5.0/24.

Figure 31: Assigning a Weight to a Neighbor Connection

You can set the weight of routes coming from router Boston in any of the following three ways:

Using the neighbor weight Command

The following commands assign a weight of 1000 to all routes router LA receives from AS 100 and assign a weight of 500 to all routes router LA receives from AS 300:

host1(config)#router bgp 400
host1(config-router)#neighbor 10.5.5.1 remote-as 100
host1(config-router)#neighbor 10.5.5.1 weight 1000
host1(config-router)#neighbor 10.72.4.2 remote-as 300
host1(config-router)#neighbor 10.72.4.2 weight 500

Router LA sends traffic through router Boston in preference to router NY.

Using a Route Map

A route map instance is a set of conditions with an assigned number. The number after the permit keyword designates an instance of a route map. For example, instance 10 of route map 10 begins with the following:

host1(config)#route-map 10 permit 10

In the following commands to configure router LA, route map 10, applied to inbound updates from neighbor 10.5.5.1 in AS 100, assigns a weight of 1000 to those routes. Route map 20, applied to inbound updates from neighbor 10.72.4.2 in AS 300, assigns a weight of 500 to those routes.

host1(config)#router bgp 400
host1(config-router)#neighbor 10.5.5.1 remote-as 100
host1(config-router)#neighbor 10.5.5.1 route-map 10 in
host1(config-router)#neighbor 10.72.4.2 remote-as 300
host1(config-router)#neighbor 10.72.4.2 route-map 20 in
host1(config-router)#exit
host1(config)#route-map 10
host1(config-route-map)#set weight 1000
host1(config-route-map)#route-map 20
host1(config-route-map)#set weight 500

See the JunosE IP Services Configuration Guide for more information about using route maps.

Using an AS-Path Access List

The following commands assign weights to routes filtered by AS-path access lists on router LA:

host1(config)#router bgp 400
host1(config-router)#neighbor 10.5.5.1 remote-as 100
host1(config-router)#neighbor 10.5.5.1 filter-list 1 weight 1000
host1(config-router)#neighbor 10.72.4.2 remote-as 300
host1(config-router)#neighbor 10.72.4.2 filter-list 2 weight 500
host1(config-router)#exit
host1(config)#ip as-path access-list 1 permit ^100_
host1(config)#ip as-path access-list 2 permit ^300_

Access list 1 permits any route whose AS-path attribute begins with 100 (specified by ^). This permits routes that pass through router Boston, whether they originate in AS 100 (AS path = 100) or AS 200 (AS path = 100 200) or AS 300 (AS path = 100 200 300). Access list 2 permits any route whose AS-path attribute begins with 300. This permits routes that pass through router NY, whether they originate in AS 300 (AS path = 300) or AS 200 (AS path = 300 200) or AS 100 (AS path = 300 200 100).

The neighbor filter-list commands assign a weight attribute of 1000 to routes passing through router Boston and a weight attribute of 500 to routes passing through router NY. Regardless of the origin of the route, routes learned through router Boston are preferred.
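The access lists above rely on the `_` character. The following Python sketch approximates that matching so you can see why `^100_` matches routes learned through router Boston but not, for example, an AS path that begins with AS 1000. It assumes the Cisco-style meaning of `_` (a space, comma, brace, or parenthesis, or the start or end of the AS-path string) and represents the AS path as a space-separated string; the function name is hypothetical, and the router's own regular-expression engine may differ in detail.

```python
import re

# Assumed meaning of "_": a delimiter (space, comma, brace, parenthesis)
# or the start or end of the AS-path string.
UNDERSCORE = r"(?:^|$|[ ,{}()])"


def as_path_matches(pattern: str, as_path: str) -> bool:
    """True if a router-style AS-path pattern matches the space-separated AS path."""
    return re.search(pattern.replace("_", UNDERSCORE), as_path) is not None


print(as_path_matches("^100_", "100 200 300"))  # True: learned through router Boston
print(as_path_matches("^100_", "300 200 100"))  # False: learned through router NY
print(as_path_matches("^100_", "1000 200"))     # False: AS 1000 is a different AS
print(as_path_matches("^100", "1000 200"))      # True: without "_", 100 also matches 1000
```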

ip as-path access-list

neighbor filter-list

neighbor weight

See Access Lists for more information about using access lists.

Configuring the Local-Pref Attribute

The local-pref attribute specifies the preferred path among multiple paths to the same destination. The preferred path is the one with the higher preference value. Local preference is used only within an AS, to select an exit point.

To configure the local preference of a BGP path, you can do one of the following:

Using the bgp default local-preference Command

In Figure 32, AS 873 receives updates for network 192.168.5.0/24 from AS 32 and AS 17.

Figure 32: Configuring the Local-Preference Attribute

The following commands configure router LA:

host1(config)#router bgp 873
host1(config-router)#neighbor 10.72.4.2 remote-as 32
host1(config-router)#neighbor 10.2.2.4 remote-as 873
host1(config-router)#bgp default local-preference 125

The following commands configure router SanJose:

host2(config)#router bgp 873
host2(config-router)#neighbor 10.5.5.1 remote-as 17
host2(config-router)#neighbor 10.2.2.3 remote-as 873
host2(config-router)#bgp default local-preference 200

Router LA sets the local preference for all updates from AS 32 to 125. Router SanJose sets the local preference for all updates from AS 17 to 200. Because router LA and router SanJose exchange local preference information within AS 873, they both recognize that routes to network 192.168.5.0/24 in AS 293 have a higher local preference when they come to AS 873 from AS 17 than when they come from AS 32. As a result, both router LA and router SanJose prefer to reach this network through router Boston in AS 17.

bgp default local-preference

Using a Route Map to Set the Local Preference

Using a route map to set the local preference gives you more flexibility: you can select the routes whose local preference you set based on many criteria, including the AS path. In the previous section, all updates received by router SanJose were set to a local preference of 200.

Using a route map, you can instead assign a local preference specifically to routes that originate in AS 293 and reach router SanJose through AS 17.

The following commands configure router SanJose.

host2(config)#router bgp 873
host2(config-router)#neighbor 10.2.2.3 remote-as 873
host2(config-router)#neighbor 10.5.5.1 remote-as 17
host2(config-router)#neighbor 10.5.5.1 route-map 10 in
host2(config-router)#exit
host2(config)#ip as-path access-list 1 permit ^17 293$
host2(config)#route-map 10 permit 10
host2(config-route-map)#match as-path 1
host2(config-route-map)#set local-preference 200
host2(config-route-map)#exit
host2(config)#route-map 10 permit 20

Router SanJose sets the local-pref attribute to 200 for routes that originate in AS 293 and pass last through AS 17. All other routes are accepted (as defined in instance 20 of route map 10), but their local preference remains at the default value of 100, indicating a less-preferred path.
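How route map 10 on router SanJose treats each incoming route can be sketched in Python. This is a simplified, hypothetical model: instances are evaluated in ascending sequence order, the first instance whose match clauses all succeed applies its set clauses, an instance with no match clause matches every route, and a route that matches no instance is assumed to be rejected. The class and field names, and the AS path 17 42 used for the second route, are invented for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

DEFAULT_LOCAL_PREF = 100


@dataclass
class Instance:
    """Hypothetical route-map instance: sequence number, permit/deny, matches, sets."""
    seq: int
    permit: bool
    matches: List[Callable[[dict], bool]] = field(default_factory=list)
    sets: Dict[str, int] = field(default_factory=dict)


def apply_route_map(instances: List[Instance], route: dict) -> Optional[dict]:
    """Return the route with set clauses applied, or None if it is rejected."""
    for inst in sorted(instances, key=lambda i: i.seq):
        if all(match(route) for match in inst.matches):  # no match clauses -> True
            return {**route, **inst.sets} if inst.permit else None
    return None  # assumed implicit deny when no instance matches


# Route map 10 on router SanJose.
route_map_10 = [
    Instance(10, True,
             [lambda r: re.search(r"^17 293$", r["as_path"]) is not None],
             {"local_pref": 200}),
    Instance(20, True),  # no match clause: accept all other routes unchanged
]

from_293 = apply_route_map(route_map_10, {"as_path": "17 293", "local_pref": DEFAULT_LOCAL_PREF})
other = apply_route_map(route_map_10, {"as_path": "17 42", "local_pref": DEFAULT_LOCAL_PREF})
print(from_293["local_pref"], other["local_pref"])  # 200 100
```

Without instance 20, the second route would fall through every instance and, under the implicit-deny assumption, be rejected rather than accepted at the default local preference.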

Understanding the Origin Attribute

BGP uses the origin attribute to describe how a route was learned at the origin, that is, the point where the route was injected into BGP. The origin of the route can be one of three values: IGP, EGP, or Incomplete.

Consider the sample topology shown in Figure 33. Because routers Albany and Boston are not directly connected, they learn the path to each other by means of an IGP (not illustrated).

The following commands configure router Boston:

host1(config)#ip route 172.31.125.100 255.255.255.252
host1(config)#router bgp 100
host1(config-router)#neighbor 10.2.25.1 remote-as 100
host1(config-router)#neighbor 10.4.4.1 remote-as 100
host1(config-router)#neighbor 10.3.3.1 remote-as 300
host1(config-router)#network 172.19.0.0
host1(config-router)#redistribute static

The following commands configure router NY:

host2(config)#router bgp 100
host2(config-router)#neighbor 10.4.4.1 remote-as 100
host2(config-router)#neighbor 10.2.25.2 remote-as 100
host2(config-router)#network 172.28.8.0 mask 255.255.248.0

The following commands configure router Albany:

host3(config)#router bgp 100
host3(config-router)#neighbor 10.4.4.2 remote-as 100
host3(config-router)#neighbor 10.2.25.2 remote-as 100
host3(config-router)#network 192.168.33.0 mask 255.255.255.0

The following commands configure router LA:

host4(config)#router bgp 300
host4(config-router)#neighbor 10.3.3.2 remote-as 100
host4(config-router)#network 192.168.204.0 mask 255.255.252.0
host4(config-router)#redistribute isis

Consider how route 172.21.10.0/23 is passed along to the routers in Figure 33:

  1. IS-IS injects route 172.21.10.0/23 from router Chicago into BGP on router LA. BGP sets the origin attribute to Incomplete (because it is a redistributed route) to indicate how BGP originally became aware of the route.
  2. Router Boston learns about route 172.21.10.0/23 by means of EBGP from router LA.
  3. Router NY learns about route 172.21.10.0/23 by means of IBGP from router Boston.

The value of the origin attribute for a given route remains the same, regardless of where you examine it. Table 20 shows this for all the routes known to routers Albany, Boston, NY, and LA.

Table 20: Origin and AS Path for Routes Viewed on Different Routers

Route              Router   Origin       AS Path
192.168.204.0/22   Albany   IGP          300
192.168.204.0/22   Boston   IGP          300
192.168.204.0/22   NY       IGP          300
192.168.204.0/22   LA       IGP          empty
172.21.10.0/23     Albany   Incomplete   300
172.21.10.0/23     Boston   Incomplete   300
172.21.10.0/23     NY       Incomplete   300
172.21.10.0/23     LA       Incomplete   empty
172.28.8.0/21      Albany   IGP          empty
172.28.8.0/21      Boston   IGP          empty
172.28.8.0/21      NY       IGP          empty
172.28.8.0/21      LA       IGP          100
172.31.125.100/30  Albany   Incomplete   empty
172.31.125.100/30  Boston   Incomplete   empty
172.31.125.100/30  NY       Incomplete   empty
172.31.125.100/30  LA       Incomplete   100
172.19.0.0/16      Albany   IGP          empty
172.19.0.0/16      Boston   IGP          empty
172.19.0.0/16      NY       IGP          empty
172.19.0.0/16      LA       IGP          100
192.168.33.0/24    Albany   IGP          empty
192.168.33.0/24    Boston   IGP          empty
192.168.33.0/24    NY       IGP          empty
192.168.33.0/24    LA       IGP          100
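The pattern in Table 20 follows from two rules, which the following Python sketch models under simplifying assumptions: the origin is set once where the route enters BGP and is never changed in transit, and a speaker prepends its own AS number only when it advertises the route to a peer in a different AS (EBGP), not to an IBGP peer. The `advertise` function and the dictionary layout are hypothetical; the router-to-AS mapping comes from the Figure 33 configurations.

```python
from typing import Dict, List

# Router -> AS number, taken from the Figure 33 configurations.
ROUTER_AS: Dict[str, int] = {"Albany": 100, "Boston": 100, "NY": 100, "LA": 300}


def advertise(route: dict, sender: str, receiver: str) -> dict:
    """Return the route as the receiver sees it; the origin is never modified."""
    as_path: List[int] = list(route["as_path"])
    if ROUTER_AS[sender] != ROUTER_AS[receiver]:
        as_path.insert(0, ROUTER_AS[sender])  # EBGP: prepend the sender's AS
    return {"origin": route["origin"], "as_path": as_path}


# 172.21.10.0/23 is redistributed from IS-IS on router LA, so its origin is Incomplete.
at_la = {"origin": "Incomplete", "as_path": []}
at_boston = advertise(at_la, "LA", "Boston")   # EBGP hop
at_ny = advertise(at_boston, "Boston", "NY")   # IBGP hop
print(at_ny)  # {'origin': 'Incomplete', 'as_path': [300]}
```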

As a matter of routing policy, you can specify an origin for a route with a set origin clause in a redistribution route map. Changing the origin enables you to influence which of several routes for the same destination prefix is selected as the best route. In practice, changing the origin is rarely done.

Understanding the AS-Path Attribute

The AS-path attribute is a list of the ASs through which a route has passed. Whenever a BGP speaker advertises a route to a peer in another AS, it prepends its own AS number to the AS-path attribute. This enables network operators to track routes, and it also enables the detection and prevention of routing loops.

Consider the following sequence of events for the routers shown in Figure 34:

  1. Route 172.21.10.0/23 is injected into BGP by means of router London in AS 47.
  2. Suppose router London advertises that route to router Paris in AS 621. As received by router Paris, the AS-path attribute for route 172.21.10.0/23 is 47.
  3. Router Paris advertises the route to router Berlin in AS 11. As received by router Berlin, the AS-path attribute for route 172.21.10.0/23 is 621 47.
  4. Router Berlin advertises the route to router London in AS 47. As received by router London, the AS-path attribute for route 172.21.10.0/23 is 11 621 47.

    Figure 34: AS-Path Attributes

A routing loop exists if router London accepts the route from router Berlin. Router London can choose not to accept the route from router Berlin because it recognizes from the AS-path attribute (11 621 47) that the route originated in its own AS 47.
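The loop check that router London applies can be sketched in Python. The sketch assumes that each router prepends its own AS when advertising to a peer in another AS and that a receiver rejects any route whose AS path already contains the receiver's own AS; the function names are hypothetical.

```python
from typing import List, Optional


def send(as_path: List[int], sender_as: int) -> List[int]:
    """AS path as transmitted to an EBGP peer: the sender prepends its own AS."""
    return [sender_as] + as_path


def accept(as_path: List[int], receiver_as: int) -> Optional[List[int]]:
    """Return the path if it is loop-free for the receiver, otherwise None."""
    return None if receiver_as in as_path else as_path


# Figure 34: London (AS 47) -> Paris (AS 621) -> Berlin (AS 11) -> London (AS 47).
at_paris = accept(send([], 47), 621)
at_berlin = accept(send(at_paris, 621), 11)
back_at_london = accept(send(at_berlin, 11), 47)
print(at_paris, at_berlin, back_at_london)  # [47] [621, 47] None
```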

As a matter of routing policy, you can prepend additional AS numbers to the AS-path attribute for a route with a set as-path prepend clause in an outbound route map. Changing the AS path enables you to influence which of several routes for the same destination prefix is selected as the best route.
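Why prepending influences selection can be shown with a short Python sketch: step 5 of the decision algorithm prefers the shorter AS path, so extra copies of an AS number make a path less attractive, provided the earlier steps (weight, local preference, local origination) tie. The helper name, the AS number 65001, and the two-link scenario are hypothetical, and the sketch assumes the prepended copies are added on top of the single copy every EBGP advertisement already carries.

```python
from typing import List


def prepend(as_path: List[int], local_as: int, count: int) -> List[int]:
    """Model a set as-path prepend clause: add `count` extra copies of local_as."""
    return [local_as] * count + as_path


# Hypothetical AS 65001 advertises one prefix on two links and prepends twice on
# the backup link; [65001] is the path after the normal EBGP prepend.
primary = [65001]
backup = prepend([65001], 65001, 2)
print(backup)                                      # [65001, 65001, 65001]
print(min([backup, primary], key=len) is primary)  # True: the shorter AS path wins
```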

Configuring a Local AS

You can change the local AS of a BGP peer or peer group within the current address family with the neighbor local-as command. By using different local AS numbers for different peers, you can avoid or postpone AS renumbering when ASs are merged.

neighbor local-as

The following example commands change the local AS number for peer 10.4.4.2 from the global local AS of 100 to 32:

host1(config)#router bgp 100
host1(config-router)#address-family ipv4 unicast vrf boston
host1(config-router)#neighbor 10.4.4.2 remote-as 645
host1(config-router)#neighbor 10.4.4.2 local-as 32

Configuring the MED Attribute

If two ASs connect to each other in more than one place, one link or path might be a better choice to reach a particular prefix within or behind one of the ASs. The multi-exit discriminator (MED) value is a metric expressing a degree of preference for a particular path. Lower MED values are preferred.

Whereas the Local Preference attribute is used only within an AS (to select an exit point), the MED attribute is exchanged between ASs. A router in one AS sends the MED to inform a router in another AS which path the second router should use to reach particular destinations. If you are the administrator of the second AS, you must therefore trust that the router in the first AS is providing information that is truly beneficial to your AS.

You configure the MED on the sending router by using the set metric command in an outbound route map. Unless configured otherwise, a receiving router compares MED attributes only for paths from external neighbors that are members of the same AS. If you want MED attributes from neighbors in different ASs to be compared, you must issue the bgp always-compare-med command.

In Figure 35, router London in AS 303 can reach 192.168.33.0/24 in AS 73 through router Paris or through router Nice to router Paris.

Figure 35: Configuring the MED

The following commands configure router London:

host1(config)#router bgp 303
host1(config-router)#neighbor 10.4.4.2 remote-as 73
host1(config-router)#neighbor 10.3.3.2 remote-as 73
host1(config-router)#neighbor 10.5.5.2 remote-as 4
host1(config-router)#network 122.28.8.0 mask 255.255.248.0

The following commands configure router Paris:

host2(config)#router bgp 73
host2(config-router)#neighbor 10.4.4.1 remote-as 303
host2(config-router)#neighbor 10.4.4.1 route-map 10 out
host2(config-router)#neighbor 10.2.25.1 remote-as 73
host2(config-router)#neighbor 10.6.6.1 remote-as 4
host2(config-router)#neighbor 10.6.6.1 route-map 10 out
host2(config-router)#network 192.168.33.0 mask 255.255.255.0
host2(config-router)#exit
host2(config)#route-map 10 permit 10
host2(config-route-map)#set metric 50

The following commands configure router Nice:

host3(config)#router bgp 73
host3(config-router)#neighbor 10.3.3.1 remote-as 303
host3(config-router)#neighbor 10.3.3.1 route-map 10 out
host3(config-router)#neighbor 10.2.25.2 remote-as 73
host3(config-router)#network 172.19.0.0
host3(config-router)#exit
host3(config)#route-map 10 permit 10
host3(config-route-map)#set metric 100

The following commands configure router Dublin:

host4(config)#router bgp 4
host4(config-router)#neighbor 10.5.5.1 remote-as 303
host4(config-router)#neighbor 10.5.5.1 route-map 10 out
host4(config-router)#neighbor 10.6.6.2 remote-as 73
host4(config-router)#network 172.14.27.0 mask 255.255.255.0
host4(config-router)#exit
host4(config)#route-map 10 permit 10
host4(config-route-map)#set metric 25

Router London receives updates regarding route 192.168.33.0/24 from both router Nice and router Paris. Router London compares the MED values received from the two routers: Router Nice advertises a MED of 100 for the route, whereas router Paris advertises a MED of 50. On this basis, router London prefers the path through router Paris.

Because BGP by default compares only MED attributes of routes coming from the same AS, router London can compare only the MED attributes for route 192.168.33.0/24 that it received from routers Paris and Nice. It cannot compare the MED received from router Dublin, because router Dublin is in a different AS than routers Paris and Nice.

However, you can use the bgp always-compare-med command to configure router London to take into account the MED attribute from router Dublin as follows:

host1(config)#router bgp 303
host1(config-router)#neighbor 10.4.4.2 remote-as 73
host1(config-router)#neighbor 10.3.3.2 remote-as 73
host1(config-router)#neighbor 10.5.5.2 remote-as 4
host1(config-router)#network 122.28.8.0 mask 255.255.248.0
host1(config-router)#bgp always-compare-med

Router Dublin advertises a MED of 25 for route 192.168.33.0/24, which is lower—more preferred—than the MED advertised by router Paris or router Nice. However, the AS path for the route through router Dublin is longer than that through router Paris. The AS path is the same length for router Paris and router Nice, but the MED advertised by router Paris is lower than that advertised by router Nice. Consequently, router London prefers the path through router Paris.

Suppose, however, that router Dublin were not configured to set the MED for route 192.168.33.0/24 in its outbound route map 10. Would router London receive the MED of 50 passed along by router Paris through router Dublin? No, because the MED attribute is nontransitive. Router Dublin does not transmit any MED that it receives. A MED is only of value to a direct peer.

bgp always-compare-med

set metric
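The default scoping of MED comparison, and the effect of bgp always-compare-med, can be sketched in Python for router London's three candidate paths. This simplified, hypothetical model looks only at the MED step: each candidate is tagged with the AS of the neighbor that advertised it, and by default MEDs from different neighbor ASs are treated as not comparable, so the decision falls through to the next step. The class and function names are invented for illustration.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    neighbor_as: int  # AS of the advertising neighbor (first AS in the AS path)
    med: int


def med_decision(a: Candidate, b: Candidate,
                 always_compare_med: bool = False) -> Optional[Candidate]:
    """Candidate preferred by the MED step, or None if MED cannot decide."""
    if not always_compare_med and a.neighbor_as != b.neighbor_as:
        return None  # different neighbor ASs: MEDs are not compared by default
    if a.med == b.med:
        return None
    return a if a.med < b.med else b


paris = Candidate("Paris", 73, 50)
nice = Candidate("Nice", 73, 100)
dublin = Candidate("Dublin", 4, 25)

print(med_decision(paris, nice).name)                             # Paris
print(med_decision(paris, dublin))                                # None
print(med_decision(paris, dublin, always_compare_med=True).name)  # Dublin
```

In the Figure 35 scenario the last comparison is never reached in practice, because step 5 of the decision algorithm already prefers the shorter AS path through router Paris over the path through router Dublin.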

Missing MED Values

By default, a route that arrives with no MED value is treated as if it had a MED of 0, the most preferred value. You can use the bgp bestpath missing-as-worst command to specify that a route with any MED value is always preferred to a route that is missing the MED value.

bgp bestpath missing-as-worst
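The two treatments of a missing MED can be sketched in Python. The sketch assumes a missing MED is represented as `None`; the function name is hypothetical, and `math.inf` stands in for "worse than any real MED value", including the largest 32-bit MED.

```python
import math
from typing import Optional


def effective_med(med: Optional[int], missing_as_worst: bool = False) -> float:
    """MED value used for comparison; None means the route arrived without a MED."""
    if med is not None:
        return med
    # Default: treat as 0 (most preferred). With missing-as-worst: worse than any MED.
    return math.inf if missing_as_worst else 0


print(effective_med(None) < effective_med(50))                               # True
print(effective_med(None, missing_as_worst=True) > effective_med(2**32 - 1))  # True
```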

Comparing MED Values Within a Confederation

A BGP speaker within a confederation of sub-ASs might need to compare routes to determine the best path to a destination. By default, BGP does not use the MED value when comparing routes originated in different sub-ASs within the confederation to which the BGP speaker belongs. (Within the confederation, routes learned from different sub-ASs are treated as having originated in different places.) You can use the bgp bestpath med confed command to force MED values to be taken into account within a confederation.

bgp bestpath med confed

Suppose a BGP speaker has three routes to prefix 10.10.0.0/16:

BGP compares these routes to each other to determine the best path to the prefix. If you have issued the bgp bestpath med confed command, BGP takes into account the MED when comparing Route 1 with Route 2. However, BGP does not take into account the MED when comparing Route 3 with either Route 1 or Route 2, because Route 3 originates outside the confederation.

Capability Negotiation

The router accepts connections from peers that perform capability negotiation. Capabilities are negotiated by means of the open messages that are exchanged when the session is established. The router supports the following capabilities:

The router advertises these capabilities—except for the cooperative route filtering capability—by default. You can prevent the advertisement of specific capabilities with the no neighbor capability command. You can also use this command to prevent all capability negotiation with the specified peer.

Note: The graceful restart capability is controlled with the bgp graceful-restart and neighbor graceful-restart commands rather than the neighbor capability command. However, the no neighbor capability command will prevent negotiation of the graceful restart capability.

Cooperative Route Filtering

The cooperative route filtering capability—also referred to as outbound route filtering (ORF)—enables a BGP speaker to send an inbound route filter to a peer and have the peer install it as an outbound filter on the remote end of the session.

You must specify both the type of inbound filter (ORF type) and the direction of the ORF capability. The router currently supports prefix lists as the inbound filter sent by the BGP speaker; the filter can be either a standard prefix list or a Cisco-proprietary prefix list. The BGP speaker must indicate whether it will send inbound filters to peers, accept inbound filters from peers, or both. The router supports both standard and Cisco-proprietary ORF messages.

Dynamic Capability Negotiation

If both peers acknowledge support of dynamic capability negotiation, then at any subsequent point after the session is established, either peer can send a capabilities message to the other indicating a desire to negotiate another capability or to remove a previously negotiated capability.

The data field of the capability message contains a list of all the capabilities that can be dynamically negotiated. In earlier, now deprecated, versions, the data field did not carry this information. Use the dynamic-capability-negotiation keyword to include the list. Use the deprecated-dynamic-capability-negotiation keyword to omit the list.

Only nondynamic capability negotiation is supported for the cooperative route filtering, four-octet AS numbers, deprecated dynamic capability negotiation, and dynamic capability negotiation capabilities; these capabilities cannot be negotiated dynamically.

If both sides of the connection advertise support for the new dynamic capability negotiation capability, then the peers negotiate which capabilities are dynamic and which are not.

If both sides of the connection advertise support only for the deprecated dynamic capability negotiation, then the BGP speaker uses dynamic capability negotiation for all capabilities that allow it without attempting to negotiate this with the peer.

Four-Octet AS Numbers

BGP speakers that support four-octet AS and sub-AS numbers are sometimes referred to as “new” speakers. The four-octet AS numbers are employed by the AS-path and aggregator attributes. “Old” speakers are those that do not support the four-octet numbers.

Two new optional transitive attributes, new-as-path and new-aggregator, are used to carry the four-octet numbers across the old speakers. A new speaker communicating with an old speaker sends the new attributes with the four-octet numbers for locally originated and propagated routes. The old speaker propagates the new attributes for received routes. The new speaker also sends the AS-path and aggregator attributes with two-octet numbers; any AS number greater than 65535 is replaced with a reserved AS number, 23456.
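The substitution a new speaker applies to the two-octet AS-path attribute it sends to an old speaker can be sketched in Python. The sketch covers only the replacement rule stated above; the function and constant names are hypothetical, AS numbers are plain integers, and the four-octet AS numbers in the usage line are arbitrary examples.

```python
from typing import List

AS_TRANS = 23456  # reserved two-octet AS number used as a placeholder


def two_octet_as_path(as_path: List[int]) -> List[int]:
    """Replace every AS number that does not fit in two octets with AS_TRANS."""
    return [asn if asn <= 65535 else AS_TRANS for asn in as_path]


four_octet_path = [100, 131072, 4200000000]
print(two_octet_as_path(four_octet_path))  # [100, 23456, 23456]
```

The real four-octet values travel alongside in the new-as-path attribute, which old speakers pass along unchanged, so a new speaker further downstream can still recover them.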

Graceful Restarts

When BGP restarts on a router, all of the router’s BGP peers detect that the BGP session transitioned from up to down. The transition causes a routing flap throughout the network as the peers recalculate their best routes in light of the loss of routes from that peering session.

The BGP graceful restart capability reduces the network disruption that normally results from a peer session going down. If the session is with a peer that had previously advertised the graceful restart capability, the receiving BGP speaker marks all routes from that peer in the BGP routing table as stale. BGP keeps these stale routes for a limited time and continues to use these routes to forward traffic. Any existing stale routes from that peer are deleted to account for consecutive restarts.

When the restarting peer reestablishes the session, the receiving BGP speaker replaces the stale routes with the fresh routes it receives from the peer. The restarted peer sends an End-of-RIB marker to signal when it has finished sending all its routes to the BGP speaker. Until this point, BGP has still been using the stale routes to forward traffic. Upon receipt of the End-of-RIB marker, the BGP speaker flushes any remaining stale routes from the restarted peer.

The End-of-RIB marker is an update message that contains no advertised or withdrawn prefixes; it is sent only to BGP speakers that have previously advertised the graceful restart capability.
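The receiving speaker's handling of stale routes can be modeled in Python. This is a simplified sketch for a single peer: the class and method names are hypothetical, routes are keyed by prefix, and the restart timer is represented by an explicit method call rather than a real clock. A route is flushed when an End-of-RIB marker arrives without it having been refreshed, or when the peer fails to return before its advertised restart period ends.

```python
from typing import Dict, Set


class StaleRouteTable:
    """Hypothetical view of the routes one graceful-restart-capable peer advertised."""

    def __init__(self) -> None:
        self.routes: Dict[str, str] = {}  # prefix -> next hop
        self.stale: Set[str] = set()

    def _flush_stale(self) -> None:
        for prefix in self.stale:
            del self.routes[prefix]
        self.stale = set()

    def session_down(self) -> None:
        self._flush_stale()            # consecutive restart: drop older stale routes
        self.stale = set(self.routes)  # keep forwarding on the rest, marked stale

    def update(self, prefix: str, next_hop: str) -> None:
        self.routes[prefix] = next_hop  # a fresh route replaces the stale copy
        self.stale.discard(prefix)

    def end_of_rib(self) -> None:
        self._flush_stale()            # anything not re-advertised is flushed

    def restart_timer_expired(self) -> None:
        self._flush_stale()            # peer did not come back in time


table = StaleRouteTable()
table.update("192.168.22.0/23", "10.19.7.5")
table.update("10.10.0.0/16", "10.19.7.5")
table.session_down()                   # both routes kept, marked stale
table.update("192.168.22.0/23", "10.19.7.5")
table.end_of_rib()                     # 10.10.0.0/16 was not resent
print(sorted(table.routes))            # ['192.168.22.0/23']
```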

The receiving speaker also sends its own routes to the restarted speaker, and sends an End-of-RIB marker when it completes the update. The restarted peer defers reinitiating the BGP best-path selection process until it has received this marker from all peers with which it had a session in the established state and from which it had received an End-of-RIB marker before it restarted.

After running the selection process on the fresh routes to pick the best route to each prefix, BGP installs those routes in the IP routing table on the restarted peer. Routes that are the best overall routes to a prefix are then pushed by the router to the forwarding tables on the line modules.

By waiting for all of its peers to send the End-of-RIB marker, the restarted BGP speaker risks delaying the best-path decision process indefinitely because of a single very slow peer. For a specific peer, you can avoid this delay by hard clearing the peer or by issuing the clear ip bgp wait-end-of-rib command. Either method removes that peer from the set of peers from which BGP is awaiting an End-of-RIB marker. Alternatively, you can limit this effect by using the bgp graceful-restart path-selection-defer-time-limit command to specify the maximum period that the restarted peer waits for the marker from its peers.
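The deferral logic on the restarted speaker can be pictured as a set of pending peers plus a deadline. The Python sketch below is a hypothetical model: `PathSelectionDeferral`, its methods, and the idea of passing explicit `now` timestamps are illustrative assumptions, and the `defer_time_limit` value merely stands in for whatever bgp graceful-restart path-selection-defer-time-limit is configured to.

```python
import time
from typing import Iterable, Optional, Set


class PathSelectionDeferral:
    """Hypothetical model of a restarted speaker deferring best-path selection."""

    def __init__(self, peers: Iterable[str], defer_time_limit: float,
                 now: Optional[float] = None) -> None:
        # Peers that were Established and had sent End-of-RIB before the restart.
        self.pending: Set[str] = set(peers)
        start = time.monotonic() if now is None else now
        self.deadline = start + defer_time_limit

    def end_of_rib_received(self, peer: str) -> None:
        # Normal case: the peer finished sending its routes.
        self.pending.discard(peer)

    def stop_waiting_for(self, peer: str) -> None:
        # Models the effect of hard clearing the peer or clear ip bgp wait-end-of-rib.
        self.pending.discard(peer)

    def may_run_selection(self, now: Optional[float] = None) -> bool:
        # Run selection once every marker has arrived or the time limit has passed.
        current = time.monotonic() if now is None else now
        return not self.pending or current >= self.deadline
```

For example, with peers A, B, and C and a limit of 120 seconds, selection stays deferred after A and B send End-of-RIB, and becomes possible as soon as C is cleared or the 120 seconds elapse, whichever comes first.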

Note that the receiving peer does not defer its best-path selection process while waiting for a restarted peer to reestablish a session. The receiving peer continues to use the stale routes from the restarted peer in the decision process. When it flushes stale routes, the receiving peer then uses the freshly updated routes.

A restarting peer must bring the session back up and refresh its routes within a limited period, or BGP on the receiving peer flushes all the stale routes. When a BGP speaker advertises the graceful restart capability, it also advertises how long it expects to take to reestablish a session if it restarts. If the session is not reestablished within this restart period, the speaker’s peers flush the stale routes from the speaker. You can use the bgp graceful-restart restart-time command to modify the restart period advertised to all peers; the neighbor graceful-restart restart-time command modifies the restart period advertised to specific peers or peer groups. A receiving peer starts the restart timer as soon as it recognizes that the session with the restarting peer has transitioned to down.

The receiving peer also has a configurable timer that starts when it recognizes that the session with the restarting peer has gone down. The bgp graceful-restart stalepaths-time command determines how long a receiving peer is willing to use stale paths from any restarted peer; the neighbor graceful-restart stalepaths-time command does the same for a specified restarted peer or peer group. If the receiving peer does not receive an End-of-RIB marker from the restarted peer before the stalepaths timer expires, the receiving peer flushes all stale routes from the peer.
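Taken together, the two timers on the receiving peer can be summarized as a single flush decision. The following Python sketch is a hypothetical model, not router code: the `HelperTimers` class and its fields are illustrative, times are plain seconds measured from an arbitrary origin, and the values used in the usage note (120 and 360 seconds) are arbitrary examples rather than defaults.

```python
from dataclasses import dataclass
from typing import Optional


def _missed(event_at: Optional[float], deadline: float, now: float) -> bool:
    """True if the deadline has passed and the event did not happen before it."""
    return now >= deadline and (event_at is None or event_at >= deadline)


@dataclass
class HelperTimers:
    """Hypothetical receiving-peer view of one restarting peer (times in seconds)."""
    down_at: float          # when the session was recognized as down; both timers start here
    restart_time: float     # restart period advertised by the restarting peer
    stalepaths_time: float  # local limit on using stale paths from this peer
    reestablished_at: Optional[float] = None
    end_of_rib_at: Optional[float] = None

    def should_flush_stale(self, now: float) -> bool:
        # Session not reestablished within the advertised restart period.
        if _missed(self.reestablished_at, self.down_at + self.restart_time, now):
            return True
        # End-of-RIB not received before the stalepaths timer expires.
        return _missed(self.end_of_rib_at, self.down_at + self.stalepaths_time, now)
```

With a restart time of 120 seconds and a stalepaths time of 360 seconds, a peer that never returns is flushed at 120 seconds; a peer that returns at 30 seconds but never sends End-of-RIB is flushed at 360 seconds.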

In this release, BGP supports the graceful restart capability to inform peers that the forwarding state for IPv6 address families, namely the unicast, multicast, VPN unicast, and unicast labeled subsequent address family identifiers (SAFIs), can be preserved during a stateful SRP switchover. MPLS also provides high availability support for IPv6 by preserving the MPLS state for IPv6 interfaces during a stateful SRP switchover. This MPLS capability enables BGP to support graceful restart for IPv6 labeled address families. During a restart, BGP acts as a restarting speaker for both the unlabeled and labeled IPv6 address families. The function of BGP as a graceful restart helper for both IPv4 and IPv6 address families has been available in lower-numbered releases, and there is no change to this functionality in this release.


bgp graceful-restart

bgp graceful-restart path-selection-defer-time-limit

bgp graceful-restart restart-time

bgp graceful-restart stalepaths-time

clear ip bgp wait-end-of-rib

neighbor graceful-restart

neighbor graceful-restart restart-time

neighbor graceful-restart stalepaths-time

Configuring Hold Timers for Successful Graceful Restart in Scaled Scenarios

In a scaled environment, we recommend that you increase the hold timers for the following protocols to values appropriate to the complexity of the network and its scaling settings, so that graceful restart can complete successfully.

Consider a scenario in which a provider edge router, PE1, at one side of the service provider core is connected to a provider core router, P, which is a label-switched router (LSR) that carries traffic for the VPN tunnel. The core router, P, is connected to another provider edge router, PE2, which provides egress from the VPN. Both PE1 and PE2 routers communicate with customer sites through a direct connection to a customer edge (CE) device that sits at the edge of the customer site.

PE1 is configured for graceful restart and PE2 functions as the helper node, and each PE router is configured with 1500 VRFs and 1500 adjacencies. In such an environment, you need to perform the following steps:

  1. On PE1, which is the restarting router, use the hello hold-time command in LDP Profile Configuration mode to set to 90 seconds the period for which an LSR maintains link hello records without receiving another link hello.
    host1(config)#mpls ldp interface profile ldp1
    host1(config-ldp)#hello hold-time 90
  2. On the interface that connects PE1 to the core router, P, use the isis hello-interval command in Interface Configuration mode to set to 30 seconds the frequency at which the router sends hello packets on that interface.
    host1(config-if)#isis hello-interval 30
  3. On PE2, which is the helper router, use the bgp graceful-restart stalepaths-time command in Router Configuration mode to set to 3600 seconds the maximum time that BGP waits to receive an End-of-RIB marker from any restarted peer before flushing all remaining stale routes from that peer.
    host1(config-router)#bgp graceful-restart stalepaths-time 3600

The need to increase these timers can arise even in environments that are not scaled to the maximum limits and that contain minimal subscriber connections or attribute definitions.

We recommend that you perform IS-IS graceful restart only with point-to-point adjacencies because of certain limitations that exist with graceful restart support for LAN interfaces. IS-IS graceful restart (nonstop forwarding) does not work on the broadcast interface when the restarting router is the designated intermediate system (DIS). Graceful restart works properly when the restarting router is not the DIS.

Route Refresh

If the router detects that a peer supports both Cisco-proprietary and standard route refresh messages, it uses the standard route refresh messages.
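As a rough illustration of this preference, the following Python sketch picks a route refresh flavor from the capability codes a peer advertised in its OPEN message. The codes are the IANA-registered values (2 for standard route refresh as defined in RFC 2918, and 128 for the Cisco pre-standard variant); the function name and its return strings are hypothetical and do not describe how the router implements the check.

```python
from typing import Optional, Set

CAP_ROUTE_REFRESH = 2          # standard route refresh capability (RFC 2918)
CAP_ROUTE_REFRESH_CISCO = 128  # Cisco-proprietary (pre-standard) route refresh


def route_refresh_flavor(peer_caps: Set[int]) -> Optional[str]:
    """Choose which route refresh messages to send to a peer (hypothetical helper)."""
    if CAP_ROUTE_REFRESH in peer_caps:
        # Standard wins whenever it is available, even if the Cisco form is too.
        return "standard"
    if CAP_ROUTE_REFRESH_CISCO in peer_caps:
        return "cisco"
    return None  # peer does not support route refresh
```

A peer advertising both codes, `{2, 128}`, gets standard route refresh messages; a peer advertising only 128 falls back to the Cisco-proprietary form.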

neighbor capability