Vinit's Tech Blog

BGP Persistent Route Oscillation - Type 1 Churn with Route Reflectors

Routing loops are common in both L2 and L3 domains. A loop can occur when the routing algorithm misbehaves, often as the result of a particular configuration or design. Some routing loops are transient and appear at random intervals, triggered by events; others are persistent and remain until a configuration change corrects the behaviour. In this blog post, we are going to discuss a persistent route oscillation scenario in a BGP environment with Route-Reflectors where the Multi-Exit Discriminator (MED) is involved.

In BGP, the best path selection algorithm evaluates BGP attributes in a particular order, based on the decision process in RFC 4271 (Weight is a Cisco-specific addition at the top of the list). MED comes after Weight, Local-Preference, locally originated prefixes, AS_Path length and Origin type, in that order. MED gives an AS a dynamic way to influence how a neighboring AS reaches a certain route when there are multiple entry points into that AS. MEDs work correctly in many scenarios but can cause problems in complex topologies. Some of these corner cases are discussed in RFC 3345 (BGP Persistent Route Oscillation Condition), which describes Type 1 and Type 2 churn. In this blog post, we are going to focus on Type 1 churn in a Route-Reflector environment.
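The ordering above can be sketched as a pairwise comparison. This is a simplified illustration and not Cisco's implementation: the `Path` fields and the `prefer` helper are hypothetical names, the walk stops at MED, and MED is only compared when both paths come from the same neighboring AS (the behaviour without bgp always-compare-med).

```python
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}  # lower is preferred


@dataclass(frozen=True)
class Path:
    name: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple = ()
    origin: str = "igp"
    med: int = 0


def prefer(a, b):
    """Return the preferred path, or None if the steps up to MED cannot decide."""
    # Each step: (value for a, value for b, True if the higher value wins).
    steps = [
        (a.weight, b.weight, True),
        (a.local_pref, b.local_pref, True),
        (a.locally_originated, b.locally_originated, True),
        (len(a.as_path), len(b.as_path), False),
        (ORIGIN_RANK[a.origin], ORIGIN_RANK[b.origin], False),
    ]
    # MED is only considered when the first AS in both AS_PATHs is the same.
    if a.as_path and b.as_path and a.as_path[0] == b.as_path[0]:
        steps.append((a.med, b.med, False))
    for value_a, value_b, higher_wins in steps:
        if value_a != value_b:
            a_wins = value_a > value_b if higher_wins else value_a < value_b
            return a if a_wins else b
    return None


# Hypothetical paths: two through AS300 with different MEDs, one through AS200.
p1 = Path("via-AS300-a", as_path=(300, 1000), med=1)
p2 = Path("via-AS300-b", as_path=(300, 1000), med=0)
p3 = Path("via-AS200", as_path=(200, 1000), med=10)

print(prefer(p1, p2).name)  # same neighbor AS, so the lower MED decides
print(prefer(p1, p3))       # different neighbor AS, MED skipped, still tied
```

When `prefer` returns None, the real algorithm continues with later tie-breakers such as eBGP over iBGP and the IGP metric to the next hop, which is where the rest of this post picks up.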

In this topology, AS100 is divided into two RR clusters, where R1 is the RR with cluster-id 0.0.0.1 and R4 is the RR with cluster-id 0.0.0.2. Router R10 (AS1000) is advertising the prefix 100.100.100.100/32 to both AS200 and AS300, which then advertise the prefix into AS100. R7 (AS200) advertises the prefix to R2 with BGP metric 10 (MED 10). R6 (AS300) advertises the prefix to R3 and R5 with metric values of 1 and 0 respectively.

In order to simulate the BGP persistent route oscillation (Type 1), a few tweaks are required in the design. From RR router R1, the IGP cost to R2 is 5 and the IGP cost to R3 is 4. From RR router R4, the IGP cost to R5 is 12. The IGP cost between R1 and R4 is 1. Let's now have a look at the configuration:

R1
interface Loopback0
ip address 192.168.1.1 255.255.255.255
ip ospf 100 area 0
!
interface Ethernet0/0
ip address 10.1.2.1 255.255.255.0
ip ospf network point-to-point
ip ospf 100 area 0
ip ospf cost 5
!
interface Ethernet0/1
ip address 10.1.3.1 255.255.255.0
ip ospf network point-to-point
ip ospf 100 area 0
ip ospf cost 4
!
interface Ethernet0/2
ip address 10.1.4.1 255.255.255.0
ip ospf network point-to-point
ip ospf 100 area 0
ip ospf cost 1
!
router ospf 100
router-id 192.168.1.1
!
router bgp 100
bgp router-id 192.168.1.1
bgp cluster-id 0.0.0.1
bgp log-neighbor-changes
bgp deterministic-med
no bgp default ipv4-unicast
neighbor 192.168.2.2 remote-as 100
neighbor 192.168.2.2 update-source Loopback0
neighbor 192.168.3.3 remote-as 100
neighbor 192.168.3.3 update-source Loopback0
neighbor 192.168.4.4 remote-as 100
neighbor 192.168.4.4 update-source Loopback0
!
address-family ipv4
neighbor 192.168.2.2 activate
neighbor 192.168.2.2 route-reflector-client
neighbor 192.168.2.2 soft-reconfiguration inbound
neighbor 192.168.3.3 activate
neighbor 192.168.3.3 route-reflector-client
neighbor 192.168.3.3 soft-reconfiguration inbound
neighbor 192.168.4.4 activate
neighbor 192.168.4.4 soft-reconfiguration inbound
exit-address-family
!

R4
interface Loopback0
ip address 192.168.4.4 255.255.255.255
ip ospf 100 area 0
!
interface Ethernet0/0
ip address 10.4.5.4 255.255.255.0
ip ospf network point-to-point
ip ospf 100 area 0
ip ospf cost 12
!
interface Ethernet0/2
ip address 10.1.4.4 255.255.255.0
ip ospf network point-to-point
ip ospf 100 area 0
ip ospf cost 1
!
router ospf 100
router-id 192.168.4.4
!
router bgp 100
bgp router-id 192.168.4.4
bgp cluster-id 0.0.0.2
bgp log-neighbor-changes
bgp deterministic-med
no bgp default ipv4-unicast
neighbor 192.168.1.1 remote-as 100
neighbor 192.168.1.1 update-source Loopback0
neighbor 192.168.5.5 remote-as 100
neighbor 192.168.5.5 update-source Loopback0
!
address-family ipv4
neighbor 192.168.1.1 activate
neighbor 192.168.1.1 soft-reconfiguration inbound
neighbor 192.168.5.5 activate
neighbor 192.168.5.5 route-reflector-client
neighbor 192.168.5.5 soft-reconfiguration inbound
exit-address-family
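
The interface costs in these configurations determine the IGP metric each RR sees towards every BGP next hop, and those metrics drive the rest of the walkthrough. The sketch below recomputes them with a small shortest-path search. It rests on assumptions: each link is treated as having the same cost in both directions (so the R4 side of the R1-R4 link is taken as 1, as stated in the text), and each Loopback0 is assumed to add a cost of 1. The names `LINKS`, `LOOPBACK_COST` and `igp_metrics` are illustrative only.

```python
import heapq

# Link costs taken from the text and the "ip ospf cost" lines above.
# Assumption: costs are symmetric and each loopback adds 1.
LINKS = {("R1", "R2"): 5, ("R1", "R3"): 4, ("R1", "R4"): 1, ("R4", "R5"): 12}
LOOPBACK_COST = 1


def igp_metrics(source):
    """Return the metric from `source` to every other router's loopback."""
    graph = {}
    for (a, b), cost in LINKS.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in graph[node]:
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return {n: d + LOOPBACK_COST for n, d in dist.items() if n != source}


print("from R1:", igp_metrics("R1"))
print("from R4:", igp_metrics("R4"))
```

Under these assumptions R1 reaches R2, R3 and R5 with metrics 6, 5 and 14, and R4 reaches R2, R3 and R5 with metrics 7, 6 and 13. These are the "(metric N)" values shown next to the next hop in the show ip bgp outputs on R1 and R4.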

Before walking through the persistent route oscillation in this scenario, it is important to understand the BGP deterministic-med and always-compare-med features. By default, MED is compared only between paths received from the same neighboring AS. The bgp deterministic-med command makes that comparison independent of the order in which the paths were received: paths from the same neighboring AS are grouped and compared first, and only the winner of each group is then compared with the winners of the other groups. The bgp always-compare-med command, in contrast, makes the router compare MED for paths from neighbors in different ASes as well. Let's start by looking at the BGP prefix on R2, R3 and R5.

R2
R2#show bgp ipv4 uni 100.100.100.100/32
BGP routing table entry for 100.100.100.100/32, version 2
Paths: (1 available, best #1, table default)
Advertised to update-groups:
2
Refresh Epoch 1
200 1000
172.16.27.7 from 172.16.27.7 (192.168.7.7)
Origin IGP, metric 10, localpref 100, valid, external, best
rx pathid: 0, tx pathid: 0x0
Updated on Jul 14 2019 12:30:25 PDT

R3
R3#show bgp ipv4 uni 100.100.100.100/32
BGP routing table entry for 100.100.100.100/32, version 3
Paths: (1 available, best #1, table default)
Advertised to update-groups:
1
Refresh Epoch 1
300 1000
172.16.36.6 from 172.16.36.6 (192.168.6.6)
Origin IGP, metric 1, localpref 100, valid, external, best
rx pathid: 0, tx pathid: 0x0
Updated on Jul 14 2019 12:30:28 PDT

R5
R5#show bgp ipv4 uni 100.100.100.100/32
BGP routing table entry for 100.100.100.100/32, version 3
Paths: (1 available, best #1, table default)
Advertised to update-groups:
2
Refresh Epoch 1
300 1000
172.16.56.6 from 172.16.56.6 (192.168.6.6)
Origin IGP, localpref 100, valid, external, best
rx pathid: 0, tx pathid: 0x0
Updated on Jul 14 2019 12:30:28 PDT

The above outputs are from before the prefix is advertised to the RRs and best path selection kicks in there. Let's now walk through the BGP path selection process between the two clusters in AS100.

Step 1: Prefix 100.100.100.100/32 is advertised by both R2 and R3 to RR router R1. The two paths come from different neighboring ASes (AS200 and AS300), so MED is not compared, and R1 chooses the path received from R3 as best because of the lower IGP metric to the next hop (5 versus 6).

R1#show ip bgp 100.100.100.100
BGP routing table entry for 100.100.100.100/32, version 2
BGP Bestpath: deterministic-med
Paths: (2 available, best #1, table default)
Advertised to update-groups:
1
Refresh Epoch 1
300 1000, (Received from a RR-client), (received & used)
192.168.3.3 (metric 5) from 192.168.3.3 (192.168.3.3)
Origin IGP, metric 1, localpref 100, valid, internal, best
rx pathid: 0, tx pathid: 0x0
Updated on Jul 14 2019 18:01:31 PDT
Refresh Epoch 1
200 1000, (Received from a RR-client), (received & used)
192.168.2.2 (metric 6) from 192.168.2.2 (192.168.2.2)
Origin IGP, metric 10, localpref 100, valid, internal
rx pathid: 0, tx pathid: 0
Updated on Jul 14 2019 18:01:30 PDT

Step 2: R1 selects the path via R3 (192.168.3.3) as the best path and advertises it to R4. R4 also learns the prefix via R5. Since the paths from R3 and R5 were both learnt from the same neighboring AS (AS300, via R6), their MEDs are compared, and R4 selects the path via R5 as best because it has the lower MED (0 versus 1).

R4#show ip bgp 100.100.100.100
BGP routing table entry for 100.100.100.100/32, version 17566564
BGP Bestpath: deterministic-med
Paths: (2 available, best #1, table default)
Advertised to update-groups:
1
Refresh Epoch 1
300 1000, (Received from a RR-client), (received & used)
192.168.5.5 (metric 13) from 192.168.5.5 (192.168.5.5)
Origin IGP, metric 0, localpref 100, valid, internal, best
rx pathid: 0, tx pathid: 0x0
Updated on Jul 16 2019 01:13:39 PDT
Refresh Epoch 1
300 1000, (received & used)
192.168.3.3 (metric 6) from 192.168.1.1 (192.168.1.1)
Origin IGP, metric 1, localpref 100, valid, internal
Originator: 192.168.3.3, Cluster list: 0.0.0.1
rx pathid: 0, tx pathid: 0
Updated on Jul 16 2019 05:59:54 PDT
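
The decisions in Step 1 and Step 2 come down to one rule: MED is only compared when both paths were learnt from the same neighboring AS, otherwise the IGP metric to the next hop decides. A minimal sketch follows, assuming all earlier attributes (Local-Preference, AS_Path length, Origin) tie, as they do in the outputs above. The `Path` tuple and `pick` function are hypothetical names.

```python
from collections import namedtuple

# via: egress router, neighbor_as: first AS in the AS_PATH,
# med: BGP metric, igp: metric to the next hop as seen by the RR.
Path = namedtuple("Path", "via neighbor_as med igp")


def pick(a, b):
    """Compare two paths whose earlier attributes all tie."""
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b  # same neighbor AS: lower MED wins
    return a if a.igp <= b.igp else b     # otherwise: lower IGP metric wins


# Step 1 on R1: AS300 via R3 against AS200 via R2.
r1_best = pick(Path("R3", 300, 1, 5), Path("R2", 200, 10, 6))

# Step 2 on R4: AS300 via R5 against AS300 via R3 (reflected by R1).
r4_best = pick(Path("R5", 300, 0, 13), Path("R3", 300, 1, 6))

print(r1_best.via, r4_best.via)  # R3 R5
```

Note that on R4 the path via R3 has the better IGP metric (6 versus 13) but still loses, because the MED comparison comes first.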

Step 3: R4 advertises its selected best path to R1. On receiving the update, R1 (which runs deterministic-med) first compares the new path with the path received via R3, because both come from AS300. The path from R5 wins over the path from R3 on MED. The path from R5 is then compared with the path received from R2. Since these two paths are learnt from different neighboring ASes, MED is skipped and the best path selection falls through to the IGP metric to the next hop (14 versus 6). This is when R2 is chosen as the best path.

R1#show ip bgp 100.100.100.100
BGP routing table entry for 100.100.100.100/32, version 17737194
BGP Bestpath: deterministic-med
Paths: (3 available, best #3, table default)
Advertised to update-groups:
1 2
Refresh Epoch 1
300 1000, (received & used)
192.168.5.5 (metric 14) from 192.168.4.4 (192.168.4.4)
Origin IGP, metric 0, localpref 100, valid, internal
Originator: 192.168.5.5, Cluster list: 0.0.0.2
rx pathid: 0, tx pathid: 0
Updated on Jul 16 2019 06:02:41 PDT
Refresh Epoch 1
300 1000, (Received from a RR-client), (received & used)
192.168.3.3 (metric 5) from 192.168.3.3 (192.168.3.3)
Origin IGP, metric 1, localpref 100, valid, internal
rx pathid: 0, tx pathid: 0
Updated on Jul 16 2019 01:13:40 PDT
Refresh Epoch 1
200 1000, (Received from a RR-client), (received & used)
192.168.2.2 (metric 6) from 192.168.2.2 (192.168.2.2)
Origin IGP, metric 10, localpref 100, valid, internal, best
rx pathid: 0, tx pathid: 0x0
Updated on Jul 16 2019 01:13:34 PDT
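
Step 3 is where deterministic-med matters, so here is a minimal sketch of the grouping it implies: paths are grouped by neighboring AS, the lowest MED wins inside each group, and the group winners are then compared on IGP metric. The function name `best_deterministic_med` is hypothetical, and the model assumes every attribute before MED ties.

```python
from collections import namedtuple
from itertools import groupby

Path = namedtuple("Path", "via neighbor_as med igp")


def best_deterministic_med(paths):
    """Pick the best path by grouping on neighbor AS first."""
    ordered = sorted(paths, key=lambda p: p.neighbor_as)
    # Inside a group (same neighbor AS) the lowest MED wins, IGP breaks ties.
    group_winners = [
        min(group, key=lambda p: (p.med, p.igp))
        for _, group in groupby(ordered, key=lambda p: p.neighbor_as)
    ]
    # Group winners come from different ASes, so MED is skipped between them.
    return min(group_winners, key=lambda p: p.igp)


# R1's three paths in Step 3, with the IGP metrics from the output above.
r1_paths = [
    Path("R5", 300, 0, 14),
    Path("R3", 300, 1, 5),
    Path("R2", 200, 10, 6),
]
print(best_deterministic_med(r1_paths).via)  # R2
```

The path via R3 has the lowest IGP metric of all three (5), but it is eliminated inside the AS300 group by R5's lower MED before the IGP comparison happens. Taken pairwise, R5 beats R3, R3 beats R2 and R2 beats R5, so there is no single ordering of the three paths, and that is what allows the oscillation.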

Step 4: R1 advertises the new best path via R2 to R4. On receiving the update, R4 compares it with its existing best path via R5 and chooses R2 (by comparing the IGP metric to the next hop, 7 versus 13). Because the new best path was learnt from R1, R4 has nothing to advertise back, so it sends a withdraw to R1 for the path via R5 that it had previously advertised.

R4#show ip bgp 100.100.100.100
BGP routing table entry for 100.100.100.100/32, version 17952181
BGP Bestpath: deterministic-med
Paths: (2 available, best #1, table default)
Advertised to update-groups:
2
Refresh Epoch 1
200 1000, (received & used)
192.168.2.2 (metric 7) from 192.168.1.1 (192.168.1.1)
Origin IGP, metric 10, localpref 100, valid, internal, best
Originator: 192.168.2.2, Cluster list: 0.0.0.1
rx pathid: 0, tx pathid: 0x0
Updated on Jul 16 2019 06:06:11 PDT
Refresh Epoch 1
300 1000, (Received from a RR-client), (received & used)
192.168.5.5 (metric 13) from 192.168.5.5 (192.168.5.5)
Origin IGP, metric 0, localpref 100, valid, internal
rx pathid: 0, tx pathid: 0
Updated on Jul 16 2019 01:13:39 PDT
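
The withdraw in Step 4 follows from the route reflection rule that a path learnt from a non-client iBGP peer is reflected to clients only. A minimal sketch of R4's side, assuming R4 had previously advertised a path to R1; `update_to_non_client` and the `learned_from` field are illustrative names.

```python
from collections import namedtuple

Path = namedtuple("Path", "via neighbor_as med igp learned_from")


def update_to_non_client(best, non_client_peers):
    """What an RR sends to its non-client iBGP peers for its current best path.

    Assumes some path had been advertised to them earlier.
    """
    # A best path learnt from a non-client is reflected to clients only,
    # so the earlier advertisement to non-clients has to be withdrawn.
    if best.learned_from in non_client_peers:
        return "withdraw"
    return f"advertise {best.via}"


r4_paths = [
    Path("R2", 200, 10, 7, learned_from="R1"),   # reflected by R1
    Path("R5", 300, 0, 13, learned_from="R5"),   # from R4's own client
]
# Different neighbor ASes, so MED is skipped and the lower IGP metric wins.
r4_best = min(r4_paths, key=lambda p: p.igp)

print(r4_best.via, "->", update_to_non_client(r4_best, {"R1"}))  # R2 -> withdraw
```

Had R4 kept the path via R5 as best, the same function would return "advertise R5", which is what happened in Step 2.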

Step 5: On receiving the withdraw from R4, R1 performs the best path calculation again and the path received via R3 is selected as best again (see Step 1). The cycle then repeats from Step 2.

R1#show ip bgp 100.100.100.100
BGP routing table entry for 100.100.100.100/32, version 17905965
BGP Bestpath: deterministic-med
Paths: (2 available, best #1, table default)
Advertised to update-groups:
1 2
Refresh Epoch 1
300 1000, (Received from a RR-client), (received & used)
192.168.3.3 (metric 5) from 192.168.3.3 (192.168.3.3)
Origin IGP, metric 1, localpref 100, valid, internal, best
rx pathid: 0, tx pathid: 0x0
Updated on Jul 16 2019 01:13:40 PDT
Refresh Epoch 1
200 1000, (Received from a RR-client), (received & used)
192.168.2.2 (metric 6) from 192.168.2.2 (192.168.2.2)
Origin IGP, metric 10, localpref 100, valid, internal
rx pathid: 0, tx pathid: 0
Updated on Jul 16 2019 01:13:34 PDT

This scenario leads to a persistent route oscillation in the network: the best path keeps flipping on both RRs, and the route is always seen as learnt 0 seconds ago:

R1
R1#show ip route 100.100.100.100
Routing entry for 100.100.100.100/32
Known via "bgp 100", distance 200, metric 1
Tag 300, type internal
Last update from 192.168.3.3 00:00:00 ago
Routing Descriptor Blocks:
* 192.168.3.3, from 192.168.3.3, 00:00:00 ago
Route metric is 1, traffic share count is 1
AS Hops 2
Route tag 300
MPLS label: none
R1#show ip route 100.100.100.100
Routing entry for 100.100.100.100/32
Known via "bgp 100", distance 200, metric 10
Tag 200, type internal
Last update from 192.168.2.2 00:00:00 ago
Routing Descriptor Blocks:
* 192.168.2.2, from 192.168.2.2, 00:00:00 ago
Route metric is 10, traffic share count is 1
AS Hops 2
Route tag 200
MPLS label: none

R4
R4#show ip route 100.100.100.100
Routing entry for 100.100.100.100/32
Known via "bgp 100", distance 200, metric 10
Tag 200, type internal
Last update from 192.168.2.2 00:00:00 ago
Routing Descriptor Blocks:
* 192.168.2.2, from 192.168.1.1, 00:00:00 ago
Route metric is 10, traffic share count is 1
AS Hops 2
Route tag 200
MPLS label: none
R4#show ip route 100.100.100.100
Routing entry for 100.100.100.100/32
Known via "bgp 100", distance 200, metric 0
Tag 300, type internal
Last update from 192.168.5.5 00:00:00 ago
Routing Descriptor Blocks:
* 192.168.5.5, from 192.168.5.5, 00:00:00 ago
Route metric is 0, traffic share count is 1
AS Hops 2
Route tag 300
MPLS label: none
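
The whole cycle from Step 1 to Step 5 can be reproduced with a small simulation of the two RRs. This is a simplified model and not a BGP implementation: it only looks at neighbor AS, MED and IGP metric, uses the metrics from the show ip bgp outputs above, and lets the RRs take turns. All names (`CLIENT_PATHS`, `IGP`, `best_path`, `simulate`) are hypothetical.

```python
from collections import namedtuple

Path = namedtuple("Path", "via neighbor_as med")

R2, R3, R5 = Path("R2", 200, 10), Path("R3", 300, 1), Path("R5", 300, 0)
CLIENT_PATHS = {"R1": [R2, R3], "R4": [R5]}
# IGP metric from each RR to each egress router, as seen in the outputs.
IGP = {"R1": {"R2": 6, "R3": 5, "R5": 14}, "R4": {"R2": 7, "R3": 6, "R5": 13}}


def best_path(rr, paths):
    """Deterministic-med: lowest MED per neighbor AS, then lowest IGP metric."""
    igp = IGP[rr]
    winners = {}
    for p in paths:
        current = winners.get(p.neighbor_as)
        if current is None or (p.med, igp[p.via]) < (current.med, igp[current.via]):
            winners[p.neighbor_as] = p
    return min(winners.values(), key=lambda p: igp[p.via])


def simulate(rounds):
    from_peer = {"R1": None, "R4": None}  # what each RR hears from the other
    history = []
    for _ in range(rounds):
        for rr, peer in (("R1", "R4"), ("R4", "R1")):
            paths = list(CLIENT_PATHS[rr])
            if from_peer[rr] is not None:
                paths.append(from_peer[rr])
            best = best_path(rr, paths)
            # Only client-learnt paths go to the other RR, otherwise withdraw.
            from_peer[peer] = best if best in CLIENT_PATHS[rr] else None
            history.append((rr, best.via))
    return history


for rr, via in simulate(3):
    print(rr, "best via", via)
```

In this model R1 alternates between R3 and R2 while R4 alternates between R5 and R2, and no number of rounds settles it. This matches the next hops flipping in the show ip route outputs above.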

To solve this issue, we can either remove the bgp deterministic-med configuration or configure bgp always-compare-med on both RR routers.
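
To see why always-compare-med helps, the sketch below runs a simplified two-RR model with and without it. It only covers the always-compare-med option. Removing deterministic-med makes the result depend on the order in which paths were received, which this model does not capture. The model looks only at neighbor AS, MED and IGP metric, and all names are hypothetical.

```python
from collections import namedtuple

Path = namedtuple("Path", "via neighbor_as med")

R2, R3, R5 = Path("R2", 200, 10), Path("R3", 300, 1), Path("R5", 300, 0)
CLIENT_PATHS = {"R1": [R2, R3], "R4": [R5]}
IGP = {"R1": {"R2": 6, "R3": 5, "R5": 14}, "R4": {"R2": 7, "R3": 6, "R5": 13}}


def best_path(rr, paths, always_compare_med):
    igp = IGP[rr]

    def rank(p):
        return (p.med, igp[p.via])

    if always_compare_med:
        # MED is compared across all neighbor ASes: one consistent ordering.
        return min(paths, key=rank)
    # Deterministic-med only: MED inside each neighbor AS, IGP across groups.
    winners = {}
    for p in paths:
        current = winners.get(p.neighbor_as)
        if current is None or rank(p) < rank(current):
            winners[p.neighbor_as] = p
    return min(winners.values(), key=lambda p: igp[p.via])


def best_history(always_compare_med, rounds=6):
    from_peer = {"R1": None, "R4": None}
    seen = {"R1": [], "R4": []}
    for _ in range(rounds):
        for rr, peer in (("R1", "R4"), ("R4", "R1")):
            paths = list(CLIENT_PATHS[rr])
            if from_peer[rr] is not None:
                paths.append(from_peer[rr])
            best = best_path(rr, paths, always_compare_med)
            # Only client-learnt paths are sent to the other RR.
            from_peer[peer] = best if best in CLIENT_PATHS[rr] else None
            seen[rr].append(best.via)
    return seen


print("deterministic-med only:", best_history(always_compare_med=False))
print("always-compare-med:    ", best_history(always_compare_med=True))
```

With always-compare-med in this model, both RRs settle on the path via R5 (MED 0) after the first exchange and stay there, because MED now ranks all three paths in one consistent order.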

Hope this was useful.

Cheers...!!!

Happy learning
