Monday, January 4, 2010

BGP load balancing with Multi-path and Link Bandwidth Feature



Early Cisco IOS releases did not support load balancing in networks that used Border Gateway Protocol (BGP) as their core routing protocol. These early implementations had to rely on static routes or other dynamic routing protocols to load balance over equal-cost multipaths.

Cisco then introduced the BGP multipath feature, which supports up to 16 paths (both EBGP and IBGP). With the introduction of the link bandwidth feature, BGP can also perform unequal-cost load balancing, just like EIGRP.







There are a few methods to achieve Load balancing with BGP:


  1. EBGP load balancing with EBGP session between loopback interfaces.
  2. Load balancing with parallel EBGP sessions.
Ivan Pepelnjak has written an excellent article on BGP load balancing.


For multiple paths to the same destination to be considered as multipaths, the following criteria must be met:


  • All attributes must be the same. The attributes include weight, local preference, autonomous system path (entire attribute and not just length), origin code, Multi Exit Discriminator (MED), and Interior Gateway Protocol (IGP) distance. 
  • The next hop router for each multipath must be different. 
By default, the BGP process does not mark a prefix learnt from a different autonomous system as a multipath candidate, because the AS paths are not identical. However, the bgp bestpath as-path multipath-relax command lets us circumvent this rule: the AS paths still have to be the same length, but they no longer have to be identical.


Router R2 has links with speeds of 2 Mbps and 3 Mbps to autonomous systems 40 and 50 respectively, and R3 has a 2.5 Mbps link to AS 50, as shown in the topology.


R4 and R5 have loopback interfaces with addresses 192.168.10.4/32 and 192.168.10.5/32. The host routes are blocked and the aggregate address 192.168.10.0/24 is advertised to AS 123.
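
The configuration of R4 and R5 is not shown in this post. As a rough sketch only, one common way to get this behaviour on R5 is an aggregate with summary-only, which would also be consistent with the atomic-aggregate flag seen in the outputs further down. The network statement and the use of summary-only are assumptions, not the configuration actually used here.

```
! Hypothetical sketch for R5 (AS 50); the real configuration is not shown
router bgp 50
 ! Originate the loopback host route so the aggregate has a contributing prefix
 network 192.168.10.5 mask 255.255.255.255
 ! Advertise only the /24 aggregate and suppress the more specific host route
 aggregate-address 192.168.10.0 255.255.255.0 summary-only
```

R4 would carry the equivalent statements under router bgp 40 with 192.168.10.4.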


With basic BGP configured, the BGP table of R1 shows results as expected.



R1#sh ip bgp
BGP table version is 1, local router ID is 10.10.123.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete


   Network          Next Hop            Metric LocPrf Weight Path
* i192.168.10.0     10.10.35.5               0    100      0 50 i
* i                 10.10.25.5               0    100      0 50 i
R1#


R2 is sending its best route for 192.168.10.0/24, learnt from AS 50, even though it has learnt the same prefix from AS 40. R3 is sending its only route for 192.168.10.0/24, learnt from AS 50. R1 does not install the prefix in its routing table because the next hops are inaccessible.

The interesting stuff happens when R2 is configured for multipathing.


R2#sh run | sec router bgp
router bgp 123
 no synchronization
 bgp log-neighbor-changes
 bgp bestpath as-path multipath-relax
 neighbor 10.10.24.4 remote-as 40
 neighbor 10.10.25.5 remote-as 50
 neighbor 10.10.123.1 remote-as 123
 maximum-paths 16
 no auto-summary
R2#

R2 now installs the prefix learnt from AS 40 and 50 in the routing table as Equal Cost Multi-paths (ECMP).


R2#sh ip route bgp
B    192.168.10.0/24 [20/0] via 10.10.25.5, 00:03:19
                     [20/0] via 10.10.24.4, 00:03:19
R2#


R2#sh ip bgp 192.168.10.0
BGP routing table entry for 192.168.10.0/24, version 2
Paths: (2 available, best #2, table Default-IP-Routing-Table)
Multipath: eBGP
Flag: 0x820
  Advertised to update-groups:
     1          2
  50, (aggregated by 50 192.168.10.5)
    10.10.25.5 from 10.10.25.5 (192.168.10.5)
      Origin IGP, metric 0, localpref 100, valid, external, atomic-aggregate, multipath
  40, (aggregated by 40 192.168.10.4)
    10.10.24.4 from 10.10.24.4 (192.168.10.4)
      Origin IGP, metric 0, localpref 100, valid, external, atomic-aggregate, multipath, best
R2#

R2 now sends the 192.168.10.0/24 path learnt from AS 40 to R1, and it also sets the next-hop attribute to itself. Both links must be up for this next-hop change to happen. If one of the links fails, R2 goes back to advertising the original next hop. It is safer to always configure next-hop-self so that a single link failure does not cause a loss of connectivity.
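
A minimal sketch of the next-hop-self recommendation on R2 is shown below. It uses only the neighbor address already present in R2's configuration; it is an illustration and was not part of the configuration captured in this post.

```
! Sketch for R2: always advertise R2 itself as the next hop to the IBGP peer R1
router bgp 123
 neighbor 10.10.123.1 next-hop-self
```

The same statement on R3 should also make the path through R3 usable on R1, as an alternative to the static host route used later in this post.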

R1 installs the 192.168.10.0/24 prefix in its routing table with next hop as 10.10.123.2.

R1#sh ip bgp 192.168.10.0
BGP routing table entry for 192.168.10.0/24, version 4
Paths: (2 available, best #1, table Default-IP-Routing-Table)
  Not advertised to any peer
  40, (aggregated by 40 192.168.10.4)
    10.10.123.2 from 10.10.123.2 (10.10.123.2)
      Origin IGP, metric 0, localpref 100, valid, internal, atomic-aggregate, best
  50, (aggregated by 50 192.168.10.5)
    10.10.35.5 (inaccessible) from 10.10.123.3 (10.10.123.3)
      Origin IGP, metric 0, localpref 100, valid, internal, atomic-aggregate
R1#


Now let us configure unequal-cost load balancing using the link bandwidth feature on R1, R2 and R3. The bgp dmzlink-bw command lets R1 see the link bandwidth sent by R2 and R3.
R2 and R3 use a BGP extended community to send the link bandwidth attribute to R1.

R1#sh run | sec router bgp
router bgp 123
 no synchronization
 bgp log-neighbor-changes
 bgp bestpath as-path multipath-relax
 bgp dmzlink-bw
 neighbor 10.10.123.2 remote-as 123
 neighbor 10.10.123.3 remote-as 123
 maximum-paths ibgp 16
 no auto-summary
R1#

R2#sh run | sec router bgp
router bgp 123
 bgp log-neighbor-changes
 bgp bestpath as-path multipath-relax
 neighbor 10.10.24.4 remote-as 40
 neighbor 10.10.25.5 remote-as 50
 neighbor 10.10.123.1 remote-as 123
 maximum-paths 16
 !
 address-family ipv4
 neighbor 10.10.24.4 activate
 neighbor 10.10.24.4 dmzlink-bw
 neighbor 10.10.25.5 activate
 neighbor 10.10.25.5 dmzlink-bw
 neighbor 10.10.123.1 activate
 neighbor 10.10.123.1 send-community both
 maximum-paths 16
 no auto-summary
 no synchronization
 bgp dmzlink-bw
 exit-address-family
R2#


R3#sh run | sec router bgp
router bgp 123
 bgp log-neighbor-changes
 neighbor 10.10.35.5 remote-as 50
 neighbor 10.10.123.1 remote-as 123
 !
 address-family ipv4
 neighbor 10.10.35.5 activate
 neighbor 10.10.35.5 dmzlink-bw
 neighbor 10.10.123.1 activate
 neighbor 10.10.123.1 send-community both
 no auto-summary
 no synchronization
 bgp dmzlink-bw
 exit-address-family
R3#





R2 is now able to see the outgoing link bandwidth.





R2#sh ip bgp 192.168.10.0
BGP routing table entry for 192.168.10.0/24, version 2
Paths: (2 available, best #1, table Default-IP-Routing-Table)
Multipath: eBGP
  Advertised to update-groups:
     1          2
  40, (aggregated by 40 192.168.10.4)
    10.10.24.4 from 10.10.24.4 (192.168.10.4)
      Origin IGP, metric 0, localpref 100, valid, external, atomic-aggregate, multipath, best
      DMZ-Link Bw 250 kbytes
  50, (aggregated by 50 192.168.10.5)
    10.10.25.5 from 10.10.25.5 (192.168.10.5)
      Origin IGP, metric 0, localpref 100, valid, external, atomic-aggregate, multipath
      DMZ-Link Bw 375 kbytes
R2#
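
The DMZ-Link Bw values appear to be the link speeds expressed in kilobytes per second: 2000 kbps / 8 = 250 and 3000 kbps / 8 = 375. This assumes that IOS derives the value from the bandwidth configured on the interface facing the EBGP neighbor, so a lab setup would need something like the sketch below. The interface names are hypothetical, since the interface configuration is not shown in this post.

```
! Hypothetical interface names on R2; only the bandwidth values follow from the post
interface Serial0/0
 description Link to R4 (AS 40)
 ! bandwidth is given in kbps, so this is 2 Mbps
 bandwidth 2000
!
interface Serial0/1
 description Link to R5 (AS 50)
 ! 3 Mbps
 bandwidth 3000
```

On R3, a bandwidth of 2500 on the link to R5 would correspond to the 312 kbytes that R1 sees for that path.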





R1 now sees an aggregate bandwidth of 625 kbytes (375 + 250) from R2, which is 5 Mbps, and
312 kbytes from R3, which is 2.5 Mbps. So the traffic share is 2:1 (5 Mbps : 2.5 Mbps). R1 was able to install the route from R3 only after a static host route to 10.10.35.5 via 10.10.123.3 was added.
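
The static host route mentioned above would look like this on R1, using the addresses given in the text:

```
! On R1: make the next hop 10.10.35.5 advertised by R3 reachable via R3
ip route 10.10.35.5 255.255.255.255 10.10.123.3
```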


R1#sh ip bgp 192.168.10.0
BGP routing table entry for 192.168.10.0/24, version 4
Paths: (2 available, best #2, table Default-IP-Routing-Table)
Multipath: iBGP
  Not advertised to any peer
  50, (aggregated by 50 192.168.10.5)
    10.10.35.5 from 10.10.123.3 (10.10.123.3)
      Origin IGP, metric 0, localpref 100, valid, internal, atomic-aggregate, multipath
      DMZ-Link Bw 312 kbytes
  40, (aggregated by 40 192.168.10.4)
    10.10.123.2 from 10.10.123.2 (10.10.123.2)
      Origin IGP, metric 0, localpref 100, valid, internal, atomic-aggregate, multipath, best
      DMZ-Link Bw 625 kbytes
R1#








R1#sh ip route 192.168.10.0
Routing entry for 192.168.10.0/24
  Known via "bgp 123", distance 200, metric 0
  Tag 40, type internal
  Last update from 10.10.123.2 00:08:15 ago
  Routing Descriptor Blocks:
    10.10.123.2, from 10.10.123.2, 00:08:15 ago
      Route metric is 0, traffic share count is 2
      AS Hops 1
      Route tag 40
  * 10.10.35.5, from 10.10.123.3, 00:08:15 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 40


R1#

Cisco Express Forwarding (CEF) is a prerequisite for both the BGP multipath and link bandwidth features. Both features can only be configured for the IPv4 address family.
BGP multipath load sharing and the link bandwidth feature each have further restrictions.