Sep 02

Abstract

This publication briefly covers the use of 3rd party next-hops in OSPF, RIP, EIGRP and BGP routing protocols. Common concepts are introduced and protocol-specific implementations are discussed. Basic understanding of the routing protocol function is required before reading this blog post.

Overview

Third-party next-hop concept appears only to distance vector protocol, or in the parts of the link-state protocols that exhibit distance-vector behavior. The idea is that a distance-vector update carries explicit next-hop value, which is used by receiving side, as opposed to the “implicit” next-hop calculated as the sending router’s address – the source address in the IP header carrying the routing update. Such “explicit” next-hop is called “third-party” next-hop IP address, allowing for pointing to a different next-hop, other than advertising router. Intitively, this is only possible if the advertising and receiving router are on a shared segment, but the “shared segment” concept could be generalized and abstracted. Every popular distance-vector protocols support third party next-hop – RIPv2, EIGRP, OSPF and BGP all carry explicit next-hop value. Look at the figure below – it illustrates the situation where two different distance-vector protocols are running on the shared segment, but none of them runs on all routers attached to the segment. The protocols “overlap” at a “pivotal” router and redistribution is used to provide inter-protocol route exchange.

third-party-nh-generic

Per the default distance-vector protocol behavior, traffic from one routing domain going into another has cross the “pivotal” router, the router where the two domains overlap (R3 in our case) – as opposed to going directly to the closes next-hop on the shared segment. The reason for this is that there is no direct “native” update exchange between the hops running different routing protocols. In situations like this, it is beneficial to rewrite the next-hop IP address to point toward the “optimum” exit point, using the “pivotal” router’s knowledge of both routing protocols.

OSPF is somewhat special with respect to the 3rd party next-hop implementation. It supports third-party next-hop in Type-5/7 LSAs (External Routing Information LSA and NSSA External LSA). These LSAs are processed in “distance-vector manner” by every receiving router. By default, the LSA is assumed to advertise the external prefix “connected” to the advertising router. However, if the FA is non-zero, the address in this field is used to calculate the forwarding information, as opposed to default forwarding toward the advertising router. Forwarding Address is always present in Type-7 LSAs, for the reason illustrated on the figure below:

third-party-nh-ospf-nssa-fa

Since there could be multiple ABRs in NSSA area, only one is elected to perform 7-to-5 LSA translation – otherwise the routing information will loop back in the area, unless manual filtering implemented in the ABRs (which is prone to errors). Translating ABR is elected based on the highest Router-ID, and may not be on the optimum path toward the advertising ASBR. Therefore, the forwarding address should prompt the more optimum path, based on the inter-area routing information.

EIGRP

We start with the scenario where we redistribute RIP into EIGRP.

third-party-nh-rip2eigrp

Notice that EIGRP will not insert the third-party next-hop until you apply the command no ip next-hop-self eigrp on R3’s connection to the shared segment. Look at the routing table output prior to applying the no ip next-hop-self eigrp command.

R1#show  ip route eigrp 
     140.1.0.0/16 is variably subnetted, 2 subnets, 2 masks
D EX    140.1.2.2/32
           [170/2560002816] via 140.1.123.3, 00:00:27, FastEthernet0/0

After the command has been applied to R3’s interface:

R1#show  ip route eigrp
     140.1.0.0/16 is variably subnetted, 2 subnets, 2 masks
D EX    140.1.2.2/32
           [170/2560002816] via 140.1.123.2, 00:00:04, FastEthernet0/0

The same behavior is observed when redistributing OSPF into EIGRP, but not when redistributing BGP. For some reason, BGP’s next-hop is not copied into EIGRP, e.g. in the example below, EIGRP will NOT insert the BGP’s next-hop into updates. Notice that you may enable or disable the third-party next-hop behavior in EIGRP using the interface-level command ip next-hop-self eigrp.

RIP

RIP passes the third-party next-hop from OSPF, BGP or EIGRP. For instance, assume EIGRP redistribution into RIP. You have to turn on the no ip split-horizon on R3’s Ethernet connection to get this to work:

third-party-nh-eigrp2rip

R2#show ip route rip 
     140.1.0.0/16 is variably subnetted, 3 subnets, 2 masks
R       140.1.1.1/32 [120/1] via 140.1.123.1, 00:00:17, FastEthernet0/0

Notice the following RIP debugging output, which lists the third-party next-hop:

RIP: received v2 update from 140.1.123.3 on FastEthernet0/0
     140.1.1.1/32 via 140.1.123.1 in 1 hops
     140.1.123.0/24 via 0.0.0.0 in 1 hops

Surprisingly, there is NO need to enable the command no ip split-horizon on the interface when redistributing BGP or OSPF routes into RIP. Seem like only EIGRP to RIP redistribution requires that. Keep in mind, however, that split-horizon is OFF by default on physical frame-relay interfaces. Here is a sample output of redistributing BGP into RIP using the third-party next-hop:

R3#show ip route bgp 
     140.1.0.0/16 is variably subnetted, 3 subnets, 2 masks
B       140.1.2.2/32 [20/0] via 140.1.123.2, 00:22:13
R3#

R1#show ip route rip 
     140.1.0.0/16 is variably subnetted, 3 subnets, 2 masks
R       140.1.2.2/32 [120/1] via 140.1.123.2, 00:00:09, FastEthernet0/0

RIP’s third-party next-hop behavior is fully automatic. You cannot disable or enable it, like you do in EIGRP.

OSPF

Similarly to RIP, OSPF has no problems picking up the third-party next-hop from BGP, EIGRP or RIP. Here is how it would look like (guess which protocol is redistributed into OSPF, based solely on the commands output):

R1#sh ip route ospf 
     140.1.0.0/16 is variably subnetted, 3 subnets, 2 masks
O E2    140.1.2.2/32 [110/1] via 140.1.123.2, 00:34:59, FastEthernet0/0

R1#show ip ospf database external 

            OSPF Router with ID (140.1.1.1) (Process ID 1)

                Type-5 AS External Link States

  Routing Bit Set on this LSA
  LS age: 131
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 140.1.2.2 (External Network Number )
  Advertising Router: 140.1.123.3
  LS Seq Number: 80000002
  Checksum: 0xF749
  Length: 36
  Network Mask: /32
        Metric Type: 2 (Larger than any link state path)
        TOS: 0
        Metric: 1
        Forward Address: 140.1.123.2
        External Route Tag: 200

If you’re still guessing, the external protocol is BGP, as could have been seen observing the automatic External Route Tag – OSPF set’s it to the last AS# found in the AS_PATH.

third-party-nh-bgp2ospf

There are special conditions to be met by OSPF for the FA address to be used. First, the interface where the third party next-hop resides should be advertised into OSPF using the network command. Secondly, this interface should not be passive in OSPF and should not have network type point-to-point or point-to-multipoint. Violating any of these conditions will stop OSPF from using the FA in type-5 LSA created for external routes. Violating any of these conditions prevents third-party next-hop installation in the external LSAs.

OSPF is special in one other respect. Distance vector-protocols such as RIP or EIGRP modify the next-hop as soon as they pass the routing information to other devices. That is, the third party next-hop is not maintained through the RIP or EIGRP domain. Contrary to these, OSPF LSAs are flooded within their scope with the FA unmodified. This creates interesting problem: if the FA address is not reachable in the receiving router’s routing table, the external information found in type 7/5 LSA is not used. This situation is discussed in the blog post “OSPF Filtering using FA Address”.

BGP

When you redistribute any protocol into BGP, the system correctly sets the third-party next-hop in the local BGP table. Look at the diagram below, where EIGRP prefixes are being redistributed into BGP AS 300:

third-party-nh-eigrp2bgp

R3’s BGP process installs R1 Loopback0 prefix into the BGP table with the next-hop value of R1’s address, not “0.0.0.0” like it would be for locally advertised routes. You will observe the same behavior if you inject EIGRP prefixes into BGP using the network command.

R3#sh ip bgp
BGP table version is 9, local router ID is 140.1.123.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 140.1.1.1/32     140.1.123.1         156160         32768 ?

Furthermore, BGP is supposed to change the next-hop to self when advertising prefixes over eBGP peering sessions. However, when all peers share the same segment, the prefixes re-advertised over the shared segment do not have their next-hop changed. See the diagram below:

third-pary-nh-bgp2bgp

Here R1 advertises prefix 140.1.1.1/24 to R3 and R3 re-advertises it back to R2 over the same segment. Unless non-physical interfaces are used to form the BGP sessions (e.g. Loopbacks), the next-hop received from R1 is not changed when passing it down to R2. This implements the default third-party next-hop preservation over eBGP sessions. Look at the sample output for the configuration illustrated above: R1 receives R2’s prefix with unmodified next-hop.

R1#show ip bgp 
BGP table version is 3, local router ID is 140.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 140.1.1.1/32     0.0.0.0                  0         32768 i
*> 140.1.2.2/32     140.1.123.2                            0 300 200 i

There is a way to disable this default behavior in BGP. A logical assumption would be that using the command neighbor X.X.X.X next-hop-self would work, and it does indeed, in the recent IOS versions. The older IOS, such as 12.2T did not have this command working for eBGP sessions, and your option would have been using a route-map with set ip next-hop command. The route-map method may still be handy, if you want insert totally “bogus” IP next-hop from the shared segment – receiving BGP speaker will accept any IP address that is on the same segment. That is not something you would do in the production environment too often, but definitely an interesting idea for lab practicing. One good use in production is changing the BGP next-hop to an HSRP virtual IP address, to provide physical BGP speaker redundancy. Here is a sample code for setting an explicit next-hop in BGP update:

router bgp 300
 neighbor 140.1.123.1 remote-as 100
 neighbor 140.1.123.1 route-map BGP_NEXT_HOP out
!
route-map BGP_NEXT_HOP permit 10
 set ip next-hop 140.1.123.100

Summary

All popular distance-vector protocols support third-party next-hop insertion. This mechanism is useful on multi-access segments, in situations when you want pass optimum path information between routers belonging to different routing protocols. We illustrated that RIP implements this function automatically, and does not allow any tuning. On the other hand, EIGRP supports third-party next-hop passing from any protocol, other than BGP, and you may turn this function on/off on per-interface basis. Furthermore, OSPF’s special feature is propagation of the third-party next-hop within an area/autonomous system, unlike the distance-vector protocols that reset the next-hop at every hop (considering AS a being a “single-hop” for BGP). Thanks to that feature, OSPF offers interesting possibility to filter external routing information by blocking FA prefix from the routing tables. Finally, BGP gives most flexibility when it comes to the IP next-hop manipulation, allowing for changing it to any value.

Further Reading

Common Routing Problem with OSPF Forwarding Address
OSPF Prefix Filtering Using Forwarding Address
BGP Redundancy using HSRP

Tagged with:
Aug 21

Our BGP class is coming up!  This class is for learners who are pursuing the CCIP track, or simply want to really master BGP.  I have been working through the slides, examples  and demos that we’ll use in class, and it is going to be excellent.  :) If you can’t make the live event, we are recording it, so it will be available as a class on demand, after the live event.    More information, can be found by clicking here.

One of the common questions that comes up is “Why does the router choose THAT route?

We all know, (or at least after reading the list below, we will know), that BGP uses the following order, to determine the “best” path.

bgp bestpath

So now for the question.   Take a look at the partial output of the show command below:

bgp bestpath

Regarding the 2.2.2.0/24 network, why did this router select the 192.168.68.8 next hop route, over the one just below it?

Post your ideas, and we will have a drawing next week, before the BGP class begins.   We’ll give 1 lucky winner some rack tokens for our preferred rack vendor, Graded Labs.   Everyone who comments, will be entered into the drawing.  I will update the post with the lucky winner.

Thanks for your ideas, and happy learning,

Keith Barker

Keith

Thank you to all who responded.  eBGP is preferred over iBGP, and that is what it came down to.

The winner of the graded labs tokens is Jon!  Congratulations.

Tagged with:
Aug 16

Last week we wrapped up the MPLS bootcamp, and it was a blast!   A big shout out to all the students who attended,  as well as to many of the INE staff who stopped by (you know who you are :) ).    Thank you all.

Here is the topology we used for the class, as we built the network, step by step.

MPLS-class blog

The class was organized and delivered in 30 specific lessons. Here is the “overview” slide from class:

MPLS Journey Statement

One of the important items we discussed was troubleshooting.   When we understand all the components of Layer3 VPNs, the troubleshooting is easy.   Here are the steps:

  • Can PE see CE’s routes?
  • Are VPN routes going into MP-BGP?  (The Export)
  • Are remote PEs seeing the VPN routes?
  • Are remote PEs inserting the VPN routes into the correct local VRF? (The Import)
  • Are remote PEs, advertising these to remote CEs?
  • Are the remote CEs seeing the routes?

We had lots of fun, and included wireshark protocol analysis, so we could see and verify what we were learning.   Here is one example, of a BGP updated from a downstream iBGP neighbor which includes the VPN label:

VPN Label

If you missed the class, but still want to benefit from it, we have recorded all 30 sessions, and it is available as an on-demand version of the class.

Next week, the BGP bootcamp is running, so if you need to  brush up on BGP, we will be covering the following topics, also  in 30, easy to digest lessons:

  • Monitoring and Troubleshooting BGP
  • Multi-Homed BGP Networks
  • AS-Path Filters
  • Prefix-List Filters
  • Outbound Route Filtering
  • Route-Maps as BGP Filters
  • BGP Path Attributes
  • BGP Local Preference
  • BGP Multi-Exit-Discriminator (MED)
  • BGP Communities
  • BGP Customer Multi-Homed to a Single Service Provider
  • BGP Customer Multi-Homed to Multiple Service Providers
  • Transit Autonomous System Functions
  • Packet Forwarding in Transit Autonomous Systems
  • Monitoring and Troubleshooting IBGP in Transit AS
  • Network Design with Route Reflectors
  • Limiting the Number of Prefixes Received from a BGP Neighbor
  • AS-Path Prepending
  • BGP Peer Group
  • BGP Route Flap Dampening
  • Troubleshooting Routing Issues
  • Scaling BGP

I look forward to seeing you in class!

Best wishes in all of your learning,

Keith

Keith Barker

Tagged with:
Jul 19

Can you solve this puzzle?

R2, R3 and R4 create the service provider network, with MPLS on all three routers, and iBGP at the PE routers.  R1 and R5 are the CE routers.

R2, prefers the BGP next hop of 4.4.4.4 for network 5.5.5.5 (R5 loopback). R4, at 4.4.4.4 is an iBGP neighbor.

R2#show ip route vrf v | inc 5.5.5.0
B       5.5.5.0 [200/409600] via 4.4.4.4, 00:06:47

Is R2 preferring an iBGP learned route, which has an AD of 200, over a EIGRP route, which would have an AD of 90?

Can you identify why the routing for 5.5.5.0 on the VRF of R2 is using BGP instead of EIGRP?

EIGRP PATH with MPLS

Below are the relevant portions of the configuration, which also can serve as a great review of how to configure MPLS VPNs.
R1, CE router:

R1#show run
interface Loopback0
 ip address 1.1.1.1 255.255.255.0
!
interface FastEthernet0/0
 ip address 10.1.12.1 255.255.255.0
 duplex auto
 speed auto
!
interface Serial0/0
 ip address 10.1.215.1 255.255.255.0
!

router eigrp 1
 network 0.0.0.0
 no auto-summary

R2, PE Router:

R2#show run
!
ip vrf v
 rd 1:1
 route-target export 1:1
 route-target import 1:1
!
!
interface Loopback0
 ip address 2.2.2.2 255.255.255.255
 ip ospf 1 area 0
!
interface FastEthernet0/0
 ip vrf forwarding v
 ip address 10.1.12.2 255.255.255.0
!
interface FastEthernet0/1
 ip address 10.1.23.2 255.255.255.0
 ip ospf 1 area 0
 mpls ip
!
router eigrp 1
 no auto-summary
 !
 address-family ipv4 vrf v
  redistribute bgp 234 metric 1 10000 1 1 1
  network 10.1.12.2 0.0.0.0
  auto-summary
  autonomous-system 1
 exit-address-family
!
router ospf 1
 log-adjacency-changes
!
router bgp 234
 no bgp default ipv4-unicast
 bgp log-neighbor-changes
 neighbor 4.4.4.4 remote-as 234
 neighbor 4.4.4.4 update-source Loopback0
 !
 address-family vpnv4
  neighbor 4.4.4.4 activate
  neighbor 4.4.4.4 send-community extended
 exit-address-family
 !
 address-family ipv4 vrf v
  redistribute eigrp 1
  no synchronization
 exit-address-family
!
ip forward-protocol nd
!

R3, P router:

R3#show run

interface Loopback0
 ip address 3.3.3.3 255.255.255.255
!
interface FastEthernet0/0
 ip address 10.1.34.3 255.255.255.0
 mpls ip
!
interface FastEthernet0/1
 ip address 10.1.23.3 255.255.255.0
 mpls ip
!
router ospf 1
 log-adjacency-changes
 network 0.0.0.0 255.255.255.255 area 0
!

R4: PE Router

R4#show run
!
ip vrf v
 rd 1:1
 route-target export 1:1
 route-target import 1:1
!
!
interface Loopback0
 ip address 4.4.4.4 255.255.255.255
 ip ospf 1 area 0
!
interface FastEthernet0/0
 ip address 10.1.34.4 255.255.255.0
 ip ospf 1 area 0
 mpls ip
!
interface FastEthernet0/1
 ip vrf forwarding v
 ip address 10.1.45.4 255.255.255.0
!
router eigrp 1
 no auto-summary
 !
 address-family ipv4 vrf v
  redistribute bgp 234 metric 1 1 1 1 1
  network 10.1.45.4 0.0.0.0
  auto-summary
  autonomous-system 1
 exit-address-family
!
router ospf 1
 log-adjacency-changes
!
router bgp 234
 no bgp default ipv4-unicast
 bgp log-neighbor-changes
 neighbor 2.2.2.2 remote-as 234
 neighbor 2.2.2.2 update-source Loopback0
 !
 address-family vpnv4
  neighbor 2.2.2.2 activate
  neighbor 2.2.2.2 send-community extended
 exit-address-family
 !
 address-family ipv4 vrf v
  redistribute eigrp 1
  no synchronization
 exit-address-family

R5: CE Router

R5#show run
!
interface Loopback0
 ip address 5.5.5.5 255.255.255.0
!
interface Serial0/0
 ip address 10.1.215.5 255.255.255.0
 clock rate 64000
!
interface FastEthernet0/1
 ip address 10.1.45.5 255.255.255.0
!
router eigrp 1
 network 0.0.0.0
 no auto-summary
!

Now for a couple show commands on R1:

R1#show ip route eigrp
     5.0.0.0/24 is subnetted, 1 subnets
D       5.5.5.0 [90/435200] via 10.1.12.2, 00:19:08, FastEthernet0/0
     10.0.0.0/24 is subnetted, 3 subnets
D       10.1.45.0 [90/307200] via 10.1.12.2, 00:19:08, FastEthernet0/0
R1#

R1#show ip eigrp topology
IP-EIGRP Topology Table for AS(1)/ID(10.1.215.1)

Codes: P - Passive, A - Active, U - Update, Q - Query, R - Reply,
       r - reply Status, s - sia Status

P 1.1.1.0/24, 1 successors, FD is 128256
        via Connected, Loopback0
P 5.5.5.0/24, 1 successors, FD is 435200
        via 10.1.12.2 (435200/409600), FastEthernet0/0
        via 10.1.215.5 (2297856/128256), Serial0/0
P 10.1.12.0/24, 1 successors, FD is 281600
        via Connected, FastEthernet0/0
P 10.1.45.0/24, 1 successors, FD is 307200
        via 10.1.12.2 (307200/281600), FastEthernet0/0
        via 10.1.215.5 (2195456/281600), Serial0/0
P 10.1.215.0/24, 1 successors, FD is 2169856
        via Connected, Serial0/0
R1#

And some on R2, the PE router:

R2#show ip route vrf v

Routing Table: v
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

     1.0.0.0/24 is subnetted, 1 subnets
D       1.1.1.0 [90/409600] via 10.1.12.1, 00:31:48, FastEthernet0/0
     5.0.0.0/24 is subnetted, 1 subnets
B       5.5.5.0 [200/409600] via 4.4.4.4, 00:02:34
     10.0.0.0/24 is subnetted, 3 subnets
C       10.1.12.0 is directly connected, FastEthernet0/0
B       10.1.45.0 [200/0] via 4.4.4.4, 00:31:48
D       10.1.215.0 [90/2195456] via 10.1.12.1, 00:31:21, FastEthernet0/0

R2#show ip eigrp vrf v topology
IP-EIGRP Topology Table for AS(1)/ID(10.1.12.2) Routing Table: v

Codes: P - Passive, A - Active, U - Update, Q - Query, R - Reply,
       r - reply Status, s - sia Status

P 1.1.1.0/24, 1 successors, FD is 409600
        via 10.1.12.1 (409600/128256), FastEthernet0/0
P 5.5.5.0/24, 1 successors, FD is 409600
        via VPNv4 Sourced (409600/0)
P 10.1.12.0/24, 1 successors, FD is 281600
        via Connected, FastEthernet0/0
P 10.1.45.0/24, 1 successors, FD is 281600
        via VPNv4 Sourced (281600/0)
P 10.1.215.0/24, 1 successors, FD is 2195456
        via 10.1.12.1 (2195456/2169856), FastEthernet0/0
R2#

Take a minute to post your thoughts, and as always, happy studies.

Keith
Keith

It has been a few days, and we have received lots of great ideas.   Thank you.

When R4 receives the routes in VRF v, the EIGRP metrics are copied into extended BGP attributes, and include the information for metric, AS, route-type and more.  The iBGP updates from R4 to R2 contain all those attributes.   When R2 receives the updates, if the route type is internal (from EIGRP attributes) and the source EIGRP AS matches the local EIGRP AS we are importing to, it will then be up to the  metric to determine the best path.

If we decreased the bandwidth statement on R4 Fa0/1, or used an offset list (2,000,000 more should do the trick) on R5 out Fa0/1 (towards R4), the increase in metric would cause R2 to prefer the path through R1 for 5.5.5.0/24 instead of using the MPLS backbone.

BGP updates that contain the cost community attribute will use the EIGRP AD instead of the iBGP AD of 200 to compare routes on metric alone. In that light, another option, would be to tell R2 to ignore cost-community, with the BGP router command:

bgp bestpath cost-community ignore

Let’s take a look at the results.

Here is the baseline for before any changes:

R2#show ip route vrf v | inc 5.5.5
B       5.5.5.0 [200/409600] via 4.4.4.4, 00:02:29
R2#show ip bgp vpnv4 all 5.5.5.0
BGP routing table entry for 1:1:5.5.5.0/24, version 8
Paths: (1 available, best #1, table v)
Flag: 0x820
  Not advertised to any peer
  Local
    4.4.4.4 (metric 21) from 4.4.4.4 (4.4.4.4)
      Origin incomplete, metric 409600, localpref 100, valid, internal, best
      Extended Community: RT:1:1 Cost:pre-bestpath:128:409600 0x8800:32768:0
        0x8801:1:153600 0x8802:65281:256000 0x8803:65281:1500
      mpls labels in/out nolabel/19
R2#

Now we will remove the default behavior

R2(config)#router bgp 234
R2(config-router)#bgp bestpath cost-community ignore

Cleared BGP sessions and routing tables, and waited a minute before the following show commands:

R2#show ip route vrf v | inc 5.5.5
D       5.5.5.0 [90/2323456] via 10.1.12.1, 00:00:08, FastEthernet0/0
R2#show ip bgp vpnv4 all 5.5.5.0
BGP routing table entry for 1:1:5.5.5.0/24, version 8
Paths: (2 available, best #2, table v)
Flag: 0x820
  Advertised to update-groups:
        1
  Local
    4.4.4.4 (metric 21) from 4.4.4.4 (4.4.4.4)
      Origin incomplete, metric 409600, localpref 100, valid, internal
      Extended Community: RT:1:1 Cost:pre-bestpath:128:409600 0x8800:32768:0
        0x8801:1:153600 0x8802:65281:256000 0x8803:65281:1500
      mpls labels in/out 20/19
  Local
    10.1.12.1 from 0.0.0.0 (2.2.2.2)
      Origin incomplete, metric 2323456, localpref 100, weight 32768, valid, sourced, best
      Extended Community: RT:1:1
        Cost:pre-bestpath:128:2323456 (default-2145160191) 0x8800:32768:0
        0x8801:1:665600 0x8802:65282:1657856 0x8803:65281:1500
      mpls labels in/out 20/nolabel
R2#

After setting it back to defaults, we could then try an offset list on R5 advertising to R4:

R5(config)#router eigrp 1
R5(config-router)#offset-list 0 out 2000000 fastEthernet 0/1

Cleared BGP sessions and routing tables, and waited a minute before the following show commands:

R2#show ip route vrf v | inc 5.5.5
D       5.5.5.0 [90/2323456] via 10.1.12.1, 00:06:28, FastEthernet0/0
R2#show ip bgp vpnv4 all 5.5.5.0
BGP routing table entry for 1:1:5.5.5.0/24, version 12
Paths: (1 available, best #1, table v)
Flag: 0x820
  Advertised to update-groups:
        1
  Local
    10.1.12.1 from 0.0.0.0 (2.2.2.2)
      Origin incomplete, metric 2323456, localpref 100, weight 32768, valid, sourced, best
      Extended Community: RT:1:1
        Cost:pre-bestpath:128:2323456 (default-2145160191) 0x8800:32768:0
        0x8801:1:665600 0x8802:65282:1657856 0x8803:65281:1500
      mpls labels in/out 31/nolabel
R2#

After resetting all that, implementing the following on R4, and then clearing BGP and routing, we issue the show commands again.

R4(config)#int fa 0/1
R4(config-if)#bandwidth 100

R2#show ip route vrf v | inc 5.5.5
D       5.5.5.0 [90/2323456] via 10.1.12.1, 00:00:05, FastEthernet0/0
R2#show ip bgp vpnv4 all 5.5.5.0
BGP routing table entry for 1:1:5.5.5.0/24, version 20
Paths: (1 available, best #1, table v)
Flag: 0x820
  Advertised to update-groups:
        1
  Local
    10.1.12.1 from 0.0.0.0 (2.2.2.2)
      Origin incomplete, metric 2323456, localpref 100, weight 32768, valid, sourced, best
      Extended Community: RT:1:1
        Cost:pre-bestpath:128:2323456 (default-2145160191) 0x8800:32768:0
        0x8801:1:665600 0x8802:65282:1657856 0x8803:65281:1500
      mpls labels in/out 23/nolabel
R2#

Thanks again to all who contributed. I encourage all RS candidates to lab this up, as well as practice MPLS with OSPF at the CEs.

Marcel posted a comment, reminding us of an excellent document written by Petr, on this topic and more. The original post from Petr which includes the link to free .PDF for this document may be found by clicking here. Thanks Marcel!

Tagged with:
May 25

It isn’t my fault, they configured it that way before I got here! That was the entry level technician’s story Monday morning, and he was sticking to it.  :)

Here is the rest of the story.   Over the weekend, some testing had been done regarding a proposed BGP configuration.   The objective was simple, R1 and R3 needed to ping each others loobacks at 1.1.1.1 and 3.3.3.3 respectively, with those 2 networks, being carried by BGP.  R2 is performing NAT.    The topology diagram looks like this:

3 routers in a row-NO-user

The ping between loopbacks didn’t work, but R1 and R3 had these console messages:

R1#
%TCP-6-BADAUTH: No MD5 digest from 10.0.0.3(179) to 10.0.0.1(28556) (RST)
R1#
%TCP-6-BADAUTH: No MD5 digest from 10.0.0.3(179) to 10.0.0.1(28556) (RST)
R1#

R3#
%TCP-6-BADAUTH: No MD5 digest from 23.0.0.1(179) to 23.0.0.3(59922) (RST)
R3#
%TCP-6-BADAUTH: No MD5 digest from 23.0.0.1(179) to 23.0.0.3(59922) (RST)
R3#

The senior engineer looked at the configurations for R1, R2 and R3 and found 5 specific items, each of which was independently causing a failure.

Here is the challenge:  Can you find 1 or more of them?

Let us know what your troubleshooting skills can find, and post your comments here on the blog.

Here are the configurations for the 3 routers:

R1#show run
version 12.4
hostname R1
!
interface Loopback0
 ip address 1.1.1.1 255.255.255.0
!
interface FastEthernet0/0
 ip address 10.0.0.1 255.255.255.0
!
router ospf 1
 network 10.0.0.0 0.0.0.255 area 0
!
router bgp 1
 no synchronization
 bgp log-neighbor-changes
 network 1.1.1.1 mask 255.255.255.255
 neighbor 10.0.0.3 remote-as 3
 neighbor 10.0.0.3 password cisco
 no auto-summary
!
end
R1#

R2#show run
version 12.4
hostname R2
!
interface Loopback0
 ip address 2.2.2.2 255.255.255.0
!
interface FastEthernet0/0
 ip address 10.0.0.2 255.255.255.0
 ip nat inside
 ip virtual-reassembly
!
interface FastEthernet0/1
 ip address 23.0.0.2 255.255.255.0
 ip nat outside
 ip virtual-reassembly
!
router ospf 1
 network 2.2.2.2 0.0.0.0 area 0
 network 10.0.0.2 0.0.0.0 area 0
 network 23.0.0.2 0.0.0.0 area 0
!
ip nat inside source static 10.0.0.1 23.0.0.1
ip nat outside source static 23.0.0.3 10.0.0.3
!
end

R3#show run
version 12.4
hostname R3
!
interface Loopback0
 ip address 3.3.3.3 255.255.255.0
!
interface FastEthernet0/1
 ip address 23.0.0.3 255.255.255.0
!
router ospf 1
 log-adjacency-changes
 network 23.0.0.0 0.0.0.255 area 0
!
router bgp 3
 no synchronization
 bgp log-neighbor-changes
 network 3.3.3.3 mask 255.255.255.255
 neighbor 23.0.0.1 remote-as 1
 neighbor 23.0.0.1 password cisco123
 no auto-summary
!
end
R3#

Let us know what you find!

Best wishes,

Keith

Keith

UPDATE:   ANSWERS

Your contributions and input is great.  You ROCK!

I have summarized the 5 specific errors/issues with the configuration, and here they are:

  • R2: NAT isn’t fully baked. Can fix with  “ip nat outside source static 23.0.0.3 10.0.0.3 add-route” (or we could manually add the route as well).
  • R1 & R3: The BGP passwords don’t match, but it doesn’t matter. BGP authentication doesn’t work between NAT’d BGP neighbors, so it would have to be removed. :)
  • R1 & R3: Incorrect network statements for loopback addresses on both BGP routers (incorrect mask)
  • R1 & R3: Ebgp-multihop statements are needed on both neighbors (not directly connected EBGP)
  • R2: R2 doesn’t know how to reach 1.1.1.1 or 3.3.3.3 (non-BGP routing issue)

Again, thanks for the time and effort invested in this solution, and in learning in general.   I appreciate you!

Best wishes,

Keith

Tagged with:
May 03

The purpose of event dampening is reducing the effect of oscillations on routing systems. In general, periodic process that affect the routing system as a whole should have the period no shorter than the system convergence time (relaxation time). Otherwise, the system will never stabilize and will be constantly updating its state. In reality, complex system have multiple periodic processes running at the same time, which results is in harmonic process interference and complex process spectrum. Considering such behavior is outside the scope of this paper. What we want to do, is finding optimal settings to filter high-frequency events from the routing system. In our particular case, events are interface flaps, occurring periodically. We want to make sure that oscillations with period T or less are not reported to the routing system. Here T is found empirically, based on observed/estimated convergence time as suggested above.

Event dampening uses exponential back-off algorithm to suppress event reporting to the upper level protocols. Effectively, every time an interface flaps (goes down, to be accurate) a penalty value of P is added to the interface penalty counter. If at some point the accumulated penalty exceeds the “suppress” value of S, the interface is placed in the suppress state and further link events are not reported to the upper protocol modules. At all time, the interface penalty counter follows exponential decay process based on the formula P(t)=P(0)*2^(-t/H) where H is half-life time setting for the process. As soon as accumulated penalty reaches the lower boundary of R – the reuse value, interface is unsuppressed, and further changes are again reported to the upper level protocols.

optimized-dampening

What we want to find out, is the lowest value of H that suppresses all harmonic oscillation processes with the period of T or lower, but does not suppress longer-period processes. E.g. we want to block all oscillations happening every 5 seconds or more often, but report interface flaps happening every 6 seconds or less often. Look at the figure above and consider that there have been two flap events, separated by time period T. At the moment of the second flap, the suppress condition is: P*2^(-T/H)+P >= S. Here the left part is the penalty accumulated at the moment of the second flap, assuming the initial penalty at the moment of the first flap was zero. From this inequality, we quickly find out that H >= T/log2(P/(S-P)). if we could make P/(S-P)=2, then the formula would be greatly simplified. Per Cisco’s implementation, P (penalty) is fixed to 1000, and by setting S=1500 we get 1000/(1500-1000)=2. Therefore, if we select S=1500, P=1000 then our condition becomes H >= T. Since we are looking for the minimal value of H we can set H=T. Seeded with this values, event dampening filter will reject all oscillating porcesses with the period shorter than T. However, there is one more parameter we are left to find is R – the reuse time.

We may apply the following logic here. Observing no further events since the last flap for the duration of 2xT, we may assume that the periodic process has stopped. Therefore, we may unblock the interface after 2xT seconds. The reuse value could be found by taking the penalty accumulated after the second flap, and further decaying it for 2xT more seconds: (P*2^(-T/H)+P)*2^(-2T/H) <= R. Since we set H=T we quickly find out that R >= 3/8*P = 375. At this point we have all parameters we need to know in order to apply optimal event dampening settings based on the cut-off period for oscillating processes. Here is a sample configuration, for T=10 seconds. Notice the last parameter, know as the maximum suppress time – the maximum time that the interface could be kept in suppress state. Since our goal is to hold the interface suppressed for at least 2xT seconds, the maximum suppress time is twice the half-life value.

R3:
interface FastEthernet 0/0
 dampening  10 375 1500 20

Lastly, a few words on figuring out the convergence time for your network. To being with, we only consider IGP protocols in this discussion. Dampening in BGP is more complicated, due to the scale of the routing system involved. The general consensus nowadays is that using dampening in BGP may result in more harm than good, due to cascading withdrawn messages. Next, for the IGPs, you are generally considered with a single fault domain, which in properly designed network is bounded to one IGP area (or EIGRP query scope zone). Convergence time for a single area depends on the following factors:

  • Area size – impacts routing database sizes, affects LSA/Query propagation time and SPF runtime.
  • Weakest (in terms of CPU/Memory) router in the area – this is the router to complete SPF computations the last.
  • RIB/FIB sizes: a significant amount of time is wasted on updating RIB/FIB tables after IGP re-convergence. Again, depends on the area size

To summarize, the main factor is the area size and the number of links in the area (which normally follows the power law based on the number of nodes). However, knowing this fact does not give us a formula for the convergence time. In most cases, you should rely on empirical evidence to obtain this. Starting with one-two seconds could be reasonable, but you should scale this value by the factor of two or three to account for multiple oscillations that may run in the network concurrently. Still, one again, there is no magical formula for this – this is what network engineers and designers are for!

Tagged with:
Apr 08

One of our students in the INE RS bootcamp today, asked about an OSPF sham-link. I thought it would make a beneficial addition to our blog, and here it is.  Thanks for the request Christian!

Reader’s Digest version: MPLS networks aren’t free. If a customers is using OSPF to peer between the CE and PE routers, and also has an OSPF CE to CE neighborship, the CE’s will prefer the Intra-Area CE to CE routes (sometimes called the “backdoor” route in this situation), instead of using the Inter-Area CE to PE learned routes that use the MPLS network as a transit path. OSPF sham-links correct this behavior.

This blog post walks through the problem and the solution, including the configuration steps to create and verify a sham-link.

To begin, MPLS is set up in the network as shown with R2 and R4 acting as Provider Edge (PE) routers, and MPLS is enabled throughout R2-R3-R4.

R1 and R5 are Customer Edge (CE) routers, and the Serial0/1.15 interfaces of R1 and R5 are temporarily shut down, (this means the backdoor route isn’t in place yet, and at the moment, there is no problem).

mpls-ospf sham

Currently, R1 and R5 see the routes to each others local networks through the VPNv4 MPLS network, and the routes show up as Inter-Area OSPF routes with the PE routers as the next hop.

Let’s do some testing and verification of what is currently in place. Notice that R1 and R5 can see each others Fa0/0 and Fa0/1 connected networks. These routes show up as Inter-Area (IA) routes.

R1#show ip route ospf
10.0.0.0/24 is subnetted, 2 subnets
O IA    10.45.0.0 [110/2] via 10.12.0.2, 00:00:58, FastEthernet0/0
O IA 192.168.1.0/24 [110/3] via 10.12.0.2, 00:00:43, FastEthernet0/0

R5#show ip route ospf
172.16.0.0/24 is subnetted, 1 subnets
O IA    172.16.0.0 [110/3] via 10.45.0.4, 00:01:49, FastEthernet0/1
10.0.0.0/24 is subnetted, 2 subnets
O IA    10.12.0.0 [110/2] via 10.45.0.4, 00:01:49, FastEthernet0/1

Next, we will enable the Serial0/1.15 interfaces of R1 and R5. When we enable these interfaces, R1 and R5 will become neighbors, and see each others routes to the Fa0/0 and Fa0/1 networks as Intra-Area routes. Even though the OSPF cost will be worse via the serial interfaces, take a close look at what happens and which routes end up in the routing table.

R1(config)#int ser 0/1.15
R1(config-subif)#no shut

R5(config)#int ser 0/1.15
R5(config-subif)#no shut

We’ll wait a few moments, to give the network  time to converge, then take a look at the OSPF routes on the CE routers R1 and R5, just as we did earlier, and see if the routes are different.

R1#show ip route ospf
10.0.0.0/24 is subnetted, 3 subnets
O       10.45.0.0 [110/65] via 10.15.0.5, 00:02:52, Serial0/1.15
O    192.168.1.0/24 [110/65] via 10.15.0.5, 00:02:52, Serial0/1.15

R5#show ip route ospf
172.16.0.0/24 is subnetted, 1 subnets
O       172.16.0.0 [110/65] via 10.15.0.1, 00:03:19, Serial0/1.15
10.0.0.0/24 is subnetted, 3 subnets
O       10.12.0.0 [110/65] via 10.15.0.1, 00:03:19, Serial0/1.15

Notice, that the remote customer networks attached to Fa0/0 and Fa0/1 are now reachable via the serial 0/1.15 interface, and they appear as Intra-Area routes. Even though the metric of 65 is worse than before, and using the slower serial link, the routers prefer these routes instead of using the PE learned routes, because Intra-Area routes are preferred over  Inter-Area routes. Now the Service Provider’s MPLS network will only be used as a backup in the event the serial connection fails. (I don’t think they will be providing a price break either). ;)

To train the network to use the MPLS network as the primary transit path, we need to make the remote Ethernet customer networks look like Intra-Area routes via the PE routers, with a better metric than the serial interfaces, so they can be used instead of the slower serial link. We are actually going to pull a fast one, or a “sham”, on OSPF because the MPLS network is really acting as a “superbackbone” for OSPF, and therefore routes between the CEs are indeed Inter-Area by default. To create the illusion of the CEs not being separated by a backbone, we will create an OSPF sham-link. We will create a couple loopback interfaces in the VRFs on both PEs, and make sure those loopbacks are originated and advertised via BGP. We will use those loopbacks as the source/destination of the OSPF sham-link.

Because the sham-link is seen as an Intra-Area link between PE routers (R2 and R4), an OSPF adjacency is created and database exchange takes place across the sham-link. The two PE routers can then flood LSAs between sites from across the MPLS VPN backbone. As a result, the desired Intra-Area routes are created.

Enough chat, lets create this sham-link!

R2(config)#int loop 100
R2(config-if)#ip vrf forwarding Vrf1
R2(config-if)#ip address 11.11.11.2 255.255.255.255
R2(config-if)#router bgp 24
R2(config-router)#address-family ipv4 vrf Vrf1
R2(config-router-af)#network 11.11.11.2 mask 255.255.255.255
R2(config-router-af)#exit
R2(config-router)#router ospf 1 vrf Vrf1
R2(config-router)#area 1 sham-link 11.11.11.2 11.11.11.4 cost 5

R4(config)#int loop 100
R4(config-if)#ip vrf forwarding Vrf1
R4(config-if)#ip address 11.11.11.4 255.255.255.255
R4(config-if)#router bgp 24
R4(config-router)#address-family ipv4 vrf Vrf1
R4(config-router-af)#network 11.11.11.4 mask 255.255.255.255
R4(config-router-af)#exit
R4(config-router)#router ospf 1 vrf Vrf1
R4(config-router)#area 1 sham-link 11.11.11.4 11.11.11.2 cost 5
%OSPF-5-ADJCHG: Process 1, Nbr 10.12.0.2 on OSPF_SL0 from LOADING to FULL, Loading Done

Looks like the sham-link came up.  Let’s take a closer look at the sham link with a show command made just for that purpose.

R4#show ip ospf sham-links
Sham Link OSPF_SL0 to address 11.11.11.2 is up
Area 1 source address 11.11.11.4
Run as demand circuit
DoNotAge LSA allowed. Cost of using 5 State POINT_TO_POINT,
Timer intervals configured, Hello 10, Dead 40, Wait 40,
Hello due in 00:00:06
Adjacency State FULL (Hello suppressed)
Index 2/2, retransmission queue length 0, number of retransmission 0
First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
Last retransmission scan length is 0, maximum is 0
Last retransmission scan time is 0 msec, maximum is 0 msec

Looks like it is in place, but is it creating the desired result, of having the CE routers R1 and R5 see the Ethernet remote networks as reachable through the PE routers R2 and R4? Let’s go to R1 and see!

R1#show ip route ospf
10.0.0.0/24 is subnetted, 3 subnets
O       10.45.0.0 [110/7] via 10.12.0.2, 00:06:02, FastEthernet0/0
11.0.0.0/32 is subnetted, 2 subnets
O E2    11.11.11.2 [110/1] via 10.12.0.2, 00:06:43, FastEthernet0/0
O E2    11.11.11.4 [110/1] via 10.12.0.2, 00:06:13, FastEthernet0/0
O    192.168.1.0/24 [110/8] via 10.12.0.2, 00:06:02, FastEthernet0/0

That looks perfect! How about R5?

R5#show ip route ospf
172.16.0.0/24 is subnetted, 1 subnets
O       172.16.0.0 [110/8] via 10.45.0.4, 00:06:27, FastEthernet0/1
10.0.0.0/24 is subnetted, 3 subnets
O       10.12.0.0 [110/7] via 10.45.0.4, 00:06:27, FastEthernet0/1
11.0.0.0/32 is subnetted, 2 subnets
O E2    11.11.11.2 [110/1] via 10.45.0.4, 00:07:05, FastEthernet0/1
O E2    11.11.11.4 [110/1] via 10.45.0.4, 00:06:45, FastEthernet0/1

And just to be sure, a ping to verify connectivity. We will ping the remote Fa0/1 interface of CE router R1 from CE router R5.

R5#ping 172.16.0.1

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.0.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 120/130/148 ms

That’s cool, so we know we have connectivity, and based on the routing table output, we believe it is going through the SP MPLS network. Let’s do one more test to prove that as well. A traceroute.

R5#trace 172.16.0.1

Type escape sequence to abort.
Tracing the route to 172.16.0.1

1 10.45.0.4 48 msec 92 msec 12 msec
2 10.34.0.3 [MPLS: Labels 16/24 Exp 0] 136 msec 180 msec 228 msec
3 10.12.0.2 [MPLS: Label 24 Exp 0] 124 msec 80 msec 88 msec
4 10.12.0.1 112 msec *  176 msec

Tags and all!  I still love it when a plan comes together.   Now our transit traffic is moving through the MPLS network, and the serial 0/1.15 interfaces are available as a backup.

More fun times regarding MPLS, OSPF and MPBGP can be found in our workbooks for RS and SP.

Best wishes, and enjoy the journey!

Keith

Keith

Tagged with:
Apr 06

Having a blast in Chicago with the RS bootcamp students.    Thanks for all the hard work you are doing this week!

A student from a past Reno class, named Michal, asked if I would create a blog post regarding BGP proportional load balancing based on the bandwidth of the links to EBGP peers. It has been on my list of things to do, and here it is. Thanks for the request Michal.

The secret to this trick is to pay attention to the links between directly connected external BGP neighbors, (in this case between R6-R5 and R2-R3), and send the link bandwidth extended community attribute to iBGP peer R1.  It is enabled by entering the bgp dmzlink-bw command and using extended communities to share the information.  To summarize: routes learned from directly connected external neighbor are advertised to IBGP peers including the bandwidth of the external link where the routes were learned, and then the IBGP router (R1) can proportionally load balance between the two paths.

Here is the diagram we will use.

BGP Diagram

We’ll use loobpacks for our IBGP connections, so let’s verify that we have connectivity between loopbacks in AS 123.

R1#ping 6.6.6.6 source loopback 0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 6.6.6.6, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/43/76 ms
R1#
R1#ping 2.2.2.2 source loopback 0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/40/72 ms

Ok, that looks good, so let’s configure R1 to be an IBGP peer with R6 and R2.  The dmzlink-bw feature is implemented as part of the IPv4 address family configuration.

R1(config)#router bgp 126
R1(config-router)#neighbor 6.6.6.6 remote-as 126
R1(config-router)#neighbor 2.2.2.2 remote-as 126
R1(config-router)#neighbor 6.6.6.6 update-source lo0
R1(config-router)#neighbor 2.2.2.2 update-source lo0

R1(config-router)#address-family ipv4
R1(config-router-af)#bgp dmzlink-bw
R1(config-router-af)#neighbor 6.6.6.6 activate
R1(config-router-af)#neighbor 2.2.2.2 activate
R1(config-router-af)#neighbor 6.6.6.6 send-community both
R1(config-router-af)#neighbor 2.2.2.2 send-community both
R1(config-router-af)#maximum-paths ibgp 2
R1(config-router-af)#end

Next, we will configure R6, and R2 to be IBGP neighbors with R1, and EBGP neighbors with R5 and R3 respectively. We are going to manipulate the external interfaces on R6 and R2 to reflect a bandwidth of 6000k and 5000k respectively using the bandwidth command.  BGP can originate the link bandwidth community only for directly connected links to eBGP neighbors.  In our example, this will be originated from R6 and R2.

R6(config)#router bgp 126
R6(config-router)#neighbor 1.1.1.1 remote-as 126
R6(config-router)#neighbor 1.1.1.1 update-source lo0
R6(config-router)#neighbor 10.56.0.5 remote-as 345
R6(config-router)#address-family ipv4
R6(config-router-af)#bgp dmzlink-bw
R6(config-router-af)#neighbor 1.1.1.1 activate
R6(config-router-af)#neighbor 1.1.1.1 next-hop-self
R6(config-router-af)#neighbor 1.1.1.1 send-community both
R6(config-router-af)#neighbor 10.56.0.5 activate
R6(config-router-af)#neighbor 10.56.0.5 dmzlink-bw
R6(config-router-af)#int fa 0/0
R6(config-if)#bandwidth 6000

Now, on to R2, with virtually the same configuration.

R2(config)#router bgp 126
R2(config-router)#neighbor 1.1.1.1 remote-as 126
R2(config-router)#neighbor 1.1.1.1 update-source lo0
R2(config-router)#neighbor 10.23.0.3 remote-as 345
R2(config-router)#address-family ipv4
R2(config-router-af)#bgp dmzlink-bw
R2(config-router-af)#neighbor 1.1.1.1 activate
R2(config-router-af)#neighbor 1.1.1.1 next-hop-self
R2(config-router-af)#neighbor 1.1.1.1 send-community both
R2(config-router-af)#neighbor 10.23.0.3 activate
R2(config-router-af)#neighbor 10.23.0.3 dmzlink-bw
R2(config-router-af)#int ser 0/1.23
R2(config-subif)#bandwidth 5000

Now we will configure R5 and R3 as the EBGP neighbors of R6 and R2 respectively.  These EBGP peers don’t need any special configuration, other than standard BGP.

R5(config)#router bgp 345
R5(config-router)#neighbor 10.56.0.6 remote-as 126
R5(config-router)#neighbor 4.4.4.4 remote-as 345
R5(config-router)#neighbor 4.4.4.4 update-source lo0
R5(config-router)#neighbor 4.4.4.4 next-hop-self

R3(config)#router bgp 345
R3(config-router)#neighbor 10.23.0.2 remote-as 126
R3(config-router)#neighbor 4.4.4.4 remote-as 345
R3(config-router)#neighbor 4.4.4.4 update-source lo0
R3(config-router)#neighbor 4.4.4.4 next-hop-self

Last, but not least we configure R4 as an IBGP peer to R5 and R3. In addition, we will create a loopback and add it into BGP.  We will use the loopack as a target destination from R1 to verify the load balancing in a later step, so watch for that coming up.

R4(config)#int loop 44
R4(config-if)#ip add 44.44.44.44 255.255.255.0
R4(config-if)#router bgp 345
R4(config-router)#neighbor 5.5.5.5 remote-as 345
R4(config-router)#neighbor 3.3.3.3 remote-as 345
R4(config-router)#network 44.44.44.0 mask 255.255.255.0

Now let’s verify. Because we are on R4, let’s verify the BGP neighborships it has.

R4#show ip bgp summary
BGP router identifier 44.44.44.44, local AS number 345
BGP table version is 2, main routing table version 2
1 network entries using 120 bytes of memory
1 path entries using 52 bytes of memory
2/1 BGP path/bestpath attribute entries using 248 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
Bitfield cache entries: current 1 (at peak 1) using 32 bytes of memory
BGP using 452 total bytes of memory
BGP activity 1/0 prefixes, 1/0 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
3.3.3.3         4   345       4       5        2    0    0 00:00:41        0
5.5.5.5         4   345       4       5        2    0    0 00:00:35        0

! Note:  we can easily verify what routes are being advertised out from R4.

R4#show ip bgp neighbors 5.5.5.5 advertised-routes
BGP table version is 2, local router ID is 44.44.44.44
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 44.44.44.0/24    0.0.0.0                  0         32768 i

Total number of prefixes 1
R4#show ip bgp neighbors 3.3.3.3 advertised-routes
BGP table version is 2, local router ID is 44.44.44.44
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 44.44.44.0/24    0.0.0.0                  0         32768 i

Total number of prefixes 1
R4#

Looks like AS 345 is fine. Let’s jump to R1, in AS 126, and verify from there.

R1#show ip bgp summary
BGP router identifier 1.1.1.1, local AS number 126
BGP table version is 3, main routing table version 3
1 network entries using 120 bytes of memory
2 path entries using 104 bytes of memory
1 multipath network entries and 2 multipath paths
2/1 BGP path/bestpath attribute entries using 248 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 496 total bytes of memory
BGP activity 1/0 prefixes, 2/0 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
2.2.2.2         4   126      10       9        3    0    0 00:06:39        1
6.6.6.6         4   126      11      10        3    0    0 00:07:14        1
R1#show ip bgp
BGP table version is 3, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
* i44.44.44.0/24    6.6.6.6                  0    100      0 345 i
*>i                 2.2.2.2                  0    100      0 345 i

! Note:  Looks like we have the neighbors, and the 44.44.44.0/24 prefix.
! To see more detail on the 44.44.44.0 network, we can use a couple additional commands.

R1#show ip bgp 44.44.44.0
BGP routing table entry for 44.44.44.0/24, version 3
Paths: (2 available, best #2, table Default-IP-Routing-Table)
Multipath: iBGP
Flag: 0x820
  Not advertised to any peer
  345
    6.6.6.6 (metric 1) from 6.6.6.6 (6.6.6.6)
      Origin IGP, metric 0, localpref 100, valid, internal, multipath
      DMZ-Link Bw 750 kbytes
  345
    2.2.2.2 (metric 1) from 2.2.2.2 (2.2.2.2)
      Origin IGP, metric 0, localpref 100, valid, internal, multipath, best
      DMZ-Link Bw 625 kbytes

! Note: Let's see what the routing table has to say about this network.

R1#show ip route 44.44.44.0
Routing entry for 44.44.44.0/24
  Known via "bgp 126", distance 200, metric 0
  Tag 345, type internal
  Last update from 2.2.2.2 00:02:56 ago
  Routing Descriptor Blocks:
  * 6.6.6.6, from 6.6.6.6, 00:02:56 ago
      Route metric is 0, traffic share count is 6
      AS Hops 1
      Route tag 345
    2.2.2.2, from 2.2.2.2, 00:02:56 ago
      Route metric is 0, traffic share count is 5
      AS Hops 1
      Route tag 345

! Note: We can also get the information from the CEF table.

R1#show ip cef 44.44.44.0
44.44.44.0/24, version 47, epoch 0, per-destination sharing
0 packets, 0 bytes
  via 6.6.6.6, 0 dependencies, recursive
    traffic share 6
    next hop 10.16.0.6, FastEthernet0/1 via 6.6.6.0/24
    valid adjacency
  via 2.2.2.2, 0 dependencies, recursive
    traffic share 5
    next hop 10.12.0.2, FastEthernet0/0 via 2.2.2.0/24
    valid adjacency
  0 packets, 0 bytes switched through the prefix
  tmstats: external 0 packets, 0 bytes
           internal 0 packets, 0 bytes

So now that the route is there, how do we test the load balancing? One option is to do an extended ping, and record the path. We are expecting a 6 to 5 ratio for outbound traffic favoring the R6 path more than the R2 path. Let’s send 30 ping requests, and show the full response for the benefit of verification.

R1#ping
Protocol [ip]:
Target IP address: 44.44.44.44
Repeat count [5]: 30
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface: loopback0
Type of service [0]:
Set DF bit in IP header? [no]:
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]: r
Number of hops [ 9 ]: 4
Loose, Strict, Record, Timestamp, Verbose[RV]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 30, 100-byte ICMP Echos to 44.44.44.44, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
Packet has IP options:  Total option bytes= 19, padded length=20
 Record route: <*>
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)

Reply to request 0 (204 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 1 (156 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

! Note: the path changes on the next ping request, and begins to use R6 as the next hop.

Reply to request 2 (160 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 3 (128 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 4 (156 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 5 (172 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 6 (108 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 7 (136 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 8 (180 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 9 (152 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 10 (80 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 11 (308 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 12 (204 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 13 (108 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 14 (160 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 15 (140 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 16 (140 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 17 (104 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 18 (84 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 19 (192 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 20 (232 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 21 (220 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 22 (168 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 23 (140 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 24 (88 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 25 (224 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 26 (484 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 27 (128 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 28 (108 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 29 (136 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Success rate is 100 percent (30/30), round-trip min/avg/max = 80/166/484 ms
R1#

The first 2 requests, numbered 0-1, used the path of R2-R3-R4. The next 6 requests, numbered 2-7, used the path of R6-R5-r4. The next 5, numbered 8-12, use the R2-R3-R4 path again, and then the next 6 use the R6-R5-R4 path.

Happy studies-

Keith

Keith

Tagged with:
Jan 30
The first in series of posts discussing various BGP anomalies.
Tagged with:
Jan 17
Inter-AS mVPN implementation outlined using the concepts of MDT SAFI, BGP Connector and RPF Proxy Vector
Tagged with:
preload preload preload