<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>CCIE Training &#187; ospf</title>
	<atom:link href="http://ccie-training.org/category/ospf/feed/" rel="self" type="application/rss+xml" />
	<link>http://ccie-training.org</link>
	<description>Roadmap to the title</description>
	<lastBuildDate>Tue, 07 Sep 2010 20:04:20 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>OSPF on the move?  Include a Forwarding Address</title>
		<link>http://feedproxy.google.com/~r/ine/~3/T3rjkWdkX8g/</link>
		<comments>http://feedproxy.google.com/~r/ine/~3/T3rjkWdkX8g/#comments</comments>
		<pubDate>Fri, 03 Sep 2010 08:01:42 +0000</pubDate>
		<dc:creator>Keith Barker, CCIE #6783</dc:creator>
				<category><![CDATA[CCIE General]]></category>
		<category><![CDATA[CCIE R&S]]></category>
		<category><![CDATA[IGP]]></category>
		<category><![CDATA[ccie]]></category>
		<category><![CDATA[ospf]]></category>
		<category><![CDATA[CCIE 4.0]]></category>
		<category><![CDATA[ospf fa]]></category>
		<category><![CDATA[Troubleshooting]]></category>
		<category><![CDATA[vol2]]></category>

		<guid isPermaLink="false">http://blog.ine.com/?p=4127</guid>
		<description><![CDATA[So why would only 1 of 2 reachable next hops work for OSPF?  Test your knowledge now!]]></description>
			<content:encoded><![CDATA[
<p>I enjoyed Petr&#8217;s article on the explicit next hop. It reminded me of a scenario in which a route redistributed into OSPF worked only conditionally, depending on which reachable next hop the static route used.</p>
<p>Here is the topology for the scenario:</p>
<p><img class="alignnone size-full wp-image-4128" title="3 routers ospf fa blogpost" src="http://blog.ine.com/wp-content/uploads/2010/09/3-routers-ospf-fa-blogpost.png" alt="3 routers ospf fa blogpost" width="766" height="120" /></p>
<p>Here is the relevant (and working <img src='http://blog.ine.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> ) information for R1.<span id="more-4127"></span></p>
<p><img class="alignnone size-full wp-image-4129" title="R1 screenshot" src="http://blog.ine.com/wp-content/uploads/2010/09/R1-screenshot.png" alt="R1 screenshot" width="673" height="446" /></p>
<p>When we replace the static route with one that uses a new, reachable next hop, we lose the ability to ping 100.100.100.3.</p>
<p><img class="alignnone size-full wp-image-4130" title="R1 screenshot 2" src="http://blog.ine.com/wp-content/uploads/2010/09/R1-screenshot-2.png" alt="R1 screenshot 2" width="662" height="386" /></p>
<p>When we change the next hop of the static route (which is being redistributed into OSPF), the route to 100.100.100.0/24 no longer works, even though we have verified that we can ping the new next hop.</p>
<p><strong>Can you solve this puzzle?  Please post your ideas!<br />
</strong></p>
<p>For more than 100 challenging troubleshooting scenarios, please see volume 2 of our <a title="CCIE RS Workbooks" href="http://www.ine.com/self-paced/ccie-routing-switching/workbooks.htm" ><span style="color: #0000ff;"><strong>CCIE Route-Switch workbooks</strong></span></a>.</p>
<p>We will post the results right here, in a few days, after you have had a chance to post your comments and ideas.</p>
<p>Best wishes,</p>
<p>Keith</p>
<p><img class="alignnone size-full wp-image-4131" title="Keith" src="http://blog.ine.com/wp-content/uploads/2010/09/Keith.jpg" alt="Keith" width="307" height="175" /></p>
]]></content:encoded>
			<wfw:commentRss>http://feedproxy.google.com/~r/ine/~3/T3rjkWdkX8g/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Understanding Third-Party Next-Hop</title>
		<link>http://feedproxy.google.com/~r/ine/~3/S23_bzWZRdg/</link>
		<comments>http://feedproxy.google.com/~r/ine/~3/S23_bzWZRdg/#comments</comments>
		<pubDate>Thu, 02 Sep 2010 13:34:24 +0000</pubDate>
		<dc:creator>Petr Lapukhov, CCIE #16379</dc:creator>
				<category><![CDATA[IGP]]></category>
		<category><![CDATA[ccie]]></category>
		<category><![CDATA[ospf]]></category>
		<category><![CDATA[BGP]]></category>
		<category><![CDATA[CCDE]]></category>
		<category><![CDATA[eigrp]]></category>
		<category><![CDATA[rip]]></category>
		<category><![CDATA[third-party next-hop]]></category>

		<guid isPermaLink="false">http://blog.ine.com/?p=4117</guid>
		<description><![CDATA[
			
				
			
		
Abstract
This publication briefly covers the use of 3rd party next-hops in OSPF, RIP, EIGRP and BGP routing protocols. Common concepts are introduced and protocol-specific implementations are discussed. Basic understanding of the routing protocol function is required before reading this blog post.
Overview
The third-party next-hop concept applies only to distance-vector protocols, or to the parts of the [...]]]></description>
			<content:encoded><![CDATA[
<h4>Abstract</h4>
<p>This publication briefly covers the use of 3rd party next-hops in OSPF, RIP, EIGRP and BGP routing protocols. Common concepts are introduced and protocol-specific implementations are discussed. Basic understanding of the routing protocol function is required before reading this blog post.</p>
<h4>Overview</h4>
<p>The third-party next-hop concept applies only to distance-vector protocols, or to the parts of link-state protocols that exhibit distance-vector behavior. The idea is that a distance-vector update carries an explicit next-hop value, which the receiving side uses instead of the &#8220;implicit&#8221; next-hop, that is, the sending router&#8217;s address taken from the source address in the IP header of the packet carrying the routing update. Such an &#8220;explicit&#8221; next-hop is called a &#8220;third-party&#8221; next-hop IP address, because it allows the update to point to a next-hop other than the advertising router. Intuitively, this is only possible if the advertising and receiving routers are on a shared segment, but the &#8220;shared segment&#8221; concept can be generalized and abstracted. Every popular distance-vector protocol supports third-party next-hops: RIPv2, EIGRP, OSPF and BGP all carry an explicit next-hop value. Look at the figure below. It illustrates the situation where two different distance-vector protocols are running on the shared segment, but neither of them runs on all routers attached to the segment. The protocols &#8220;overlap&#8221; at a &#8220;pivotal&#8221; router, and redistribution is used to provide inter-protocol route exchange.</p>
<p> <a href="http://blog.ine.com/wp-content/uploads/2010/09/third-party-nh-generic.png"><img src="http://blog.ine.com/wp-content/uploads/2010/09/third-party-nh-generic.png" alt="third-party-nh-generic" title="third-party-nh-generic" width="218" height="205" class="aligncenter size-full wp-image-4118" /></a><br />
<span id="more-4117"></span><br />
Per the default distance-vector protocol behavior, traffic going from one routing domain into another has to cross the &#8220;pivotal&#8221; router, the router where the two domains overlap (R3 in our case), as opposed to going directly to the closest next-hop on the shared segment. The reason is that there is no direct &#8220;native&#8221; update exchange between the hops running different routing protocols. In situations like this, it is beneficial to rewrite the next-hop IP address to point toward the &#8220;optimum&#8221; exit point, using the &#8220;pivotal&#8221; router&#8217;s knowledge of both routing protocols. </p>
<p>OSPF is somewhat special with respect to its third-party next-hop implementation. It supports the third-party next-hop in Type-5/7 LSAs (AS External LSA and NSSA External LSA), where it is carried in the Forwarding Address (FA) field. These LSAs are processed in a &#8220;distance-vector manner&#8221; by every receiving router. By default, the LSA is assumed to advertise an external prefix &#8220;connected&#8221; to the advertising router. However, if the FA is non-zero, the address in this field is used to calculate the forwarding information, as opposed to the default forwarding toward the advertising router. The Forwarding Address is always present in Type-7 LSAs, for the reason illustrated in the figure below:</p>
<p> <a href="http://blog.ine.com/wp-content/uploads/2010/09/third-party-nh-ospf-nssa-fa.png"><img src="http://blog.ine.com/wp-content/uploads/2010/09/third-party-nh-ospf-nssa-fa.png" alt="third-party-nh-ospf-nssa-fa" title="third-party-nh-ospf-nssa-fa" width="210" height="329" class="aligncenter size-full wp-image-4119" /></a></p>
<p>Since there could be multiple ABRs in an NSSA area, only one is elected to perform the 7-to-5 LSA translation. Otherwise the routing information would loop back into the area, unless manual filtering is implemented on the ABRs (which is prone to errors). The translating ABR is elected based on the highest Router-ID, and may not be on the optimum path toward the advertising ASBR. The forwarding address therefore lets routers in other areas pick a better path toward the ASBR, based on their inter-area routing information.</p>
<h4>EIGRP</h4>
<p>We start with the scenario where we redistribute RIP into EIGRP. </p>
<p> <a href="http://blog.ine.com/wp-content/uploads/2010/09/third-party-nh-rip2eigrp.png"><img src="http://blog.ine.com/wp-content/uploads/2010/09/third-party-nh-rip2eigrp.png" alt="third-party-nh-rip2eigrp" title="third-party-nh-rip2eigrp" width="218" height="205" class="aligncenter size-full wp-image-4120" /></a></p>
<p>Notice that EIGRP will not insert the third-party next-hop until you apply the command <b>no ip next-hop-self eigrp</b> on R3&#8217;s connection to the shared segment. Look at the routing table output prior to applying the <b>no ip next-hop-self eigrp</b> command.</p>
<pre>
<b>R1#show  ip route eigrp </b>
     140.1.0.0/16 is variably subnetted, 2 subnets, 2 masks
D EX    140.1.2.2/32
           [170/2560002816] via <span style="BACKGROUND-COLOR: #ffff00">140.1.123.3</span>, 00:00:27, FastEthernet0/0
</pre>
<p>After the command has been applied to R3’s interface:</p>
<pre>
<b>R1#show  ip route eigrp</b>
     140.1.0.0/16 is variably subnetted, 2 subnets, 2 masks
D EX    140.1.2.2/32
           [170/2560002816] via <span style="BACKGROUND-COLOR: #ffff00">140.1.123.2</span>, 00:00:04, FastEthernet0/0
</pre>
<p>The same behavior is observed when redistributing OSPF into EIGRP, but not when redistributing BGP. For some reason, BGP&#8217;s next-hop is not copied into EIGRP: EIGRP will NOT insert the BGP next-hop into its updates. Notice that you may enable or disable the third-party next-hop behavior in EIGRP using the interface-level command <b>ip next-hop-self eigrp</b>.</p>
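<p>As a minimal sketch of the interface-level toggle, assuming (hypothetically) that R3 attaches to the shared segment via FastEthernet0/0 and runs EIGRP AS 100, neither of which is stated in this post, the configuration on R3 could look like this:</p>
<pre>
! R3: the interface name and EIGRP AS number 100 are assumptions
! Stop rewriting the next-hop to R3's own address in EIGRP updates
interface FastEthernet0/0
 no ip next-hop-self eigrp 100
</pre>
<p>Re-entering <b>ip next-hop-self eigrp 100</b> on the same interface would restore the default, where R3 lists itself as the next-hop.</p>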
<h4>RIP</h4>
<p>RIP passes on the third-party next-hop learned from OSPF, BGP or EIGRP. For instance, assume EIGRP is redistributed into RIP. You have to configure <b>no ip split-horizon</b> on R3&#8217;s Ethernet connection to get this to work:</p>
<p><a href="http://blog.ine.com/wp-content/uploads/2010/09/third-party-nh-eigrp2rip.png"><img src="http://blog.ine.com/wp-content/uploads/2010/09/third-party-nh-eigrp2rip.png" alt="third-party-nh-eigrp2rip" title="third-party-nh-eigrp2rip" width="220" height="206" class="aligncenter size-full wp-image-4121" /></a></p>
<pre>
<b>R2#show ip route rip </b>
     140.1.0.0/16 is variably subnetted, 3 subnets, 2 masks
R       140.1.1.1/32 [120/1] via <span style="BACKGROUND-COLOR: #ffff00">140.1.123.1</span>, 00:00:17, FastEthernet0/0
</pre>
<p>Notice the following RIP debugging output, which lists the third-party next-hop:</p>
<pre>
RIP: received v2 update from 140.1.123.3 on FastEthernet0/0
     140.1.1.1/32 <span style="BACKGROUND-COLOR: #ffff00">via 140.1.123.1</span> in 1 hops
     140.1.123.0/24 via 0.0.0.0 in 1 hops
</pre>
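<p>For reference, a minimal sketch of the split-horizon change on R3, assuming (hypothetically) that FastEthernet0/0 is R3&#8217;s interface on the shared segment:</p>
<pre>
! R3: the interface name is an assumption
! Allow RIP to advertise routes back out of the segment they were learned on
interface FastEthernet0/0
 no ip split-horizon
</pre>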
<p>Surprisingly, there is NO need to configure <b>no ip split-horizon</b> on the interface when redistributing BGP or OSPF routes into RIP. It seems that only EIGRP-to-RIP redistribution requires it. Keep in mind, however, that split-horizon is OFF by default on physical Frame-Relay interfaces. Here is sample output from redistributing BGP into RIP using the third-party next-hop:</p>
<pre>
<b>R3#show ip route bgp </b>
     140.1.0.0/16 is variably subnetted, 3 subnets, 2 masks
B       140.1.2.2/32 [20/0] via <span style="BACKGROUND-COLOR: #ffff00">140.1.123.2</span>, 00:22:13
R3#

<b>R1#show ip route rip </b>
     140.1.0.0/16 is variably subnetted, 3 subnets, 2 masks
R       140.1.2.2/32 [120/1] via <span style="BACKGROUND-COLOR: #ffff00">140.1.123.2</span>, 00:00:09, FastEthernet0/0
</pre>
<p>RIP’s third-party next-hop behavior is fully automatic. You cannot disable or enable it, as you can in EIGRP.</p>
<h4>OSPF</h4>
<p>Similarly to RIP, OSPF has no problem picking up the third-party next-hop from BGP, EIGRP or RIP. Here is what it looks like (guess which protocol is redistributed into OSPF, based solely on the command output):</p>
<pre>
<b>R1#sh ip route ospf </b>
     140.1.0.0/16 is variably subnetted, 3 subnets, 2 masks
O E2    140.1.2.2/32 [110/1] via <span style="BACKGROUND-COLOR: #ffff00">140.1.123.2</span>, 00:34:59, FastEthernet0/0

<b>R1#show ip ospf database external </b>

            OSPF Router with ID (140.1.1.1) (Process ID 1)

                Type-5 AS External Link States

  Routing Bit Set on this LSA
  LS age: 131
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 140.1.2.2 (External Network Number )
  Advertising Router: <span style="BACKGROUND-COLOR: #ffff00">140.1.123.3</span>
  LS Seq Number: 80000002
  Checksum: 0xF749
  Length: 36
  Network Mask: /32
        Metric Type: 2 (Larger than any link state path)
        TOS: 0
        Metric: 1
        Forward Address: <span style="BACKGROUND-COLOR: #ffff00">140.1.123.2</span>
        External Route Tag: 200
</pre>
<p>If you’re still guessing, the external protocol is BGP, as can be seen from the automatic External Route Tag: OSPF sets it to the last AS number found in the AS_PATH. </p>
<p> <a href="http://blog.ine.com/wp-content/uploads/2010/09/third-party-nh-bgp2ospf.png"><img src="http://blog.ine.com/wp-content/uploads/2010/09/third-party-nh-bgp2ospf.png" alt="third-party-nh-bgp2ospf" title="third-party-nh-bgp2ospf" width="298" height="256" class="aligncenter size-full wp-image-4122" /></a></p>
<p>There are special conditions that must be met for OSPF to set the FA. First, the interface where the third-party next-hop resides should be advertised into OSPF using the <b>network</b> command. Second, this interface should not be passive in OSPF and should not have the network type point-to-point or point-to-multipoint. Violating any of these conditions stops OSPF from installing the third-party next-hop as the FA in the Type-5 LSAs created for external routes.</p>
<p>OSPF is special in one other respect. Distance-vector protocols such as RIP or EIGRP modify the next-hop as soon as they pass the routing information on to other devices. That is, the third-party next-hop is not maintained through the RIP or EIGRP domain. In contrast, OSPF LSAs are flooded within their scope with the FA unmodified. This creates an interesting problem: if the FA is not reachable according to the receiving router’s routing table, the external information found in the Type-7/5 LSA is not used. This situation is discussed in the blog post “OSPF Filtering using FA Address”.</p>
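<p>A minimal sketch of a configuration on R3 that meets these conditions, assuming (hypothetically) OSPF process 1, area 0, BGP AS 300 and FastEthernet0/0 as the interface on the 140.1.123.0/24 segment:</p>
<pre>
! R3: process ID, area, AS number and interface name are assumptions
! The shared-segment interface keeps a broadcast network type
! (the Ethernet default, shown explicitly) and is not passive
interface FastEthernet0/0
 ip address 140.1.123.3 255.255.255.0
 ip ospf network broadcast
!
! The segment is covered by a network statement, and BGP is redistributed
router ospf 1
 network 140.1.123.0 0.0.0.255 area 0
 redistribute bgp 300 subnets
</pre>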
<h4>BGP</h4>
<p>When you redistribute any protocol into BGP, the system correctly sets the third-party next-hop in the <b>local</b> BGP table. Look at the diagram below, where EIGRP prefixes are being redistributed into BGP AS 300:</p>
<p> <a href="http://blog.ine.com/wp-content/uploads/2010/09/third-party-nh-eigrp2bgp1.png"><img src="http://blog.ine.com/wp-content/uploads/2010/09/third-party-nh-eigrp2bgp1.png" alt="third-party-nh-eigrp2bgp" title="third-party-nh-eigrp2bgp" width="299" height="256" class="aligncenter size-full wp-image-4124" /></a></p>
<p>R3’s BGP process installs R1’s Loopback0 prefix into the BGP table with a next-hop value of R1’s address, not “0.0.0.0” as it would be for locally advertised routes. You will observe the same behavior if you inject EIGRP prefixes into BGP using the <b>network</b> command.</p>
<pre>
<b>R3#sh ip bgp</b>
BGP table version is 9, local router ID is 140.1.123.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 140.1.1.1/32     <span style="BACKGROUND-COLOR: #ffff00">140.1.123.1</span>         156160         32768 ?
</pre>
<p>Furthermore, BGP is supposed to change the next-hop to self when advertising prefixes over eBGP peering sessions. However, when all peers share the same segment, the prefixes re-advertised over the shared segment do not have their next-hop changed. See the diagram below:</p>
<p> <a href="http://blog.ine.com/wp-content/uploads/2010/09/third-pary-nh-bgp2bgp.png"><img src="http://blog.ine.com/wp-content/uploads/2010/09/third-pary-nh-bgp2bgp.png" alt="third-pary-nh-bgp2bgp" title="third-pary-nh-bgp2bgp" width="371" height="256" class="aligncenter size-full wp-image-4125" /></a></p>
<p>Here R1 advertises prefix 140.1.1.1/32 to R3, and R3 re-advertises it to R2 over the same segment. Unless non-physical interfaces (e.g. Loopbacks) are used to form the BGP sessions, the next-hop received from R1 is not changed when the prefix is passed on to R2. This implements the default third-party next-hop preservation over eBGP sessions. Look at the sample output for the configuration illustrated above: R1 receives R2’s prefix with an unmodified next-hop.</p>
<pre>
<b>R1#show ip bgp </b>
BGP table version is 3, local router ID is 140.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 140.1.1.1/32     0.0.0.0                  0         32768 i
*> 140.1.2.2/32     <span style="BACKGROUND-COLOR: #ffff00">140.1.123.2</span>                            0 300 200 i
</pre>
<p>There is a way to disable this default behavior in BGP. A logical assumption would be that the command <b>neighbor X.X.X.X next-hop-self</b> would work, and indeed it does in recent IOS versions. Older IOS releases, such as 12.2T, did not have this command working for eBGP sessions, and your option would have been a route-map with the <b>set ip next-hop</b> command. The route-map method may still be handy if you want to insert a totally “bogus” IP next-hop from the shared segment: the receiving BGP speaker will accept any IP address that is on the same segment. That is not something you would do in a production environment too often, but it is definitely an interesting idea for lab practice. One good use in production is changing the BGP next-hop to an HSRP virtual IP address, to provide physical BGP speaker redundancy. Here is a sample configuration that sets an explicit next-hop in BGP updates:</p>
<pre>
router bgp 300
 neighbor 140.1.123.1 remote-as 100
 neighbor 140.1.123.1 route-map BGP_NEXT_HOP out
!
route-map BGP_NEXT_HOP permit 10
 set ip next-hop 140.1.123.100
</pre>
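<p>For comparison, here is a sketch of the <b>next-hop-self</b> alternative mentioned above, reusing the neighbor address and AS numbers from the route-map example. Whether it takes effect on an eBGP session depends on the IOS release, as noted:</p>
<pre>
! Advertise R3's own address as the next-hop instead of the third-party one
router bgp 300
 neighbor 140.1.123.1 remote-as 100
 neighbor 140.1.123.1 next-hop-self
</pre>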
<h4>Summary</h4>
<p>All popular distance-vector protocols support third-party next-hop insertion. This mechanism is useful on multi-access segments, in situations where you want to pass optimum path information between routers belonging to different routing protocols. We illustrated that RIP implements this function automatically and does not allow any tuning. EIGRP, on the other hand, supports passing the third-party next-hop from any protocol other than BGP, and you may turn this function on or off on a per-interface basis. Furthermore, OSPF’s special feature is propagation of the third-party next-hop within an area/autonomous system, unlike the distance-vector protocols, which reset the next-hop at every hop (considering an AS to be a “single hop” for BGP). Thanks to that feature, OSPF offers an interesting possibility: filtering external routing information by blocking the FA prefix from the routing tables. Finally, BGP gives the most flexibility when it comes to IP next-hop manipulation, allowing you to change it to any value. </p>
<h4>Further Reading</h4>
<p><a href="http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a008009405a.shtm">Common Routing Problem with OSPF Forwarding Address</a><br />
<a href="http://blog.ine.com/2009/11/13/ospf-prefix-filtering-using-forwarding-address/">OSPF Prefix Filtering Using Forwarding Address</a><br />
<a href="http://www.cisco.com/en/US/tech/tk365/technologies_configuration_example09186a0080093f2c.shtml">BGP Redundancy using HSRP</a></p>
]]></content:encoded>
			<wfw:commentRss>http://feedproxy.google.com/~r/ine/~3/S23_bzWZRdg/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OSPF Fast Convergence</title>
		<link>http://feedproxy.google.com/~r/ine/~3/Z4FMsChEGlA/</link>
		<comments>http://feedproxy.google.com/~r/ine/~3/Z4FMsChEGlA/#comments</comments>
		<pubDate>Wed, 02 Jun 2010 22:10:53 +0000</pubDate>
		<dc:creator>Petr Lapukhov, CCIE #16379</dc:creator>
				<category><![CDATA[IGP]]></category>
		<category><![CDATA[IP Routing]]></category>
		<category><![CDATA[ccie]]></category>
		<category><![CDATA[ospf]]></category>
		<category><![CDATA[CCDE]]></category>
		<category><![CDATA[fast convergence]]></category>
		<category><![CDATA[sub-second convergence]]></category>
		<category><![CDATA[timer tuning]]></category>

		<guid isPermaLink="false">http://blog.ine.com/?p=3947</guid>
		<description><![CDATA[
			
				
			
		
The goal of this post is a brief discussion of the main factors controlling fast convergence in OSPF-based networks. Network convergence is a term that is sometimes used under various interpretations. Before we discuss the optimization procedures for OSPF, we define network convergence as the process of synchronizing network forwarding tables after a topology change. Network is [...]]]></description>
			<content:encoded><![CDATA[
<p>The goal of this post is a brief discussion of the main factors controlling fast convergence in OSPF-based networks. Network convergence is a term that is used under various interpretations. Before we discuss the optimization procedures for OSPF, we define network convergence as the process of synchronizing network forwarding tables after a topology change. A network is said to be converged when none of the forwarding tables change for &#8220;some reasonable&#8221; amount of time. This amount of time could be defined as an interval based on the expected maximum time to stabilize after a single topology change. Network convergence based on native IGP mechanisms is also known as network restoration, since it heals the lost connections. Network mechanisms for traffic protection, such as ECMP, MPLS FRR or IP FRR, which offer a different approach to failure handling, are outside the scope of this article. We are also leaving fast recovery of multicast routing out of scope, even though this process is tied to IGP re-convergence.</p>
<p>It is interesting to notice that IGP-based &#8220;restoration&#8221; techniques have one (more or less) important problem. During re-convergence, temporary micro-loops may exist in the topology due to inconsistency between the FIB (forwarding) tables of different routers. This behavior is fundamental to link-state algorithms, as routers closer to the failure tend to update their forwarding database before the other routers. The only popular routing protocol that lacks this property is EIGRP, which is loop-free at any moment during re-convergence, thanks to the explicit termination of the diffusing computations. For the link-state protocols, there are some enhancements to the FIB update procedures that allow avoiding such micro-loops, described in the document <a href="http://tools.ietf.org/html/draft-ietf-rtgwg-ordered-fib-01">[ORDERED-FIB]</a>.</p>
<p>Even though we are mainly concerned with OSPF, IS-IS will be mentioned in the discussion as well. It should be noted that, compared to IS-IS, OSPF provides fewer &#8220;knobs&#8221; for convergence optimization.  The main reason is probably the fact that IS-IS is being developed and supported by a separate team of developers, more geared towards the ISPs, where fast convergence is a critical competitive factor. The common optimization principles, however, are the same for both protocols, and during the discussion we will point out the tuning features that IS-IS has and OSPF lacks. Finally, we start our discussion with a formula, which is further explained in the text:</p>
<p><b><i>Convergence = Failure_Detection_Time + Event_Propagation_Time + SPF_Run_Time + RIB_FIB_Update_Time</i></b></p>
<p>The formula reflects the fact that convergence time for a link-state protocol is the sum of the following components:</p>
<ul>
<li>Time to detect the network failure, e.g. interface down condition.</li>
<li>Time to propagate the event, i.e. flood the LSA across the topology.</li>
<li>Time to perform SPF calculations on all routers upon reception of the new information.</li>
<li>Time to update the forwarding tables for all routers in the area.</li>
</ul>
<p><span id="more-3947"></span></p>
<h4>Part I: Fast Failure Detection</h4>
<p>Detecting link and node failures quickly is the number one priority for fast convergence. For maximum speed, relying on IGP keepalive timers should be avoided wherever possible, and physical failure detection mechanisms should be used. This implies the use of <i>physical</i> point-to-point links wherever possible. As for the link technology, it should be able to detect loss of link within the shortest interval possible. For example, a point-to-point gigabit Ethernet link may report failure almost instantly (by detecting the loss of network pulses) if there is no Ethernet switch connecting the two nodes. However, there could be some hardware-dependent timers that delay reporting the physical-layer event, such as debounce timers. In the GigE example, there is the carrier-delay timer, which is set per interface using the command <b>carrier-delay (ms)</b>.  Aiming at fast convergence, you would like to set this timer to zero, <b>unless</b> you have a special subnetwork technology, such as SONET, which is able to provide protection within a short interval, e.g. under 50ms. In that case, you may want to consider setting the technology-specific delay timer to a value higher than the SONET recovery time, so that a non-critical physical failure is healed below the network layer and never noticed by it. In most cases, it makes sense to rely on subnetwork recovery mechanisms if they are available and provide <i>timely repair</i> within your target convergence time. However, more often you have to deal with &#8220;cheaper&#8221; technology, such as GigE running over DWDM lambdas, and if that&#8217;s the case, minimizing the detection/indication timers is your primary goal. Notice that another positive result of using point-to-point links is that OSPF adjacencies form faster, since DR elections are no longer needed. Additionally, type 2 LSAs are not generated for point-to-point links, which slightly reduces OSPF LSDB size and topology complexity.</p>
<p>What would you do if your connection is not a physical point-to-point link or does not translate loss-of-signal information in a timely fashion? A good example could be a switched Ethernet or Frame-Relay PVC link. Sometimes there are solutions, such as Ethernet port failure translation, that may detect an upstream switch port failure and reflect it to the downstream ports, which could be reasonably fast. As another example, Frame-Relay may signal PVC loss via asynchronous LMI updates or the A-bit (active bit) in LMI status reports. However, such mechanisms, especially the ones relying on Layer 2 features, may not report the failure fast enough. In such cases, it could be a good idea to rely on fast IGP keepalive timers. Both OSPF and IS-IS support fast hellos with a dead/hold interval of one second and sub-second hello intervals (<a href="http://www.cisco.com/en/US/docs/ios/12_0s/feature/guide/fasthelo.html">[OSPF-FASTHELLO]</a>). Using this <i>medium-agnostic</i> mechanism could reduce fault detection on non point-to-point links to one second, which could be better than relying on Layer-2 specific signaling. However, fast hello timers have one significant drawback: since all hello packets are processed by the router&#8217;s main CPU, having hundreds or more OSPF/IS-IS neighbors may have a significant impact on the router&#8217;s control plane performance. An alternative could be BFD (Bidirectional Forwarding Detection, see <a href="http://tools.ietf.org/html/rfc5880">[BFD]</a>), which provides a protocol-agnostic failure detection mechanism that can be reused by multiple routing protocols (e.g. OSPF/IS-IS/BGP and so on). BFD is based on the same idea of sub-second keepalive timers, but it can be implemented in distributed router interface line-cards, thereby saving the control plane and central CPU from over-utilization. </p>
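<p>As a rough sketch of the interface tuning described above (the interface name is hypothetical, and the exact <b>carrier-delay</b> syntax and supported range vary by platform and IOS release):</p>
<pre>
! Back-to-back GigE link with no switch in between (interface name is an assumption)
! carrier-delay msec 0: report loss of carrier to the routing protocols without delay
! point-to-point network type: no DR election and no type 2 LSA on this link
interface GigabitEthernet0/1
 carrier-delay msec 0
 ip ospf network point-to-point
</pre>
<p>On a SONET-protected link you would instead keep the delay above the protection switching time, as discussed.</p>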
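<p>The two options can be sketched as follows. The interface names and timer values are illustrative assumptions, not recommendations, and BFD support depends on the platform and IOS release:</p>
<pre>
! Option 1: OSPF fast hellos, one-second dead interval, four hellos per second
interface GigabitEthernet0/1
 ip ospf dead-interval minimal hello-multiplier 4
!
! Option 2: BFD with 50 ms tx/rx intervals, neighbor declared down after 3 missed packets
interface GigabitEthernet0/2
 bfd interval 50 min_rx 50 multiplier 3
 ip ospf bfd
</pre>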
<h4>Part II: Event Propagation</h4>
<p>In OSPF and IS-IS, topology changes (events) are advertised by means of LSA/LSP flooding. For the network to completely converge, an LSA/LSP needs to reach every router within its flooding scope. Normally, in a properly designed network, the flooding scope is one area (flooding domain), unless the information is flooded as external, i.e. by means of Type-5 LSAs in OSPF. In general, LSA/LSP propagation time is determined by the following factors:</p>
<ol>
<li><b>LSA generation delay</b>. IGP implementations normally throttle LSA generation to prevent excessive flooding in case of oscillating (constantly flapping) links. The original OSPF specification required every LSA generation to be delayed by a fixed interval that defaulted to one second. To optimize this behavior, Cisco&#8217;s OSPF and IS-IS implementations use an exponential backoff algorithm to dynamically calculate the delay for generating the SAME LSA (same LSA ID, LSA type and originating Router ID) by the router. You may find more information about truncated exponential backoff in <a href="http://blog.ine.com/2009/12/31/tuning-ospf-performance/">[TUNING-OSPF]</a>, but in short the process works as follows. <br/><br/> Three parameters control the throttling process: the <strong>initial</strong> interval, the <strong>hold</strong> time, and the <strong>max_wait</strong> time, specified using the command <strong>timers throttle lsa <em>initial hold max_wait</em></strong>. Suppose the network was stable for a relatively long time, and then a router link goes down. As a result, the router needs to generate a new router LSA, listing the new connection status. The router delays LSA generation by the <strong>initial</strong> number of milliseconds and sets the <em>next</em> interval to <strong>hold</strong> milliseconds. This ensures that two consecutive events (e.g. a link going down and then back up) will be separated by at least the <b>hold</b> interval.  After this, if an additional event occurs <strong>after</strong> the initial wait window has expired, the event is held for processing until the <strong>hold</strong> milliseconds window expires. Thus, all events occurring after the initial delay are accumulated and processed after the hold time expires. This means the next router LSA will be generated no earlier than <b>hold</b> milliseconds later. At the same time, the next hold-time is doubled, i.e. set to <strong>2*hold</strong>. 
Effectively, every time an event occurs during the current wait window, the processing is delayed until the current hold-time expires and the <strong>next</strong> hold-time interval is doubled. The hold-time grows exponentially as <b>2^t*hold</b> until it reaches the <strong>max_wait</strong> value. After this, every event received during the current hold-time window results in the next interval being equal to the constant <strong>max_wait</strong>. This ensures that the exponential growth is limited, or in other words, that the process is truncated. If there are no events for the duration of <strong>2*max_wait</strong> milliseconds, the hold-time window is reset back to the <strong>initial</strong> value, on the assumption that the flapping link has returned to its normal condition.<br/><br />
Initial LSA generation delay has a significant impact on network convergence time, so it is important to tune it appropriately. The initial delay should be kept to a minimum, such as 5-10 milliseconds. Setting it to zero is still not recommended, as multiple link failures may occur simultaneously (e.g. an SRLG failure) and it could be beneficial to reflect them all in a single LSA/LSP. The hold interval should be tuned so that the next LSA is only sent after the network has converged in response to the first event. This means the LSA hold time should be based on the convergence time per the formula above, or more accurately it should be at least above <i>LSA_Initial_Delay + LSA_Propagation_Delay + SPF_Initial_Delay</i>. You may then set the maximum hold time to at least twice the hold interval to enhance flooding protection against at least two concurrent oscillating processes (having more parallel oscillations is not very probable). Notice that a single link failure normally results in at least two LSAs being generated, one by each attached router.
</li>
<li><b>LSA reception delay</b>. This delay is the sum of the ingress queueing delay and the LSA arrival delay. When a router receives an LSA, it may be subject to ingress queueing, though this effect is not significant unless massive BGP re-convergence is occurring at the same time. Even under a heavy BGP TCP ACK storm, the Cisco IOS input queue discipline known as Selective Packet Discard (see <a href=http://www.cisco.com/en/US/products/hw/routers/ps167/products_tech_note09186a008012fb87.shtml>[SPD]</a>) provides enough room for IGP traffic and handles it at the highest priority. The received packets are then rate-limited based on the LSA arrival interval.  OSPF rate-limits only reception of the SAME LSA (see the definition above): there is a fixed minimum delay between accepting two copies of the same LSA originated by a peer. This delay should not exceed the hold-time used for LSA generation &#8211; otherwise the receiving router may drop the second LSA generated by the peer, say upon link recovery. Notice that every router on the LSA flooding path adds to this component cumulatively, but the good news is that the initial LSA/LSP will not be rate-limited &#8211; the arrival delay applies only to a consecutive copy of the same LSA. As such, you may mostly ignore this component for the purpose of fast reaction to a change, thanks to fast ingress queueing and expedited reception. Keep in mind that if you tune the arrival delay, you need to set the OSPF retransmission timer slightly above it. Otherwise, the side that just sent an LSA and has not received an acknowledgement may end up re-sending it, only for the copy to be dropped by the receiving side. The command to control the retransmission interval for the same LSA is <b>timers pacing retransmission</b>.</li>
<li><b>Processing Delay</b> is the amount of time it takes the router to put the LSA on the outgoing flood lists.  This delay could be significant if the SPF process starts before the LSA is flooded. SPF runtime is not the only contributor to the processing delay, but it&#8217;s the one you have control over. If you configure SPF throttling to be fast enough (see the next section) &#8211; the exact threshold varies, but mostly with initial delays below 40ms &#8211; the SPF run may occur <b>before</b> the triggering LSA is flooded to neighbors. This results in a slower flooding process. For faster convergence, LSAs must always be flooded <b>prior</b> to the SPF run. The ISIS process in Cisco IOS supports the command <b>fast-flood</b>, which ensures that LSPs are flooded ahead of running SPF, irrespective of the initial SPF delay. In contrast, OSPF does not support this feature and your only option (at the moment) is to tune the SPF delays properly (see below). <br/><br/> The other components that may affect processing delay are interface LSA/LSP flood pacing and egress queueing.  Interface flood pacing is the OSPF feature that mandates a minimum interval between flooding consecutive LSAs out of an interface. This timer runs per interface and only triggers when an LSA needs to be sent out right after the previous one. The process-level command to control this interval is <b>timers pacing flood (ms)</b>, with a default value of 33ms. Note that if there is just one LSA being flooded through the network, this timer has no effect on its propagation delay, and only the next consecutive LSA could be rate-limited. Therefore, just like with the arrival timer, we can mostly ignore the impact of this delay on the fast convergence process. Still, it is worth tuning the interface flood pacing timer to the smallest practical value (e.g. 5ms-10ms) to account for the case when multiple LSAs have to be flooded through the topology, since a link failure normally generates at least two LSAs/LSPs, one from each attached router (as discussed earlier). It is interesting to note that reception of a single LSA signaling loss of the link from one router is enough to properly rebuild the topology, since the SPF algorithm verifies that a link is bidirectional before using it in shortest-path computations. Additionally, reducing the interface flood pacing timer helps a newly attached router load the OSPF database significantly faster, at the expense of some extra CPU usage. This applies mainly to large OSPF databases and/or flapping link conditions. To protect against frequent massive database reloads on point-to-point links, you may additionally use the IP Event Dampening feature to suppress interface state changes, or design the network for redundancy so that a single link restoration does not cause a full database reload. See <a href=http://blog.ine.com/2010/05/03/optimizing-ip-event-dampening/>[OPT-DAMPENING]</a> for information on tuning the IP Event Dampening parameters.<br/><br/> Lastly, egress queueing may result in significant delay on over-utilized links. In short, a router&#8217;s average egress queue depth could be approximated as <b>Q_Depth=Utilization/(1-Utilization)</b>, meaning that links with constant utilization of 50% or above always incur some queueing delay (on average). Proper QoS configuration, such as reserving enough bandwidth for control plane packets, should neutralize the effect of this component, coupled with the fact that routing update packets normally receive higher priority handling by router processes.</li>
<li><b>Packet Propagation Delay</b>. This variable is the sum of two major contributors: serialization delay at every hop and cumulative signal propagation delay across the topology. The serialization delay is almost negligible on modern &#8220;fast&#8221; links (e.g. 12usec for a 1500-byte packet over a 1Gbps link), though it could be more significant on slow WAN links such as a series of T1s. Therefore, signal propagation delay is the main contributor due to physical limitations. This value mainly depends on the distance the signal has to travel to cross the whole OSPF/ISIS area. The propagation delay could be roughly approximated as 0.82 ms per 100 miles and has significant impact only in inter-continental deployments or over satellite links. For example, it would take at least 41ms to cross a 5000-mile-wide topology. However, since most OSPF/ISIS areas do not span more than a single continent, this value does not seriously impact total convergence time. </li>
</ol>
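<p>The delay figures quoted in the list above are easy to reproduce. The following is a minimal Python sketch, not part of any router tooling; the function names are made up for illustration, and the queue depth helper simply evaluates the approximation given above.</p>
<pre>
```python
MS_PER_100_MILES = 0.82  # rough propagation figure quoted in the text


def serialization_delay_us(packet_bytes, link_bps):
    """Time to clock one packet onto the wire, in microseconds."""
    return packet_bytes * 8 * 1_000_000 / link_bps


def propagation_delay_ms(distance_miles):
    """Approximate one-way signal propagation delay, in milliseconds."""
    return distance_miles / 100 * MS_PER_100_MILES


def avg_queue_depth(utilization):
    """Q_Depth = Utilization / (1 - Utilization); utilization must be below 1."""
    return utilization / (1 - utilization)


# 1500-byte packet on a 1Gbps link, a 5000-mile topology, a 50% loaded link.
print(f"{serialization_delay_us(1500, 1_000_000_000):.1f} usec")
print(f"{propagation_delay_ms(5000):.1f} ms")
print(f"{avg_queue_depth(0.5):.1f} packets")
```
</pre>
<p>The first two lines reproduce the 12usec and 41ms figures from the text, and the last one shows that a link at 50% utilization holds one packet on average under this approximation.</p>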
<h4>Part III: SPF Calculations</h4>
<p>The SPF algorithm complexity could be bounded as O(L+N*log(N)), where N is the number of nodes and L is the number of links in the topology under consideration. This estimate holds true provided that the implementation is optimal (see <a href=http://www.itl.nist.gov/div897/sqg/dads/HTML/dijkstraalgo.html>[DIJKSTRA-SPF]</a>). Worst-case complexity for dense topologies could be as high as O(N^2), but this is rarely seen in real-world topologies. SPF runtime used to be a major limiting factor in the routers of the 80s (link-state routing was invented in ARPANET) and 90s (initial OSPF/ISIS deployments), which used slow CPUs where SPF computations could take seconds to complete. However, progress in modern hardware (Moore&#8217;s Law) has significantly reduced the impact of this factor on network convergence, though it is still one of the major contributors to the convergence time. The use of Incremental SPF (iSPF) further reduces the amount of calculation needed when partial changes occur in the network (see <a href=http://blog.ine.com/2009/12/31/tuning-ospf-performance/ >[TUNING-OSPF]</a>). For example, OSPF Type-1 LSA flooding for a leaf connection no longer causes a complete SPF re-calculation as it would with classic SPF. An important benefit is that the farther away the router is from the failed link, the less time it needs to recompute the SPF. This compensates for the longer propagation delay to deliver the LSA from a distant corner of the network. Notice that OSPF also supports PRC (partial route computation), which takes only a few milliseconds upon reception of Type 3, 4 and 5 LSAs, which are treated as distance-vector updates. The PRC process is not delayed, and you cannot tune an exponential backoff time for PRC in OSPF as you can in IS-IS.</p>
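<p>For reference, here is a minimal Python sketch of the classic (non-incremental) SPF computation. It is an illustration only and not how any router implements it: the topology and router names are hypothetical, and a binary heap is used, which gives O((L+N)*log(N)) rather than the tighter bound quoted above (that one assumes a more elaborate priority queue such as a Fibonacci heap).</p>
<pre>
```python
import heapq


def spf(graph, root):
    """Return {node: lowest path cost from root} for every reachable node.

    graph maps node -> {neighbor: link_cost}; costs must be non-negative.
    """
    dist = {root: 0}
    settled = set()
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node in settled:
            continue  # stale entry: node was already settled at a lower cost
        settled.add(node)
        for neighbor, link_cost in graph.get(node, {}).items():
            candidate = cost + link_cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical four-router topology with symmetric link costs.
topology = {
    "R1": {"R2": 10, "R3": 5},
    "R2": {"R1": 10, "R4": 1},
    "R3": {"R1": 5, "R4": 20},
    "R4": {"R2": 1, "R3": 20},
}
print(spf(topology, "R1"))
```
</pre>
<p>Every link is relaxed at most once per direction and every heap operation costs O(log(N)), which is where the runtime dependence on both L and N comes from.</p>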
<p>You may find typical SPF runtimes for your network (to estimate the total convergence time) by using the command <b>show ip ospf statistics</b>:</p>
<pre>
<b>show ip ospf statistics</b>

            OSPF Router with ID (10.4.1.1) (Process ID 1)

  Area 10: SPF algorithm executed 18 times

  Summary OSPF SPF statistic

  SPF calculation time
Delta T	Intra	D-Intra	Summ	D-Summ	Ext	D-Ext	Total	Reason
1w3d	8	0	0	0	0	0	8	R, X
1w3d	12	0	0	0	4	0	16	R, X
1w3d	16	0	0	0	4	0	20	R, X
1w3d	8	0	0	0	0	0	8	R,
1w3d	20	0	0	0	0	0	20	R, X
1w2d	24	0	0	0	8	0	32	R, X
1w2d	8	4	0	0	0	0	12	R,
6d16h	4	0	0	0	0	4	8	R, X
6d16h	4	0	0	0	0	0	4	R,
6d16h	12	0	0	0	8	0	20	R, X

  RIB manipulation time during SPF (in msec):
Delta T	RIB Update	RIB Delete
1w3d	4	0
1w3d	8	0
1w3d	10	0
1w3d	5	0
1w3d	8	0
1w2d	10	0
1w2d	3	0
6d16h	2	0
6d16h	1	0
6d16h	9	0
</pre>
<p>The above output is divided into two sections: SPF calculation times and RIB manipulation times. For now, we are interested in the values under the &#8220;Total&#8221; column, which represent the total time it took the OSPF process to run SPF. You may see how these values vary depending on the &#8220;Reason&#8221; field. You may want to find the maximum value and use it as an upper limit for SPF computation in your network. In our case, it&#8217;s 32ms. The other section of the output will be discussed later.</p>
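<p>If you collect this output from many routers, picking the maximum by eye gets tedious. Below is a small Python sketch that extracts the largest value from a column; the rows are copied from the sample output above, and the helper name is made up. It assumes the whitespace-separated layout shown here, which may differ between IOS releases.</p>
<pre>
```python
SPF_ROWS = """
1w3d   8  0  0  0  0  0   8  R, X
1w3d  12  0  0  0  4  0  16  R, X
1w3d  16  0  0  0  4  0  20  R, X
1w3d   8  0  0  0  0  0   8  R,
1w3d  20  0  0  0  0  0  20  R, X
1w2d  24  0  0  0  8  0  32  R, X
1w2d   8  4  0  0  0  0  12  R,
6d16h  4  0  0  0  0  4   8  R, X
6d16h  4  0  0  0  0  0   4  R,
6d16h 12  0  0  0  8  0  20  R, X
"""

RIB_ROWS = """
1w3d   4  0
1w3d   8  0
1w3d  10  0
1w3d   5  0
1w3d   8  0
1w2d  10  0
1w2d   3  0
6d16h  2  0
6d16h  1  0
6d16h  9  0
"""


def column_max(table, column):
    """Largest integer in the given zero-based column of a text table."""
    rows = table.strip().splitlines()
    return max(int(row.split()[column]) for row in rows)


max_spf_ms = column_max(SPF_ROWS, 7)  # "Total" column
max_rib_ms = column_max(RIB_ROWS, 1)  # "RIB Update" column
print(max_spf_ms, max_rib_ms)
```
</pre>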
<p>The next &#8220;problem&#8221; is known as SPF throttling. Recent Cisco IOS OSPF implementations use an exponential backoff algorithm when scheduling SPF runs. The goal, as usual, is to avoid excessive calculations in times of high network instability but keep SPF reaction fast for stable networks. The exponential process is identical to the one used for LSA throttling, with the same timer semantics.</p>
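<p>The truncated exponential backoff shared by LSA and SPF throttling can be sketched in a few lines of Python. This is a simplified model and not the IOS implementation: it assumes every wait window sees at least one new event, so the window keeps doubling, and it leaves out the reset after a quiet period. The function name and the 10/100/1000 ms values are chosen purely for illustration.</p>
<pre>
```python
def throttle_intervals(initial_ms, hold_ms, max_wait_ms, runs):
    """Wait intervals applied to successive runs while events keep arriving.

    The first run waits initial_ms, the second hold_ms, and every further
    one doubles the previous wait until it is truncated at max_wait_ms.
    """
    intervals = [initial_ms]
    wait = hold_ms
    while len(intervals) < runs:
        intervals.append(min(wait, max_wait_ms))
        wait = min(wait * 2, max_wait_ms)
    return intervals[:runs]


print(throttle_intervals(10, 100, 1000, 7))
# [10, 100, 200, 400, 800, 1000, 1000]
```
</pre>
<p>The output shows the fast first reaction, the doubling of the hold time and the truncation at the maximum value.</p>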
<p>So how would one pick optimal SPF throttling values? As mentioned before, the initial delay should be kept as short as possible to allow for near-instant reaction to a change, but long enough not to trigger SPF before the LSA is flooded out. It&#8217;s hard to determine the delay to flood the LSA, but the initial timer should at least stay above the per-interface LSA flood pacing timer, so that it does not delay two consecutive LSAs flooded through the topology (as you remember, a typical transit link failure results in generation of at least two LSAs). Setting the interface flood pacing timer to 5ms and the initial SPF delay to 10ms should be a good starting point. After the initial run, the SPF algorithm should be further held down for at least the amount of time it takes the network to converge after the initial event. This means that the SPF hold-time should be strictly higher than the value &#8220;SPF_Initial_Delay + SPF_Runtime + RIB_FIB_Update_Time&#8221;. There is an alternative, more pragmatic approach to tuning this timer as well. Let&#8217;s say we want to make sure SPF computations do not take more than 50% of the router&#8217;s CPU time. For this to happen, the hold time should be at least as long as a typical SPF run time. This value could be found from the router statistics and tuned individually on every router. Based on our example, we may set the hold interval to 32ms+20% (error margin, set higher to add more safety), which is about 38ms, and the maximum interval could be set to twice the hold time, which translates into roughly 33% CPU usage under the worst condition of non-stop LSA storms. Notice that the SPF hold and maximum timers could be tuned per router to account for different CPU power, if this applies to your scenario. Total network convergence time should be estimated based on the &#8220;slowest&#8221; router in the area. </p>
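<p>The CPU share reasoning above can be checked with a short Python sketch. The model is an assumption made for illustration: under a non-stop storm, every wait window is followed by one SPF run, so the share of time spent in SPF is runtime / (runtime + wait). The function name is made up.</p>
<pre>
```python
def spf_cpu_share(runtime_ms, wait_ms):
    """Fraction of time spent in SPF when a run follows every wait window."""
    return runtime_ms / (runtime_ms + wait_ms)


max_runtime_ms = 32                    # largest "Total" value measured
hold_ms = round(max_runtime_ms * 1.2)  # add the 20% error margin
print(hold_ms)

# Wait equal to the SPF runtime, then wait equal to twice the runtime.
print(f"{spf_cpu_share(max_runtime_ms, max_runtime_ms):.0%}")
print(f"{spf_cpu_share(max_runtime_ms, 2 * max_runtime_ms):.0%}")
```
</pre>
<p>With a wait equal to the runtime the share is 50%, and doubling the wait brings it down to about one third, which matches the figures in the text.</p>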
<h4>Part IV: RIB/FIB Update</h4>
<p>After completing the SPF computation, OSPF performs a sequential RIB update to reflect the changed topology. The RIB updates are further propagated to the FIB table &#8211; depending on the platform architecture this could be either a centralized or a distributed process. The RIB/FIB update process may contribute the <b>most</b> to the convergence time in topologies with a large number of prefixes, e.g. thousands or tens of thousands.  In such networks, updating the RIB and the distributed FIB databases on line-cards may take a considerable amount of time, on the order of tens if not hundreds of milliseconds (varies depending on the platform).  There are two major ways to minimize the update delay: advertise fewer prefixes and sequence FIB updates so that important paths are updated before any others. </p>
<p>If you think of all the prefixes that need to be in a typical network core, you will realize that you don&#8217;t need any core &#8220;transit&#8221; link prefixes in there. In fact, all you normally need are the stub links at the edge of your network, e.g. PE router loopbacks or the summary prefixes injected from the lower layers of your network hierarchy.  Therefore, it makes sense to suppress the network prefix information advertised for the transit links. One option would be configuring all transit links as IP unnumbered using the IP addresses of the routers&#8217; Loopback interfaces. However, both IS-IS and OSPF have a special protocol capability to implement suppression automatically. In OSPF it is known as &#8220;prefix-suppression&#8221; and prevents OSPF from including link type 3 (stub network address) in the router LSA (see <a href=http://cisco.com/en/US/docs/ios/12_4t/12_4t15/ht_osmch.html >[OSPF-PREFIX-SUPPRESS]</a>). As you remember, OSPF represents a point-to-point connection between two routers via two link types in a router LSA: type 1, declaring the connection to another router based on its Router-ID (with the SNMP ifIndex as link data if the link is unnumbered), and type 3, describing the stub connection/prefix of the point-to-point link. The prefix-suppression feature drops the second link type and leaves only the topological information in the router LSA.  As a result, you will not be able to reach the transit link subnet addresses, but will still have full connectivity within the topology. Prefix suppression is enabled globally with the <b>prefix-suppression</b> command under the OSPF routing process, or per interface using the syntax <b>ip ospf prefix-suppression [disable]</b>. Notice that by default OSPF does not suppress the stub-link advertisements for router loopback interfaces, unless you have explicitly configured these for suppression.</p>
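<p>To make the effect on the router LSA more tangible, here is a toy Python model of the two link types and of what suppression removes. It is a deliberately simplified sketch: the class, the field names and the addresses are hypothetical, and a real implementation may apply additional rules when deciding what to keep.</p>
<pre>
```python
from dataclasses import dataclass

# Router LSA link types referenced in the text.
POINT_TO_POINT = 1  # connection to another router, keyed by its Router-ID
STUB_NETWORK = 3    # prefix attached to the router


@dataclass(frozen=True)
class Link:
    link_type: int
    link_id: str  # neighbor Router-ID (type 1) or network address (type 3)
    is_loopback: bool = False


def suppress_transit_prefixes(links):
    """Drop type 3 entries of transit links; keep topology and loopbacks."""
    return [
        link for link in links
        if link.link_type != STUB_NETWORK or link.is_loopback
    ]


router_lsa = [
    Link(POINT_TO_POINT, "10.0.0.2"),                  # neighbor Router-ID
    Link(STUB_NETWORK, "192.168.12.0"),                # transit link subnet
    Link(STUB_NETWORK, "10.0.0.1", is_loopback=True),  # local loopback
]
for link in suppress_transit_prefixes(router_lsa):
    print(link.link_type, link.link_id)
```
</pre>
<p>Only the topological type 1 entry and the loopback prefix remain, so SPF still sees the full graph while the RIB no longer carries the transit subnet.</p>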
<p>As soon as you&#8217;re done suppressing all transit link subnets, you are normally left with the router loopback interfaces (typically /32 prefixes) and routing information external to the area, such as summary addresses or external prefixes. Depending on your network configuration, the number of summary addresses could be significant. The best solution to this problem is optimal summarization and filtering of unnecessary prefixes, e.g. by means of summary-address filters and stub area features. Obviously, this requires a hierarchical address plan, which is not always readily available. If re-designing your network&#8217;s IP addressing is not an option, you may still rely on Cisco IOS priority prefix sequencing, which is supported in ISIS. Unfortunately, there is no support for this feature in OSPF for IOS yet, though there is support in IOS-XR. You may read more about ISIS support for priority-driven RIB prefix installation here (<a href=http://www.cisco.com/en/US/docs/ios/12_0s/feature/guide/fslocrib.html >[ISIS-PRIODRIVEN]</a>). The general idea is to expedite the insertion of some prefixes into the forwarding table, starting with the most important ones, such as PE /32 prefixes. It is worth noting that priority sequencing may extend the duration of routing micro-loops during the re-convergence process. In general, the procedure described in (<a href=http://tools.ietf.org/html/draft-ietf-rtgwg-ordered-fib-01>[ORDERED-FIB]</a>) works against fast convergence, trading it for a loop-free process. </p>
<p>Is there a way to estimate the RIB/FIB manipulation times? As we have seen before, the <b>show ip ospf statistics</b> command provides information on RIB update time, though this output is not provided on every platform, nor is there a clear interpretation of the values in Cisco&#8217;s documentation, e.g. it&#8217;s unclear whether there is a checkpoint mechanism to inform OSPF of the FIB entry updates. Special measurements should be taken to estimate these values, as done in <a href=http://www.cs.princeton.edu/~jrex/teaching/spring2005/reading/shaikh01.pdf >[BLACKBOX-OSPF]</a>, and more importantly these values will heavily depend on the platform used. Still, the OSPF RIB manipulation statistics could be useful to estimate the lower bound of network convergence time (though we are mostly interested in an accurate upper bound).</p>
<h4>Sample Fast Convergence Profile</h4>
<p>Putting the above information together, let&#8217;s try to find an optimum convergence profile based on the fact that we have &#8220;show ip ospf statistics&#8221; output from the &#8220;weakest&#8221; router in the area.</p>
<pre>
<b>show ip ospf statistics</b>

            OSPF Router with ID (10.4.1.1) (Process ID 1)

  Area 10: SPF algorithm executed 18 times

  Summary OSPF SPF statistic

  SPF calculation time
Delta T	Intra	D-Intra	Summ	D-Summ	Ext	D-Ext	Total	Reason
1w3d	8	0	0	0	0	0	8	R, X
1w3d	12	0	0	0	4	0	16	R, X
1w3d	16	0	0	0	4	0	20	R, X
1w3d	8	0	0	0	0	0	8	R,
1w3d	20	0	0	0	0	0	20	R, X
1w2d	24	0	0	0	8	0	32	R, X
1w2d	8	4	0	0	0	0	12	R,
6d16h	4	0	0	0	0	4	8	R, X
6d16h	4	0	0	0	0	0	4	R,
6d16h	12	0	0	0	8	0	20	R, X

  RIB manipulation time during SPF (in msec):
Delta T	RIB Update	RIB Delete
1w3d	4	0
1w3d	8	0
1w3d	10	0
1w3d	5	0
1w3d	8	0
1w2d	10	0
1w2d	3	0
6d16h	2	0
6d16h	1	0
6d16h	9	0
</pre>
<p>Failure Detection Delay: about 5-10ms worst case to detect/report loss of network pulses.<br />
Maximum SPF runtime: 32ms, doubling for safety makes it 64ms<br />
Maximum RIB update: 10ms, doubling for safety makes it 20ms<br />
OSPF interface flood pacing timer: 5ms (does not apply to the initial LSA flooded)</p>
<p>LSA Generation Initial Delay: 10ms (enough to detect multiple link failures resulting from SRLG failure)<br />
SPF Initial Delay: 10ms (enough to hold SPF to allow two consecutive LSAs to be flooded)<br />
Network geographical size: 100 miles (signal propagation is negligible)<br />
Network physical media: 1 Gbps links (serialization delay is negligible)</p>
<p>Estimated network convergence time in response to the initial event: 32*2 + 10*2 + 10 + 10 = 64 + 40 = 104ms, or roughly 100ms. This estimate does not precisely account for FIB update time, but we assume it would be approximately the same as the RIB update. We need to make sure our maximum backoff timers exceed this convergence time, so that in the worst-case scenario processing is delayed beyond the convergence interval.</p>
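<p>The same estimate written out as a short Python sketch, so you can plug in the numbers from your own routers. The function and parameter names are made up for illustration. The sum mirrors the terms of the expression above, which means the failure detection delay is left out here just as it is there.</p>
<pre>
```python
def initial_convergence_ms(max_spf_ms, max_rib_ms, lsa_initial_ms,
                           spf_initial_ms, safety_factor=2):
    """Convergence estimate for the first event in a stable network."""
    return (
        safety_factor * max_spf_ms    # measured SPF runtime, doubled
        + safety_factor * max_rib_ms  # measured RIB update, doubled
        + lsa_initial_ms              # LSA generation initial delay
        + spf_initial_ms              # SPF initial delay
    )


print(initial_convergence_ms(32, 10, 10, 10))  # 104
```
</pre>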
<p>LSA Generation Hold Time: 100ms (approximately the convergence time)<br />
LSA Generation Maximum Time: 1s (way above the 100ms)<br />
OSPF Arrival Time: 50ms (way below the LSA Generation hold time)<br />
SPF Hold Time: 100ms<br />
SPF Maximum Hold Time: 1s (maximum SPF runtime is 32ms, meaning we skip about 30 SPF runtimes in the worst condition, so SPF consumes no more than roughly 3% of CPU time in the worst-case scenario).</p>
<p>Now estimate the worst-case convergence time: LSA_Maximum_Delay (1s) + SPF_Maximum_Delay (1s) + RIB_Update (&lt;1s) &lt; 3 seconds. Even in a heavily congested network, CPU usage for SPF calculations will stay at about 3% and the network will converge after a change in under 3 seconds. Here is a sample OSPF configuration template:</p>
<pre>
router ospf 10
!
! Suppress transit link prefixes
!
 prefix-suppression
!
! Wait at least 50ms between accepting the same LSA
!
 timers lsa arrival 50
!
! Throttle LSA generation
!
 timers throttle lsa all 10 100 1000
!
! Throttle SPF runs
!
 timers throttle spf 10 100 1000
!
! Pace interface-level flooding
!
 timers pacing flood 5
!
! Make retransmission timer greater than arrival
!
 timers pacing retransmission 60
!
! Enable incremental SPF
!
 ispf
</pre>
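<p>Before rolling out a profile like this, it may help to check it against the rules of thumb collected in this post. The Python sketch below does that for the values in the template; the dictionary keys and the function name are made up, and the CPU share uses the simplified model of one SPF run per maximum wait window.</p>
<pre>
```python
profile = {
    "lsa_arrival": 50,
    "lsa_throttle": (10, 100, 1000),  # initial, hold, max_wait
    "spf_throttle": (10, 100, 1000),  # initial, hold, max_wait
    "pacing_flood": 5,
    "pacing_retransmission": 60,
}
MAX_SPF_RUNTIME_MS = 32  # from "show ip ospf statistics"


def check_profile(p):
    """Return the list of violated rules of thumb (empty when all hold)."""
    problems = []
    _, lsa_hold, lsa_max = p["lsa_throttle"]
    spf_initial, spf_hold, spf_max = p["spf_throttle"]
    if p["lsa_arrival"] > lsa_hold:
        problems.append("LSA arrival interval exceeds LSA hold time")
    if p["pacing_retransmission"] <= p["lsa_arrival"]:
        problems.append("retransmission pacing is not above LSA arrival")
    if spf_initial <= p["pacing_flood"]:
        problems.append("SPF initial delay is not above flood pacing")
    if lsa_max < 2 * lsa_hold or spf_max < 2 * spf_hold:
        problems.append("maximum time is below twice the hold time")
    return problems


print(check_profile(profile))
spf_max_wait_ms = profile["spf_throttle"][2]
worst_share = MAX_SPF_RUNTIME_MS / (MAX_SPF_RUNTIME_MS + spf_max_wait_ms)
print(f"{worst_share:.1%}")
```
</pre>
<p>For the template above the list of problems is empty and the worst-case SPF share comes out at about 3%, in line with the estimate made earlier.</p>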
<h4>Conclusions</h4>
<p>It is well known that link-state IGPs can be tuned for sub-second convergence under almost any practical scenario, yet maintain network stability by virtue of adaptive backoff timers. In this post we tried to provide a practical approach to calculating the optimum throttling timer values based on your recorded network performance. It is worth noting that the three most important timers for tuning a network for sub-second convergence are the failure detection delay, the initial LSA generation delay and the initial SPF delay. All other timers, such as the hold and maximum times, serve the purpose of stabilizing the network and affect convergence only in "worst-case" unstable network scenarios. Cisco's recommended values for the initial/hold/maximum timers are 10/100/5000 ms (see <a href=http://www.cisco.com/en/US/docs/solutions/Enterprise/Campus/routed-ex.html>[ROUTED-CAMPUS]</a>), but those may look a bit conservative, as they result in a worst-case convergence time above 10 seconds. Additionally, it is important to notice that in large topologies a significant amount of time is spent on RIB/FIB updates after reconvergence. Therefore, in addition to tuning the throttling timers you may want to implement other measures such as prefix-suppression, better summarization (e.g. totally stubby areas) and minimization of external routing information. If your platform supports the feature, you may also implement the priority-driven RIB prefix installation process.</p>
<p>We omitted other fast-convergence elements such as resilient network design, e.g. redundancy resulting in equal-cost multipathing and faster OSPF adjacency restoration, or the NSF feature, which is very helpful for avoiding re-convergence during planned downtime. We also skipped some other features related to OSPF stability, such as flooding reduction and LSA group pacing, that could yield performance benefits in networks with large LSDBs. It is not possible to cover all relevant technologies in a single blog post, but you may refer to the further reading documents for more information. And finally, if you are planning to tune your IGP for fast convergence, make sure you understand all the consequences. Modern routing platforms are capable of handling almost any "stormy" network condition without losing overall network stability, but pushing a network to its limits can always be dangerous. Make sure you monitor your OSPF statistics for potentially high or unusual values after you have performed the tuning, or set the maximum timers to more conservative values (e.g. 3-5 seconds) to provide additional safety.</p>
<h4>Further Reading</h4>
<p>The following is a minimal list of publications suggested for reading on the topic of fast IGP convergence.</p>
<p><a href=http://tools.ietf.org/html/draft-ietf-rtgwg-ordered-fib-01>[ORDERED-FIB]</a> "Loop-free convergence using oFIB"<br />
<a href=http://www.cs.princeton.edu/~jrex/teaching/spring2005/reading/shaikh01.pdf >[BLACKBOX-OSPF]</a> "Experience in Black-box OSPF Measurement"<br />
<a href=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.8483 >[SUBSEC-CONV]</a> "Achieving Sub-second IGP Convergence in Large IP Networks"<br />
<a href=http://www.cisco.com/en/US/docs/ios/12_0s/feature/guide/fasthelo.html >[OSPF-FASTHELLO]</a> "OSPF Fast Hello Enhancement"<br />
<a href=http://www.cisco.com/en/US/products/hw/routers/ps167/products_tech_note09186a008012fb87.shtml>[SPD]</a> "Understanding Selective Packet Discard"<br />
<a href=http://blog.ine.com/2009/12/31/tuning-ospf-performance/ >[TUNING-OSPF]</a> "Tuning OSPF Performance"<br />
<a href=http://tools.ietf.org/html/rfc5880 >[BFD]</a> "Bidirectional Forwarding Detection"<br />
<a href=http://cisco.com/en/US/docs/ios/12_4t/12_4t15/ht_osmch.html >[OSPF-PREFIX-SUPPRESS]</a> "OSPF Prefix Suppression Feature"<br />
<a href=http://www.cisco.com/en/US/docs/solutions/Enterprise/Campus/routed-ex.html>[ROUTED-CAMPUS]</a> "Cisco fully routed campus design guidelines"<br />
<a href=http://blog.ine.com/2010/05/03/optimizing-ip-event-dampening/ >[OPT-DAMPENING]</a> "Optimized IP Event Dampening"<br />
<a href=http://www.itl.nist.gov/div897/sqg/dads/HTML/dijkstraalgo.html>[DIJKSTRA-SPF]</a> "Dijkstra SPF algorithm"</p>
]]></content:encoded>
			<wfw:commentRss>http://feedproxy.google.com/~r/ine/~3/Z4FMsChEGlA/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Traffic Engineering Challenge</title>
		<link>http://feedproxy.google.com/~r/ine/~3/7BWKqJB7kdE/</link>
		<comments>http://feedproxy.google.com/~r/ine/~3/7BWKqJB7kdE/#comments</comments>
		<pubDate>Fri, 14 May 2010 17:27:40 +0000</pubDate>
		<dc:creator>Petr Lapukhov, CCIE #16379</dc:creator>
				<category><![CDATA[CCIE General]]></category>
		<category><![CDATA[CCIE R&S]]></category>
		<category><![CDATA[ospf]]></category>
		<category><![CDATA[challenge]]></category>
		<category><![CDATA[traffic engineering]]></category>

		<guid isPermaLink="false">http://blog.ine.com/?p=3916</guid>
		<description><![CDATA[Hi Everyone!
The Challenge
People tend to underestimate the importance of IGP routing features in modern networks. So here is a small challenge scenario for you to practice OSPF traffic engineering. Take a look at the diagram below for information on the topology and link bandwidth. You may assume that every router has a loopback interface [...]]]></description>
			<content:encoded><![CDATA[
<p>Hi Everyone!</p>
<p><b> The Challenge</b><br />
People tend to underestimate the importance of IGP routing features in modern networks. So here is a small challenge scenario for you to practice OSPF traffic engineering. Take a look at the diagram below for information on the topology and link bandwidth. You may assume that every router has a loopback interface for network testing and OSPF router-id selection.</p>
<p><a href="http://blog.ine.com/wp-content/uploads/2010/05/ospf-traffic-engineering.png"><img src="http://blog.ine.com/wp-content/uploads/2010/05/ospf-traffic-engineering.png" alt="ospf-traffic-engineering" title="ospf-traffic-engineering" width="430" height="308" class="aligncenter size-full wp-image-3917" /></a></p>
<p>There is a large cloud of media servers behind R4, and the users behind R1 need to use the full 300Mbps of bandwidth when downloading files from the servers. The network is running single-area OSPF for IP routing. Ensure you can accomplish the above goal without using MPLS Traffic Engineering or Policy Based Routing. You are allowed to create additional logical interfaces, but the routing protocol, OSPF areas, physical links and their characteristics should remain unchanged. Keep the number of changes to a minimum and do not introduce new IP addresses.</p>
<p>The first person to provide a working solution will receive 100 rack rental tokens from our partner company <a href=http://gradedlabs.com>GradedLabs</a>. Please use your valid e-mail address when posting a comment, so we can locate your INE account.</p>
<p>UPD<br />
OK I forgot to rule out the &#8220;route-via&#8221; option <img src='http://blog.ine.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Try solving the task without relying on any &#8220;policy-based&#8221; routing decisions.</p>
<p>The winner is: Antonie Henning (<a href=http://21500.net>http://21500.net</a>). Ivan Pepelnjak helped find a logical &#8220;loophole&#8221; in my scenario by pointing to the &#8220;route-via&#8221; option available with GRE tunnels and correctly stating there should be 6 end-to-end tunnels to implement proper load-balancing. Hans Verkerk was close with his idea, but used static routing, which was slightly against the rules and not as elegant as Antonie&#8217;s solution. Chris Stos-Gale and Nitzan Tzelniker came up with the correct solution as well, but Antonie completed the challenge ahead of them. Thanks to everyone for participating in the challenge, it&#8217;s been fun! </p>
<p><b>The Solution:</b><br />
<span id="more-3916"></span></p>
<p>The problem is that there are three paths with varying minimum bandwidth values (50, 100 and 150, totaling 300Mbps). Since OSPF does not support unequal-cost load-balancing, it is somewhat challenging to fully use the available bandwidth. There were a lot of ideas posted in the comments, and they fall into three main categories:</p>
<p>1) Modify OSPF costs to create three equal-cost paths from R4 to R1. This will result in oversaturation of the slow (50Mbps) link. Another variation was using three tunnel interfaces between R1/R4 with the same ECMP logic. This results in the same problem.<br />
2) Create six tunnels between R4 and R1 and configure the network so that three tunnels go across the fastest path, two tunnels take the medium path and one tunnel takes the slowest path. This is somewhat similar to MPLS TE. To steer the tunnels you may use either static routes or the &#8220;route-via&#8221; option (thanks to <a href=http://blog.ioshints.info/>Ivan Pepelnjak</a> for pointing that out to me!). This solution would work, but it violates the &#8220;updated&#8221; requirement not to use any &#8220;policy-based&#8221; routing decisions, relying purely on OSPF path selection.<br />
3) The solution that I had in mind was splitting the links connecting R4 to its neighbors into &#8220;sub-channels&#8221; proportional to the bandwidth assigned to a given path:</p>
<p><a href="http://blog.ine.com/wp-content/uploads/2010/05/ospf-traffic-engineering-solution.png"><img src="http://blog.ine.com/wp-content/uploads/2010/05/ospf-traffic-engineering-solution.png" alt="ospf-traffic-engineering-solution" title="ospf-traffic-engineering-solution" width="430" height="308" class="aligncenter size-full wp-image-3923" /></a></p>
<p>The link labels represent OSPF costs. You only need to split the links at R4, as this is the &#8220;source&#8221; of the traffic flows. Link splitting can be done in two ways: using logical virtual circuits (e.g. FR PVCs or Ethernet VLANs/VCs) or using IP tunnels. The IP tunnels only need to run between R4 and the directly attached routers, with OSPF disabled on the physical link and enabled on the tunnels. Sample output at R4 for R1&#8217;s prefix:</p>
<pre>
R4#show ip route 10.0.1.1
Routing entry for 10.0.1.1/32
  Known via "ospf 1", distance 110, metric 4, type intra area
  Last update from 10.0.3.3 on Tunnel341, 00:46:50 ago
  Routing Descriptor Blocks:
  * 10.0.45.5, from 10.0.1.1, 00:46:50 ago, via Serial0/0/0.45
      Route metric is 4, traffic share count is 1
    10.0.3.3, from 10.0.1.1, 00:46:50 ago, via Tunnel340
      Route metric is 4, traffic share count is 1
    10.0.3.3, from 10.0.1.1, 00:46:50 ago, via Tunnel341
      Route metric is 4, traffic share count is 1
    10.0.1.1, from 10.0.1.1, 00:46:50 ago, via Tunnel142
      Route metric is 4, traffic share count is 1
    10.0.1.1, from 10.0.1.1, 00:46:50 ago, via Tunnel140
      Route metric is 4, traffic share count is 1
    10.0.1.1, from 10.0.1.1, 00:46:50 ago, via Tunnel141
      Route metric is 4, traffic share count is 1
</pre>
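<p>The 3:2:1 split visible in the output above can be checked with a little arithmetic. The Python sketch below is only an illustration and not part of the original solution. It assumes that traffic is spread evenly across equal-cost route entries (real per-flow hashing is only approximately even), and the function name subchannel_counts and the path labels are made up for this example.</p>
<pre>
```python
from functools import reduce
from math import gcd


def subchannel_counts(path_bandwidths):
    """Number of equal-cost sub-channels per path, proportional to bandwidth."""
    # The largest common unit of bandwidth becomes one sub-channel.
    unit = reduce(gcd, path_bandwidths.values())
    return {name: bw // unit for name, bw in path_bandwidths.items()}


# Minimum bandwidth of each path in Mbps (values from the scenario;
# the labels are hypothetical).
paths = {"slow": 50, "medium": 100, "fast": 150}

counts = subchannel_counts(paths)
total = sum(counts.values())

# Load per path when the full 300 Mbps is offered and every equal-cost
# route entry carries the same share (an idealized assumption).
offered = sum(paths.values())
load = {name: offered * n / total for name, n in counts.items()}

# For comparison: plain three-way ECMP puts a third of the load on each path.
naive_load = {name: offered / len(paths) for name in paths}

print(counts)
print(load)
print(naive_load)
```
</pre>
<p>With these inputs the sketch yields one, two and three sub-channels, six in total, which matches the six routing descriptor blocks in R4&#8217;s output. Plain three-way ECMP would instead offer 100Mbps to the 50Mbps path.</p>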
<p><b>Summary</b></p>
<p>I would like to thank everyone who participated in the &#8220;challenge&#8221;. I read all your responses but had to stop commenting once I found the right solution. I hope you enjoyed this little scenario as much as I did. Personally, I lean toward the &#8220;traditional&#8221; traffic engineering solutions based on pure IGP metric manipulation. Even though the solution presented does not scale in the real world, where you may resort to a different option (e.g. end-to-end route-via tunnels), it perfectly illustrates the little hacks you can apply to a link-state IGP to break the default &#8220;ECMP paradigm&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://feedproxy.google.com/~r/ine/~3/7BWKqJB7kdE/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tuning OSPF Performance</title>
		<link>http://blog.internetworkexpert.com/2009/12/31/tuning-ospf-performance/</link>
		<comments>http://blog.internetworkexpert.com/2009/12/31/tuning-ospf-performance/#comments</comments>
		<pubDate>Fri, 01 Jan 2010 06:59:07 +0000</pubDate>
		<dc:creator>Petr Lapukhov, CCIE #16379</dc:creator>
				<category><![CDATA[IGP]]></category>
		<category><![CDATA[ccie]]></category>
		<category><![CDATA[ospf]]></category>
		<category><![CDATA[incremental spf]]></category>
		<category><![CDATA[ispf]]></category>
		<category><![CDATA[lsa group pacing]]></category>
		<category><![CDATA[lsa throttling]]></category>
		<category><![CDATA[spf throttling]]></category>

		<guid isPermaLink="false">http://blog.internetworkexpert.com/?p=3213</guid>
		<description><![CDATA[The latest, highly technical post from Petr on OSPF! ]]></description>
			<content:encoded><![CDATA[The latest, highly technical post from Petr on OSPF! ]]></content:encoded>
			<wfw:commentRss>http://blog.internetworkexpert.com/2009/12/31/tuning-ospf-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Router-ID Found!  It was here the whole time.</title>
		<link>http://blog.internetworkexpert.com/2009/09/17/router-id-found-it-was-here-the-whole-time/</link>
		<comments>http://blog.internetworkexpert.com/2009/09/17/router-id-found-it-was-here-the-whole-time/#comments</comments>
		<pubDate>Fri, 18 Sep 2009 02:56:03 +0000</pubDate>
		<dc:creator>Marvin Greenlee, CCIE #12237</dc:creator>
				<category><![CDATA[CCIE R&S]]></category>
		<category><![CDATA[IGP]]></category>
		<category><![CDATA[ccie]]></category>
		<category><![CDATA[ccie sp]]></category>
		<category><![CDATA[ospf]]></category>
		<category><![CDATA[strategy]]></category>
		<category><![CDATA[challenge]]></category>

		<guid isPermaLink="false">http://blog.internetworkexpert.com/?p=2056</guid>
		<description><![CDATA[Challenge solutions.  Comments / submitted solutions are now available on the original post.]]></description>
			<content:encoded><![CDATA[Challenge solutions.  Comments / submitted solutions are now available on the original post.]]></content:encoded>
			<wfw:commentRss>http://blog.internetworkexpert.com/2009/09/17/router-id-found-it-was-here-the-whole-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CCIE Lab Strategy – Task Dissection</title>
		<link>http://blog.internetworkexpert.com/2009/09/17/ccie-lab-strategy-task-dissection/</link>
		<comments>http://blog.internetworkexpert.com/2009/09/17/ccie-lab-strategy-task-dissection/#comments</comments>
		<pubDate>Thu, 17 Sep 2009 11:17:01 +0000</pubDate>
		<dc:creator>Marvin Greenlee, CCIE #12237</dc:creator>
				<category><![CDATA[CCIE R&S]]></category>
		<category><![CDATA[IGP]]></category>
		<category><![CDATA[ccie]]></category>
		<category><![CDATA[ccie sp]]></category>
		<category><![CDATA[ospf]]></category>
		<category><![CDATA[strategy]]></category>

		<guid isPermaLink="false">http://blog.internetworkexpert.com/?p=2043</guid>
		<description><![CDATA[Followup to "Have you seen my Router ID?", dissecting a task from a strategy / knowledge point of view.]]></description>
			<content:encoded><![CDATA[Followup to "Have you seen my Router ID?", dissecting a task from a strategy / knowledge point of view.]]></content:encoded>
			<wfw:commentRss>http://blog.internetworkexpert.com/2009/09/17/ccie-lab-strategy-task-dissection/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Have you seen my Router ID?</title>
		<link>http://blog.internetworkexpert.com/2009/09/16/have-you-seen-my-router-id/</link>
		<comments>http://blog.internetworkexpert.com/2009/09/16/have-you-seen-my-router-id/#comments</comments>
		<pubDate>Wed, 16 Sep 2009 19:50:13 +0000</pubDate>
		<dc:creator>Marvin Greenlee, CCIE #12237</dc:creator>
				<category><![CDATA[CCIE R&S]]></category>
		<category><![CDATA[IGP]]></category>
		<category><![CDATA[ccie]]></category>
		<category><![CDATA[ccie sp]]></category>
		<category><![CDATA[ospf]]></category>
		<category><![CDATA[challenge]]></category>
		<category><![CDATA[Router ID]]></category>

		<guid isPermaLink="false">http://blog.internetworkexpert.com/?p=2011</guid>
		<description><![CDATA[Challenge post on OSPF configuration.]]></description>
			<content:encoded><![CDATA[Challenge post on OSPF configuration.]]></content:encoded>
			<wfw:commentRss>http://blog.internetworkexpert.com/2009/09/16/have-you-seen-my-router-id/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Understanding OSPF Transit Capability</title>
		<link>http://blog.internetworkexpert.com/2009/09/14/understanding-ospf-transit-capability/</link>
		<comments>http://blog.internetworkexpert.com/2009/09/14/understanding-ospf-transit-capability/#comments</comments>
		<pubDate>Tue, 15 Sep 2009 06:03:51 +0000</pubDate>
		<dc:creator>Petr Lapukhov, CCIE #16379</dc:creator>
				<category><![CDATA[IGP]]></category>
		<category><![CDATA[ccie]]></category>
		<category><![CDATA[ospf]]></category>
		<category><![CDATA[rfc2328]]></category>
		<category><![CDATA[transit capability]]></category>

		<guid isPermaLink="false">http://blog.internetworkexpert.com/?p=1989</guid>
		<description><![CDATA[The feature we are going to talk about today may look a bit convoluted, but it demonstrates core OSPF behavior: combining link-state and distance-vector behaviors. The command capability transit was introduced in IOS 12.3T and is on by default. However, the description is rather confusing and does not explain the underlying mechanics. We are going [...]]]></description>
			<content:encoded><![CDATA[The feature we are going to talk about today may look a bit convoluted, but it demonstrates core OSPF behavior: combining link-state and distance-vector behaviors. The command capability transit was introduced in IOS 12.3T and is on by default. However, the description is rather confusing and does not explain the underlying mechanics. We are going [...]]]></content:encoded>
			<wfw:commentRss>http://blog.internetworkexpert.com/2009/09/14/understanding-ospf-transit-capability/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OSPF Route Filtering Demystified</title>
		<link>http://blog.internetworkexpert.com/2009/08/17/ospf-route-filtering-demystified/</link>
		<comments>http://blog.internetworkexpert.com/2009/08/17/ospf-route-filtering-demystified/#comments</comments>
		<pubDate>Mon, 17 Aug 2009 23:38:09 +0000</pubDate>
		<dc:creator>Petr Lapukhov, CCIE #16379</dc:creator>
				<category><![CDATA[IGP]]></category>
		<category><![CDATA[ccie]]></category>
		<category><![CDATA[ospf]]></category>
		<category><![CDATA[filtering]]></category>
		<category><![CDATA[lsa]]></category>

		<guid isPermaLink="false">http://blog.internetworkexpert.com/?p=1698</guid>
		<description><![CDATA[Intro
There was a lot of blogging related to OSPF topics recently. In this post, I would like to clarify some common misunderstandings that many people have about OSPF route filtering. I have seen so many folks (some of them really experienced persons!) incorrectly understanding the underlying behavior so it&#8217;s about time to make this clear. [...]]]></description>
			<content:encoded><![CDATA[Intro
There was a lot of blogging related to OSPF topics recently. In this post, I would like to clarify some common misunderstandings that many people have about OSPF route filtering. I have seen so many folks (some of them really experienced persons!) incorrectly understanding the underlying behavior so it&#8217;s about time to make this clear. [...]]]></content:encoded>
			<wfw:commentRss>http://blog.internetworkexpert.com/2009/08/17/ospf-route-filtering-demystified/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
