Monday, August 3, 2009

When Administrative Distance Doesn’t Count

In troubleshooting a network issue, I came across an interesting situation where EIGRP and OSPF metrics are directly compared. Obviously, this is a flawed design, and it has been updated to fix the issue. I thought this might be a good opportunity to share the experience.

Problem
Traffic is not taking the optimal path from RouterE to 10.34.0.0/16. Here is the topology:


Traffic from RouterE to 10.34.0.0/16 should go to RouterD, then to RouterY, and then to 10.34.0.0/16. What we're actually seeing is this traffic taking the less optimal path of RouterE->RouterC->RouterD->RouterY.

Cause
Here is the output of 'show ip route 10.34.0.0' on RouterE. We should see the next hop as "10.0.3.1, from 10.0.1.11", but here is what we see instead:
RouterE#show ip route 10.34.0.0
Routing entry for 10.34.0.0/16
Known via "bgp 65000", distance 200, metric 4294967294
Tag 65001, type internal
Last update from 10.0.4.1 2w1d ago
Routing Descriptor Blocks:
* 10.0.4.1, from 10.0.1.20, 2w1d ago
Route metric is 4294967294, traffic share count is 1
AS Hops 1
Route tag 65001

So the question is: why would RouterE prefer to go all the way to RouterA to get to 10.34.0.0/16 when there's an equally good (and closer!) path to the network off of RouterD? To find the answer, we need to look at the BGP table for 10.34.0.0/16. Here's that output:
RouterE#show ip bgp 10.34.0.0/16
BGP routing table entry for 10.34.0.0/16, version 107467
Paths: (2 available, best #1, table Default-IP-Routing-Table)
Advertised to update-groups:
2
65001, (aggregated by 65001 10.1.34.1), (received & used)
10.0.4.1 (metric 13) from 10.0.1.20 (10.0.1.20)
Origin IGP, metric 4294967295, localpref 100, valid, internal, atomic-aggregate, best
65001, (aggregated by 65001 10.1.34.2), (received & used)
10.0.3.1 (metric 26112) from 10.0.1.11 (10.0.1.11)
Origin IGP, metric 4294967295, localpref 100, valid, internal, atomic-aggregate

The key information is the value in parentheses after each next hop, which is the IGP metric to reach the BGP next hop. This is the eighth point of comparison for BGP routes, but the seven more important comparison points are all equal (see the BGP Path Selection Process). Notice that 10.0.4.1 is only metric=13 away, while 10.0.3.1 is metric=26112 away. Where do these values come from? Here are the routing table entries for the two next hops:
RouterE#show ip route 10.0.4.1
Routing entry for 10.0.4.0/31
Known via "ospf 80", distance 110, metric 13, type intra area
Redistributing via eigrp 80
Last update from 10.0.0.37 on GigabitEthernet1/1, 1w3d ago
Routing Descriptor Blocks:
* 10.0.0.37, from 10.0.1.20, 1w3d ago, via GigabitEthernet1/1
Route metric is 13, traffic share count is 1

RouterE#show ip route 10.0.3.1
Routing entry for 10.0.3.0/31
Known via "eigrp 80", distance 90, metric 26112, type internal
Redistributing via eigrp 80
Last update from 10.0.0.40 on Vlan50, 1w5d ago
Routing Descriptor Blocks:
* 10.0.0.40, from 10.0.0.40, 1w5d ago, via Vlan50
Route metric is 26112, traffic share count is 1
Total delay is 20 microseconds, minimum bandwidth is 100000 Kbit
Reliability 255/255, minimum MTU 1500 bytes
Loading 1/255, Hops 1
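
The 26112 in that output is not arbitrary. As a rough sketch, assuming RouterE uses the classic (non-wide) EIGRP composite metric with the default K values, it can be reproduced from the minimum bandwidth and total delay shown in the same output. The function name is hypothetical and only for illustration:

```python
def eigrp_classic_metric(min_bandwidth_kbit: int, total_delay_usec: int) -> int:
    """Classic EIGRP composite metric, assuming default K values (K1 = K3 = 1)."""
    # Bandwidth term: 10^7 divided by the slowest link on the path, in Kbit/s.
    scaled_bandwidth = 10_000_000 // min_bandwidth_kbit
    # Delay term: cumulative delay expressed in tens of microseconds.
    scaled_delay = total_delay_usec // 10
    return 256 * (scaled_bandwidth + scaled_delay)


# Values taken from 'show ip route 10.0.3.1' above:
# minimum bandwidth 100000 Kbit, total delay 20 microseconds.
print(eigrp_classic_metric(100_000, 20))  # 256 * (100 + 2) = 26112
```

Because of the 256 multiplier and the bandwidth term, even a single-hop EIGRP route carries a metric in the tens of thousands, far above an OSPF cost built from per-link values of 1 and 10.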

The route to 10.0.4.1 is learned via OSPF, while the route to 10.0.3.1 is learned via EIGRP. In this environment, OSPF uses fairly small metrics (LAN links have a cost of 1, WAN links have a cost of 10). We use the default EIGRP metrics, which are much larger numbers. Under nearly all other circumstances, IGP metrics are not compared between different routing protocols. Normally, different routing protocols are compared based on Administrative Distance, and the routing protocol with the lower Admin Distance is preferred, with no regard to metric values. This is one of those 'special cases': BGP takes the metric of whatever route is installed for each next hop and compares the raw numbers, without regard to which protocol supplied them.
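
To make the interaction concrete, here is a simplified Python model of the two decisions involved. It is a sketch, not how IOS is implemented. It skips the seven earlier BGP comparison steps (they tie in this case), and it assumes OSPF was already offering 10.0.3.0/31 at metric 2 before the change, which the outputs above do not show. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IgpRoute:
    protocol: str
    admin_distance: int
    metric: int


@dataclass(frozen=True)
class BgpPath:
    next_hop: str
    neighbor: str


def install_route(candidates):
    """Routing table choice for one prefix: lowest admin distance wins.

    The metric only matters between routes with the same distance.
    """
    return min(candidates, key=lambda r: (r.admin_distance, r.metric))


def best_by_igp_metric(paths, rib):
    """BGP tie-break: lowest metric to the next hop, whatever protocol it came from."""
    return min(paths, key=lambda p: rib[p.next_hop].metric)


paths = [BgpPath("10.0.4.1", "10.0.1.20"), BgpPath("10.0.3.1", "10.0.1.11")]

# Before the fix: EIGRP (distance 90) beats OSPF (distance 110) for 10.0.3.0/31.
# The OSPF candidate at metric 2 is an assumption, not taken from the output.
rib_before = {
    "10.0.4.1": install_route([IgpRoute("ospf", 110, 13)]),
    "10.0.3.1": install_route(
        [IgpRoute("eigrp", 90, 26112), IgpRoute("ospf", 110, 2)]
    ),
}

# After the fix: both next hops are known only via OSPF.
rib_after = {
    "10.0.4.1": install_route([IgpRoute("ospf", 110, 13)]),
    "10.0.3.1": install_route([IgpRoute("ospf", 110, 2)]),
}

print(best_by_igp_metric(paths, rib_before).next_hop)  # 10.0.4.1 (13 < 26112)
print(best_by_igp_metric(paths, rib_after).next_hop)   # 10.0.3.1 (2 < 13)
```

In this model, Administrative Distance does its job correctly for each prefix on its own. The trouble only appears one step later, when BGP lines up an OSPF cost and an EIGRP composite metric as if they were on the same scale.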

Solution
To solve this issue, I removed the RouterD<->RouterE link from EIGRP routing, but more importantly, I also removed all AS65000 network links from the EIGRP topology table. This prevents RouterE from learning the 10.0.3.0/31 route via EIGRP. With both 10.0.3.0/31 and 10.0.4.0/31 learned via OSPF, the metric comparison makes sense, so the path is properly chosen. Here is a diagram of the new environment (notice the subtle movement of the EIGRP 80 domain to exclude the RouterD<->RouterE link; it's the only change):

And here is the show command output confirming the solution worked:
RouterE#show ip route 10.34.0.0 255.255.0.0
Routing entry for 10.34.0.0/16
  Known via "bgp 65000", distance 200, metric 4294967294
  Tag 65001, type internal
  Last update from 10.0.3.1 00:02:17 ago
  Routing Descriptor Blocks:
  * 10.0.3.1, from 10.0.1.11, 00:02:17 ago
      Route metric is 4294967294, traffic share count is 1
      AS Hops 1
      Route tag 65001

RouterE#show ip bgp 10.34.0.0/16
BGP routing table entry for 10.34.0.0/16, version 118687
Paths: (2 available, best #2, table Default-IP-Routing-Table)
  Advertised to update-groups:
     2        
  65001, (aggregated by 65001 10.1.34.1), (received & used)
    10.0.4.1 (metric 13) from 10.0.1.20 (10.0.1.20)
      Origin IGP, metric 4294967295, localpref 100, valid, internal, atomic-aggregate
  65001, (aggregated by 65001 10.1.34.2), (received & used)
    10.0.3.1 (metric 2) from 10.0.1.11 (10.0.1.11)
      Origin IGP, metric 4294967295, localpref 100, valid, internal, atomic-aggregate, best

RouterE#show ip route 10.0.3.1
Routing entry for 10.0.3.0/31
  Known via "ospf 80", distance 110, metric 2, type intra area
  Redistributing via eigrp 80
  Last update from 10.0.0.40 on Vlan50, 00:02:34 ago
  Routing Descriptor Blocks:
  * 10.0.0.40, from 10.0.1.11, 00:02:34 ago, via Vlan50
      Route metric is 2, traffic share count is 1

RouterE#show ip route 10.0.4.1
Routing entry for 10.0.4.0/31
  Known via "ospf 80", distance 110, metric 13, type intra area
  Redistributing via eigrp 80
  Last update from 10.0.0.37 on GigabitEthernet1/1, 2w0d ago
  Routing Descriptor Blocks:
  * 10.0.0.37, from 10.0.1.20, 2w0d ago, via GigabitEthernet1/1
      Route metric is 13, traffic share count is 1



Summary

In the grand scheme of things, a single additional gigabit Ethernet hop is inconsequential. So why does this matter enough to open a change request and wake up at 6am on a Sunday to fix? For me, it’s a sense of completeness. There was something ‘not right’ about the way things were working, and fixing it pushes the needle a little closer to ‘done’. I equate this to a love of symmetry. For some people, it’s important for things to be balanced. When I take two eggs out of the carton, I take one from each row. It isn’t more correct, but it makes me feel better. Same goes for the network. If it is working 100% optimally, then any small issue is readily apparent. Terry Slattery said something similar in his blog entry on network hygiene a few weeks ago:
If you don't keep things clean, interactions between otherwise minor problems can create a larger problem
He’s absolutely right: eliminating the small issues makes the big ones far less complicated.
