Friday, January 29, 2010

BGP AS-Override & MPLS

I received a comment on my “The Case for Performance Routing” blog post regarding BGP AS-override and MPLS:

… I am in a similar situation where I am being asked to request the carrier to do an AS-override however I am not comfortable with allowing the carrier to break standard BGP loop prevention. I was wondering if you can possibly list some of the reasons why you think assigning individual AS numbers per site was the right choice even though it seems you had to change it for your CORE. Another question was did you assign individual AS numbers for all your Branch sites or only for the large locations. I am considering assigning an AS per location no matter how large or small.

As a general rule, I prefer not to use BGP AS-Override.  I suspect the commenter’s reservations have to do with this not ‘feeling’ right.  Just as I would not implement NAT on my internal network without a good reason, I feel it is a poor design choice to re-use AS numbers unless the scale of the network dictates it.  BGP AS numbers 64512 – 65534 (inclusive) are defined as Private Autonomous Systems.  These can be used by any organization internally, but cannot be leaked to the Internet.  Consider them as the BGP AS equivalent of RFC 1918 IP addresses.  This gives us 1023 Private AS numbers to work with.
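As a quick sanity check on that count, here is a minimal Python sketch. The function names are my own illustration and do not come from any router or library; the range boundaries are the ones quoted above.

```python
# 16-bit private AS range as quoted in the post: 64512 through 65534, inclusive.
PRIVATE_AS_FIRST = 64512
PRIVATE_AS_LAST = 65534


def is_private_asn(asn: int) -> bool:
    """Return True if asn falls inside the 16-bit private AS range."""
    return PRIVATE_AS_FIRST <= asn <= PRIVATE_AS_LAST


def private_asn_count() -> int:
    """Count the AS numbers in the inclusive private range."""
    return PRIVATE_AS_LAST - PRIVATE_AS_FIRST + 1


if __name__ == "__main__":
    print(private_asn_count())    # 1023
    print(is_private_asn(65001))  # True
    print(is_private_asn(65535))  # False (just outside the range)
```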

It is tempting to say that you should use individually-assigned Private AS numbers for networks with fewer than 1000 locations, but to accommodate future growth, I would set the upper threshold around 500 – 600 locations.  This would allow for a doubling of the current location count before running out of AS numbers.  If the network I were engineering exceeded this size, AS-Override would be the way to go.
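The rule of thumb above can be sketched as a small check. This is only an illustration under my own assumptions: the function name and the default growth factor of 2 (a doubling) are hypothetical, and the sketch ignores any AS numbers reserved for core sites. Under a strict doubling the cutoff works out to 511 locations, which is in line with the rough 500 – 600 figure.

```python
PRIVATE_AS_POOL = 1023  # 64512 through 65534, inclusive


def recommend_as_plan(current_sites: int,
                      growth_factor: float = 2.0,
                      pool: int = PRIVATE_AS_POOL) -> str:
    """Suggest an AS plan based on whether projected growth fits the private pool.

    growth_factor is an assumption (2.0 means the site count may double).
    """
    projected_sites = current_sites * growth_factor
    if projected_sites <= pool:
        return "unique-as-per-site"
    return "as-override"


if __name__ == "__main__":
    for sites in (100, 511, 512, 600):
        print(sites, recommend_as_plan(sites))
```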

Let’s suppose that the size of the network dictates the use of AS-Override.  How would I recommend the AS numbers be assigned?  The core site(s) should receive globally-unique assignments.  This facilitates interconnecting those locations.  To clarify a point from my design, I logically combined my three core locations into a single BGP AS.  From the perspective of the WAN, these three physical locations are only a single network location. I am comfortable with this decision because the sites are interconnected with redundant links, so multiple failures are necessary to ‘break’ this AS.

There were several reasons that we decided to treat the core locations as a single AS.  The primary reason was the need for fast failover over the redundant paths that interconnect these three sites.  BGP can be optimized to fail over within a few seconds, but my core needs better performance than that protocol provides.  We chose to use a tuned OSPF implementation for our core interconnects.  This was also done with an eye towards future needs, such as layer 2 LAN extension between the core data centers.  A second reason was to prepare the WAN for Performance Routing.  Our PfR implementation requires that all three locations utilize the same AS number, as we use BGP to inject the more specific routes that PfR chooses as the best path.

As for the remote locations with BGP AS-Override… I recommend that you utilize the same AS number for every remote location.  This allows for simpler troubleshooting and configuration.  Once the decision is made to re-use AS numbers, there is no value in making some of them unique.  You may as well keep the vast majority of your Private AS numbers in your back pocket for other uses.  If this doesn’t sit well with you, perhaps it would make sense to make all similar locations use the same AS number.  For example, if you have three remote location designs, assign AS 65001 to the ‘Small Sites’, AS 65002 to the ‘Medium Sites’ and AS 65003 to the ‘Large Sites’.
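To illustrate why re-using one AS number at every remote site depends on AS-Override, here is a toy Python model of eBGP loop prevention. It is a sketch, not router behavior in full: the function names are hypothetical, AS 64500 stands in for the carrier (it is from the range reserved for documentation), and the model looks only at the AS path.

```python
from typing import List

PROVIDER_AS = 64500  # hypothetical carrier AS (documentation range)
SITE_AS = 65001      # the same private AS re-used at every remote site


def accepts(local_as: int, as_path: List[int]) -> bool:
    """Standard eBGP loop prevention: reject a path that contains our own AS."""
    return local_as not in as_path


def pe_advertise(as_path: List[int], provider_as: int,
                 neighbor_as: int, as_override: bool) -> List[int]:
    """Model a provider edge router advertising a route to a customer site.

    With AS-Override, occurrences of the neighbor's AS are replaced by the
    provider's AS before the provider prepends its own AS as usual.
    """
    if as_override:
        as_path = [provider_as if asn == neighbor_as else asn for asn in as_path]
    return [provider_as] + as_path


if __name__ == "__main__":
    # Site A originates a route; the carrier learns it with AS path [65001].
    learned = [SITE_AS]

    plain = pe_advertise(learned, PROVIDER_AS, SITE_AS, as_override=False)
    print(plain, accepts(SITE_AS, plain))            # [64500, 65001] False

    overridden = pe_advertise(learned, PROVIDER_AS, SITE_AS, as_override=True)
    print(overridden, accepts(SITE_AS, overridden))  # [64500, 64500] True
```

Without the override, Site B sees its own AS in the path and drops the route. With it, the route is accepted, which is exactly the loop check the commenter is uneasy about giving up.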

As a last comment on large MPLS networks, I would hesitate before creating a single MPLS network with more than a few hundred locations.  There are several reasons for this recommendation:

  1. Head-End Bandwidth – 200 remote locations utilizing simple T1s would require a 100 Mbps head-end circuit for roughly 3:1 oversubscription, or an OC-3 for 2:1 oversubscription.  Depending on traffic expectations, you could certainly choose to do greater oversubscription.  The key here is to know your network.
  2. Fate-Sharing – If you have only a single head-end circuit, all 200+ remote locations will go down at the same time.  That’s a lot of trouble tickets and support calls.  If you have two head-end circuits/locations, things will be better, but traffic will still fail over at the same time, which could cause utilization issues.  Even if there is a backup path via another MPLS provider for each location, the sheer number of changes in traffic pattern could be painful.
  3. Fault Isolation / Virus Containment – One of the greatest advantages of MPLS is the any-to-any connectivity it provides.  In most cases, this is a vast improvement over the hub-and-spoke networks that were common with Frame-Relay, ATM and point-to-point circuits.  The one drawback of this capability is that it becomes significantly more difficult to isolate a portion of the network.  An SQL Slammer-style worm in an any-to-any environment could prove to be very difficult to contain.
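The head-end bandwidth figures in the first item can be checked with a little arithmetic. This sketch assumes the nominal line rates of 1.544 Mbps for a T1 and 155.52 Mbps for an OC-3; the function name is my own.

```python
T1_MBPS = 1.544    # nominal T1 line rate
OC3_MBPS = 155.52  # nominal OC-3 line rate


def head_end_mbps(sites: int, access_mbps: float, oversubscription: float) -> float:
    """Head-end bandwidth needed for a given oversubscription ratio."""
    return sites * access_mbps / oversubscription


if __name__ == "__main__":
    print(round(head_end_mbps(200, T1_MBPS, 3), 1))  # 102.9
    print(round(head_end_mbps(200, T1_MBPS, 2), 1))  # 154.4
```

So a 100 Mbps circuit lands slightly above 3:1 for 200 T1 sites, and an OC-3 covers 2:1 with a little room to spare.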

My employer’s network contains approximately 100 locations, which are sub-divided into multiple regions.  While I have worked on networks that contained thousands of locations, they were also subdivided by business and/or geographic location.  Therefore, my advice above is not based on real-world experience with a single MPLS network of that size.  So I suppose it is worth what you’ve paid for it, or maybe slightly more.  If anyone has direct experience with very large MPLS networks, in the 500+ range, I would love to hear whether my concerns above are valid, or if there are other factors that outweigh them.

Jeremy