Monday, October 19, 2009

Cost Estimate for the CCDE

Yesterday I received an interesting comment on one of my previous CCDE blog posts.  Jack asked:

My employer is asking for cost estimates for the whole process [CCDE]. I have seen that your estimate is 3000 dollars. Does this include the tests themselves or just the materials?

I contend that the real cost of this certification is in the time required to master the material.  That said, the actual out of pocket costs are very important too!  Here is a breakdown.


Costs

Fixed Costs

Cost | Item
$350 (each attempt) | Written Exam (352-001)
$1400 (each attempt) | Practical Exam (352-011)

Variable Costs

Cost | Item
Minimal | Travel to Written Exam
Potentially expensive | Travel to Practical Exam
~$400 | Books
Unknown | Formal Training

You also must factor in the opportunity cost of your time, as many aspiring candidates need to delay or avoid other time commitments.  At the very least, you’ll likely miss some family and friend events to concentrate on studying.


Travel to Written Exam

Hopefully this is minimal for everyone.  The exam is offered at all Pearson VUE testing centers, so one should be a short car or bus ride away.


Travel to Practical Exam

Currently, the practical exam is offered in Hong Kong, London, and Chicago.  Historically it’s been offered every six months, but that pace is expected to pick up to meet increased demand for the exam.  As of this writing, the exam was most recently offered in August, and the next scheduled offering is in December.  I expect the practical to be offered quarterly beginning in 2010.

Travel to/from these selected testing locations is not trivial.  My three-day/two-night trip to Chicago (from Delaware) cost approximately $1100.  That probably comes in at the low end of most candidates’ travel costs, as my airfare was only $350.  International travel can be considerably more expensive.  Of course, if you’re fortunate enough to be local to one of these cities, you’ll save a bundle!  Also keep in mind that it is probably wise to budget for two trips, to prepare for the possibility of failure on the first attempt.  This is an area where I wouldn’t cut corners.  For a big exam (or business meeting, etc.), be sure to arrive in plenty of time (early afternoon on the prior day for me) and don’t be rushed to leave.  The exam is enough to think about, without adding travel concerns to the mix.

Books

I’ve given my book recommendations before, but I’ll repeat them here:

Optimal Routing Design – One of the authors (Russ White) is also author of most of the CCDE Practical content to date.  It helps to get into the mind of the test writer!  Besides that, this book is the best resource for Enterprise routing design, and specifically, IGP design.

Definitive MPLS Designs – This book covers MPLS Service Provider designs, as well as a bit of Internet Service Provider design.  More than any other resource, this book gives a great feel for the style of the CCDE practical exam.

BGP Design & Implementation – This book focuses almost exclusively on Internet Service Provider design.  It gives the “Why” behind route-reflectors and confederations.

End-to-End QoS Network Design – This book does for QoS what the above three books do for IP Routing & MPLS.  It uses great examples to describe class-based traffic-shaping and other QoS concepts.  For the purposes of this exam, you can disregard the configurations.

These four books cover the basics of network design.  In addition to these resources, candidates may also want to pick up Network Management Fundamentals.  This book gives an overview of Network Management design, and a few specifics on Network Management technologies.

Note that these books don’t necessarily cover the basic technology particularly well.  Each of these books has a primer section, but none of them go into the depth necessary to teach someone new to the respective technologies.  If you need more basic information on the technologies, I would suggest following the reading suggestions available from the appropriate CCIE track.  For example, if you need information on VPLS or AToM, read Layer 2 VPN Architectures.

Formal Training


I’ve noticed that there are now at least two formal training classes available for the CCDE.  One is being offered in Europe, and is taught by an actual CCDE certified instructor.  The second is offered in the US.  I suspect this is not being taught by a CCDE.  I’m not sure what to think of formal training for this exam.  It likely depends heavily on how the course is structured.  I do not believe that one or two weeks of classroom training can prepare a candidate for the broad range of technical topics covered in the blueprint.

On the other hand, a week of classroom training could be great for learning design methodology.  Assuming the student already has a fundamental grasp of most of the technology, an instructor-led class that teaches by example could be a positive addition to a candidate’s preparation plan.  It may not even take a full week to prepare a qualified candidate for the style of the exam.

I will be interested to hear feedback and test results from the first wave of candidates who partake in these classroom training offerings.  Until I hear firsthand whether the candidates felt prepared, I won’t be able to recommend this strategy.


Conclusion


To sum this up (literally!), I spent about $3000 in actual dollars to achieve the CCDE certification (note: I took the written during the beta period, so my numbers don’t quite add up).  I am fortunate to have a supportive employer who covered most of these costs.  A candidate can spend far more than this in their quest for the CCDE certification.  If I were coming at this today, I would budget for two attempts for the written exam ($700), two attempts for the practical ($2800), and travel for the practical ($2200 for me, but varies wildly depending on your location).  I would also budget $400 for books, and depending on student feedback, I might budget for a formal class.  Without the class, my total budget would be $6100.
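For anyone who wants to plug in their own numbers, here is the same budget arithmetic as a small Python sketch.  The exam fees and book cost are the figures from this post; the $1100 per-trip travel default is my Chicago trip and should be replaced with your own estimate, and the optional training fee is a hypothetical parameter since I haven’t priced a class.

```python
# Out-of-pocket CCDE budget, using the per-attempt fees quoted in this post.
WRITTEN_EXAM = 350      # 352-001, per attempt
PRACTICAL_EXAM = 1400   # 352-011, per attempt
BOOKS = 400


def ccde_budget(attempts: int = 2, travel_per_trip: int = 1100, training: int = 0) -> int:
    """Total budget, assuming one trip to the practical per attempt.

    travel_per_trip: my Delaware-to-Chicago trip; adjust for your location.
    training: optional formal class fee (hypothetical; not priced in this post).
    """
    written = attempts * WRITTEN_EXAM
    practical = attempts * PRACTICAL_EXAM
    travel = attempts * travel_per_trip
    return written + practical + travel + BOOKS + training


if __name__ == "__main__":
    print(ccde_budget())  # 700 + 2800 + 2200 + 400
```

Budgeting for a single attempt (`ccde_budget(attempts=1)`) drops the total to $3250, which shows how much of the risk sits in the retake.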

Monday, October 12, 2009

Thoughts on Fibre Channel over Ethernet (FCoE)

With all the recent Twitter chatter on the topic, I feel compelled to throw in my two cents on Fibre Channel over Ethernet.  Here goes!

FCoE Selling Point
Traditionally, data center designers built separate Ethernet-based Data Networks (LANs) and FC-based Storage Networks (SANs).  This was necessary because storage requires lossless operation, and Ethernet could not provide it.  Maintaining two networks cost IT departments considerable extra money in fiber cabling and FC adapters in servers.
Data Center Bridging (DCB) - which IBM calls Converged Enhanced Ethernet (CEE) and Cisco has referred to as Data Center Ethernet (DCE) - adds lossless operation to the Ethernet standard.  This allows us to multiplex storage traffic onto our data networks.  Voila, we can save a relative fortune in cabling costs and dedicated FC equipment, and to a lesser degree, we can save on server network adapters by using Converged Network Adapters (CNA).

Where Are Things Going?
This is great… Now we’re saving a bunch of money, and we still have the same basic network.  But is this the best we can do?  Of course not!  Why do we want the same basic network?  Why don’t we put our storage traffic into IP packets and zip it along like everything else?  Then we won’t need a separate network at all, virtual or otherwise.
If this story sounds familiar, that’s because it is.  At one time, SNA traffic had a dedicated network.  Then we decided to encapsulate it in a layer 2 protocol via RSRB, SR/TLB, etc.  Eventually, we tossed the traffic into IP packets via DLSw+.  Ultimately, we put IP-capable adapters in our mainframes and dispensed with the legacy technologies.
Or maybe you're thinking about voice… Who remembers the MC3810?  That was my first introduction to Voice over X technology.  We plugged PBX T1s into a router and encapsulated voice into Frame-Relay or ATM.  There’s even a parallel here with the FCoE multi-hop controversy.  Later we encapsulated the T1 traffic into IP packets on a router.  Eventually, we put IP-capable adapters into our PBXs and dispensed with the legacy technologies.
As far as storage goes, we’re still stuck on step 2, encapsulating storage into a layer 2 protocol.  Meanwhile, there are plenty of Storage over IP options available that don’t quite meet our performance needs yet.  Does anyone actually want to bet that NAS or iSCSI is never going to meet them?  Sure, there will always be specific high-performance computing (HPC) needs that push the envelope, but the vast majority of corporate needs will eventually be met by IP-based storage.

So What Should We Do?
Does this mean we should ignore FCoE?  Definitely not.  I am in no hurry to swap out any existing FC-based SANs for FCoE.  I don’t see the financial justification for it, since I’ve already sunk my money into the cabling.  If I were to build a new data center, and I could demonstrate that IP-based storage would not meet my performance needs, I would absolutely go for an FCoE solution.  But I would also spend a lot of time determining if I truly needed the extra performance offered by a SAN.  Any new DC build is going to be based on 10 Gigabit Ethernet, so that should factor into the decision.
Ultimately, Fibre Channel will go away, just like every other dedicated network technology before it.  During the transition phase, which we’re currently in, it often makes financial sense to go with the interim technology.  I can’t come up with a scenario where it would make sense to replace an existing, working Fibre Channel network, but new builds should seriously consider going with FCoE.

Monday, September 28, 2009

Troubleshooting an Application Problem

I grew up knowing I would be a computer programmer… then I reached my University years and realized I had no passion for it.  I still pursued and attained my Computer Science degree, but I gravitated towards non-programming courses like Computer Networks, System Architecture and Telecommunication Systems.  Prior to losing my interest, I was quite good at it, and I believe my familiarity with the subject helps me to understand why Cisco IOS and other network OSs work the way they do.  My programming background also leads me to offer opinions on application development.

My first consulting project for RPM Consulting was for Philadelphia Newspapers, the owner of the Philadelphia Inquirer and Daily News papers.  Most of my efforts were focused on a new LAN design for their downtown office, which faced the familiar constraints of old, big city office buildings, like limited conduit space and union labor for all physical cable moves.  While this was interesting, I was drawn to the secondary project:  troubleshooting a new Unisys-provided newswire service application.

This custom-built application was intended to provide a centralized, searchable database of all news wire stories.  In the Unisys lab, and then again in the Inquirer lab, it worked perfectly.  Users could access story titles and abstracts, and if they were intrigued, a single click would retrieve the content.  When the system was deployed to the first batch of users, it failed miserably.  Downloading the story list was quick, but it took several minutes to download each individual story.  The application developer blamed the network (big surprise).  A colleague and I were able to borrow RPM’s Network General lunchbox sniffer for a day to get the data we needed to troubleshoot.  Our packet traces revealed that the application was built using UDP, not TCP as I had expected.

As our studies have taught us, TCP has built-in reliability, in the form of sequence numbers, acknowledgements and retransmissions.  UDP provides none of these features.  Application programmers who choose to use UDP must provide their own reliability mechanisms.  In this case, the developer coded the client application to send an acknowledgement once the full story was received.  If the story was not received in a predetermined time period (approximately two minutes), the client application repeated its request.  If a single UDP packet was dropped, this timeout value was the only mechanism for detecting the event.  This was fine for a lossless lab network, but in a production network, packets are sometimes dropped, especially in this particular network, which was in the midst of a Token-Ring to Ethernet conversion.

Due to the ongoing conversion project, there was very little I could do to mitigate the packet loss in the short term.  Our recommendation was for the application developer to rewrite the application to use TCP.  Fortunately, my colleague was a former Unisys employee, so that task fell to him.  The interim solution was to dial down the timeout value from two minutes to a few seconds.  I’m not certain either solution was accepted, as my participation in that consulting assignment ended before a solution to the application problem was implemented.
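To see why a whole-story timer hurts so much, here is a rough Python model.  It is not the Unisys application’s code, which I never saw; it assumes each story is a fixed number of UDP datagrams, that each datagram is lost independently with some probability, and that the only recovery is the client re-requesting the entire story when its timer expires.  The packet count, loss rate and transfer time are hypothetical numbers chosen for illustration.

```python
import random


def expected_retrieval_time(packets: int, loss: float, timeout: float, transfer: float) -> float:
    """Mean seconds to fetch one story when only a whole-story timer detects loss.

    Hypothetical model: each datagram is lost independently with probability
    `loss`; a failed attempt costs `timeout` seconds before the client asks
    again; a successful attempt costs `transfer` seconds.
    """
    success = (1.0 - loss) ** packets          # chance that every datagram arrives
    mean_failures = (1.0 - success) / success  # mean of a geometric distribution
    return transfer + timeout * mean_failures


def simulate_retrieval_time(packets: int, loss: float, timeout: float, transfer: float,
                            trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo check of the formula, using a seeded RNG for repeatability."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        elapsed = 0.0
        # One dropped datagram spoils the whole attempt.
        while any(rng.random() < loss for _ in range(packets)):
            elapsed += timeout  # client waits out the full timer, then asks again
        elapsed += transfer     # finally, a clean attempt
        total += elapsed
    return total / trials


if __name__ == "__main__":
    for timeout in (120.0, 3.0):  # original ~2-minute timer vs. a "few seconds" timer
        mean = expected_retrieval_time(packets=50, loss=0.02, timeout=timeout, transfer=1.0)
        print(f"timeout {timeout:>5.0f}s -> mean retrieval {mean:6.1f}s")
```

With those made-up inputs (50 datagrams per story, 2% loss), the two-minute timer averages about three and a half minutes per story, while a 3-second timer brings it down to roughly six seconds.  Moving to TCP goes further, because loss is detected and repaired per segment rather than per story.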

Monday, September 21, 2009

Defining a Quality of Service Policy

Let’s get this out of the way at the beginning.  Quality of Service is my favorite part of network design.  The fact that we now have enough power within our network devices to inspect packets and make on-the-fly DSCP value changes is amazing.  It doesn’t seem that long since we had to optimize our ACLs to keep the router CPU under 99%.  But remember:  With power comes responsibility.

Quality of service often gets a bad reputation for a pair of reasons:

  1. It requires micromanagement of applications
  2. It allows (or forces) network administrators to play favorites

Some would say it also gets a bad rep because it is often poorly implemented, but that wouldn’t happen to us, right? :)

Both of these are certainly possibilities, but a well-designed QoS policy doesn’t require either to be true.


My Current QoS Policy

I designed and led the implementation of a ten-class QoS policy for my organization.  Ten classes?!  I’m sure this seems like overkill, but there is a good reason for each class.  Here are the classes, plus the DSCP values associated with them:

Class Name | DSCP Value(s) | Traffic Type / Example
Internetwork-Control | CS6 | Routing Protocols
Telephony | EF, CS3 | VoIP, VoIP Signaling
Video-Conference | AF41, AF42, AF43 | Realtime Video Conferencing
Video-Live-Streaming | AF31, AF32, AF33 | Live Webcast
Video-Backbone | CS4 | Realtime Video Transfers
Transactional-Data | AF21, AF22, AF23 | Interactive Data
OAM | CS2 | Network Mgmt (SNMP, TACACS)
Bulk-Data | AF11, AF12, AF13 | Data Transfer Apps (FTP, CIFS)
Standard | Default (DSCP=0) | Uncategorized Traffic
Scavenger | CS1 | Undesired / Out of Spec Traffic
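For readers who think in numbers rather than names, the table can be expressed as a lookup.  This is a Python sketch, not our production configuration: it derives the numeric codepoints from the standard DSCP formulas (AFxy = 8x + 2y, CSx = 8x, EF = 46) and assumes, per the Default section below, that any codepoint not listed falls into Standard.

```python
# Standard DSCP codepoint formulas (RFC 2474 class selectors, RFC 2597 AF, RFC 3246 EF).
def af(x: int, y: int) -> int:
    """Assured Forwarding AFxy: class x (1-4), drop precedence y (1-3)."""
    return 8 * x + 2 * y


def cs(x: int) -> int:
    """Class Selector CSx."""
    return 8 * x


EF = 46       # Expedited Forwarding
DEFAULT = 0   # best effort

# The ten classes from the table, keyed by class name.
QOS_CLASSES = {
    "Internetwork-Control": [cs(6)],
    "Telephony": [EF, cs(3)],
    "Video-Conference": [af(4, 1), af(4, 2), af(4, 3)],
    "Video-Live-Streaming": [af(3, 1), af(3, 2), af(3, 3)],
    "Video-Backbone": [cs(4)],
    "Transactional-Data": [af(2, 1), af(2, 2), af(2, 3)],
    "OAM": [cs(2)],
    "Bulk-Data": [af(1, 1), af(1, 2), af(1, 3)],
    "Standard": [DEFAULT],
    "Scavenger": [cs(1)],
}

# Reverse index: DSCP value -> class name.
DSCP_TO_CLASS = {dscp: name for name, values in QOS_CLASSES.items() for dscp in values}


def classify(dscp: int) -> str:
    """Return the class for a DSCP value; unlisted values land in Standard (assumption)."""
    return DSCP_TO_CLASS.get(dscp, "Standard")


if __name__ == "__main__":
    for value in (46, 34, 14, 8, 40):
        print(value, classify(value))
```

Building the reverse index from the class table keeps the two views from drifting apart; a codepoint such as CS5 (40), which this policy doesn’t use, simply falls through to Standard.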

A bit of clarification around the three video classes is necessary.  We have two important video-based applications.  We have deployed a video-conferencing solution to a significant number of our locations.  This traffic is marked as AF41.  We also have scheduled live webcasts that consist of two traffic types.  The obvious one is the actual live video stream, which is delivered via multicast over our WAN to all remote locations.  This traffic is marked AF31.  The second type is the source material that creates the webcast.  These events often require remote camera feeds from other internal locations.  Originally, we would ‘roll a truck’ and use a satellite uplink to get the feed back to our HQ office.  Now, we attach the camera(s) to an encoder and convert the audio/video into a UDP packet stream.  This is then sent to our HQ site for A/V mixing to create the live stream.  It’s a great solution, and it has paid for itself many times over already (satellite trucks are expensive!).  The pre-work is significant on the network side, as we need to temporarily modify our QoS policy to accommodate the new stream, but it is worth the effort.

Let’s dedicate a paragraph to each of the classes to give an idea of why they exist.  These are loosely ordered based on importance, but as you will see, everything gets a bandwidth guarantee:

Internetwork Control

This class includes all of our routing protocol traffic, as well as the Lightweight Access Point Protocol (LWAPP) signaling between access points and the Wireless LAN Controller (WLC).  These are generally light traffic loads, so the queuing allocation can be small.

Telephony

We have chosen to combine voice and voice signaling traffic into a single traffic class.  I’ve even instructed our voice team to mark signaling traffic as EF.  We then overprovision the priority queue by 15% or so to allow for the extra traffic.
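A quick Python sketch of what that overprovisioning means in practice.  The per-call figure is an assumption: 80 kbps is the usual Layer 3 estimate for a G.711 call at 50 packets per second (160-byte payload plus 40 bytes of IP/UDP/RTP headers), and Layer 2 overhead pushes it higher on a real link.  The call count is hypothetical.

```python
def priority_queue_kbps(calls: int, per_call_kbps: float = 80.0, headroom: float = 0.15) -> float:
    """Priority-queue size for `calls` concurrent calls plus room for EF-marked signaling.

    per_call_kbps: assumed G.711 bearer rate at Layer 3 (160-byte payload + 40 bytes
    IP/UDP/RTP, 50 packets/s); adjust for your codec and link-layer overhead.
    headroom: the ~15% overprovisioning for signaling.
    """
    bearer = calls * per_call_kbps
    return bearer * (1.0 + headroom)


if __name__ == "__main__":
    # Hypothetical site sized for 10 simultaneous calls.
    print(f"{priority_queue_kbps(10):.0f} kbps")
```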

Video-Conference

Our video-conference equipment marks all stream traffic as AF41.  This traffic is RTP, and the application is very sensitive to dropped packets.  Because this application is real-time, it is also sensitive to delay and jitter.  While Cisco advises network admins to use a priority queue for their TelePresence system, I’ve been able to get by with a standard bandwidth allocation, which preserves our priority-queuing capabilities for the Telephony traffic.

Video-Live-Streaming

As mentioned above, the live video stream is delivered via multicast, using RTP packets.  Much like our Video-Conference traffic, this is highly sensitive to dropped packets.  Jitter and delay are less of a concern, due to the one-way nature of this stream.  It’s like watching TV… As long as the signal is solid, you don’t really care if it is delayed by a second or two.

Video-Backbone

This is very much like Video-Live-Streaming, except it flows in the opposite direction.  It is also a much higher-bandwidth stream.  We gave some thought to combining this with the Video-Live-Streaming class, but in the end we decided it was much clearer to give it a unique identifier.

Transactional-Data

This is the first of three end user application classes.  We mark all interactive applications with AF21.  I define interactive as a command/response relationship, like Telnet or Remote Desktop.  The user types or clicks something, then waits for a response before entering the next command.  Most (but not all) of these applications are low bandwidth.  They are generally delay sensitive, as the user is constantly monitoring the application.  This is where we place some of our internally-developed applications.

OAM

This queue is used for network management traffic.  Applications such as NetFlow, SNMP and TACACS+ are placed in this queue.  Historically it has been difficult to get this traffic properly marked, as it usually sources from a router/switch, but this has gotten better with Cisco’s movement towards dedicated management ports on equipment.

Bulk-Data

This is the second end user application class.  All data transfer applications go into this queue.  These applications will often take as much bandwidth as is available.  They are not terribly sensitive to delay or jitter.  We further characterize these apps as time sensitive (AF11) and time insensitive (AF13).  This allows us to use a WRED policy to selectively drop packets from less critical transfers.  The time-insensitive category is great for NetApp SnapMirror, Windows background updates, patch deployments, etc.  This gives priority to user-based apps like Windows File Sharing.
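To make the AF11/AF13 distinction concrete, here is a Python sketch of a WRED-style drop curve.  The thresholds are hypothetical (our actual values aren’t given here), and real WRED works on an exponentially weighted average of queue depth rather than the instantaneous depth; the sketch simply takes that average as an input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WredProfile:
    """Drop curve for one DSCP value, using WRED-style min/max thresholds."""
    min_threshold: float        # average queue depth (packets) where random drops begin
    max_threshold: float        # depth at which the random-drop probability peaks
    mark_prob_denominator: int  # drop probability at max_threshold is 1/denominator

    def drop_probability(self, avg_depth: float) -> float:
        if avg_depth < self.min_threshold:
            return 0.0
        if avg_depth >= self.max_threshold:
            return 1.0  # beyond the max threshold every arriving packet is dropped
        fraction = (avg_depth - self.min_threshold) / (self.max_threshold - self.min_threshold)
        return fraction / self.mark_prob_denominator


# Hypothetical thresholds: AF13 (time-insensitive) starts dropping earlier than AF11.
PROFILES = {
    "AF11": WredProfile(min_threshold=32, max_threshold=40, mark_prob_denominator=10),
    "AF13": WredProfile(min_threshold=24, max_threshold=40, mark_prob_denominator=10),
}

if __name__ == "__main__":
    for depth in (20, 28, 36):
        row = ", ".join(f"{name}={p.drop_probability(depth):.3f}" for name, p in PROFILES.items())
        print(f"avg depth {depth}: {row}")
```

As the queue fills, SnapMirror-style AF13 transfers start losing packets (and backing off) while AF11 traffic is still untouched, which is exactly the preference described above.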

Default

Anything that isn’t specifically marked into another queue falls into the Default queue.  Most HTTP-based apps are here.  The majority of my organization’s traffic falls into this queue.  It receives a healthy allocation of interface bandwidth as a result.

Scavenger

The Scavenger class is reserved for traffic that either exceeds normal network patterns or is known to be unnecessary.  We have defined a simple end user port policy of allowing any user to send up to 5 Mb/s of traffic into the network.  Any traffic which exceeds this value is marked as CS1.  This promotes a level of fairness between users.  This queue is given a 1% bandwidth allocation at all network chokepoints, so it is the first to go when there is congestion.  As for the ‘known unnecessary’ category, I’d prefer to get rid of the application, rather than penalize the traffic.
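The 5 Mb/s per-user rule behaves like a single-rate policer whose exceed action re-marks rather than drops.  Below is a token-bucket sketch in Python; the 15,000-byte burst size is an assumption, the `MarkdownPolicer` class is hypothetical, and real switch policers run in hardware with their own burst semantics.

```python
CS1 = 8  # Scavenger codepoint


class MarkdownPolicer:
    """Single-rate token bucket: conforming packets keep their DSCP, excess goes to CS1."""

    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate_bytes = rate_bps / 8.0  # refill rate in bytes per second
        self.burst = burst_bytes          # bucket depth (assumed value)
        self.tokens = burst_bytes         # start with a full bucket
        self.last = 0.0                   # timestamp of the previous packet, in seconds

    def mark(self, now: float, size: int, dscp: int) -> int:
        # Refill for the time since the last packet, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate_bytes)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return dscp  # conform: forward with the original marking
        return CS1       # exceed: still forwarded, but re-marked as Scavenger


if __name__ == "__main__":
    # A user pushing 1500-byte packets at roughly 10 Mb/s for one second.
    policer = MarkdownPolicer(rate_bps=5_000_000, burst_bytes=15_000)
    marks = [policer.mark(i * 0.0012, 1500, 0) for i in range(833)]
    print(f"{marks.count(CS1)} of {len(marks)} packets re-marked to CS1")
```

Because excess traffic is re-marked instead of dropped, a heavy user still gets the extra throughput when the network is idle; it is only the 1% Scavenger allocation at a congested chokepoint that pushes it aside.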


What’s Missing?

Notice there isn’t a ‘Priority Applications’ queue in this model.  We purposely defined our queues based on traffic characteristics, not application priorities.  This has served us well, and has prevented the cajoling many experience when they ask their users to define ‘important’ applications.

Will this model work for everyone?  Probably not.  Every organization has a unique set of applications.  If you have already agreed to prioritize something, it can be difficult to back out of that promise.  If you haven’t done so, I suggest you avoid heading down that path.  Think hard about the characteristics of your traffic, and attempt to build your classes around those characteristics rather than around application priorities.

In a future blog post I will show how this policy translates to a packet marking configuration, and later, an interface queuing policy.  I am a proponent of MPLS networks, which introduce another wrinkle to the equation.  None of my incumbent providers offers a ten-class QoS scheme.  I will show what we’ve done to adapt our policy to those constraints.

Monday, September 14, 2009

Time for IPv6?

It is Fiscal Year 2010 budget time at my company.  We have just completed a rather expensive three-year cycle of hardware upgrades, so it is refreshing to look at the modest capital costs I am requesting for 2010.  Of course, depreciation is eating up a significant portion of our budget for the next couple of years, so I would be hesitant to ask for a large allocation this year anyway.

Every year since I started with my current employer, my potential project list begins with “Implement IPv6”.  And every year, I quickly move it to the following year’s potential project list.  While I pondered it a split-second longer this time around, I still moved it to the 2011 list.  I see no compelling reason to foist a new protocol on my employer over the next 12 – 16 months.

Please understand, I’m not dense when it comes to the need for IPv6 in the world.  I am able to comprehend IPv4 address exhaustion (I read Geoff Huston’s excellent blog at potaroo.net), and I see the benefits to IPv6.  In 2000, I successfully argued that it was in my (then) employer’s best interest to pay for my purchase of an IPv6 book, so I could digest the information and teach my consulting co-workers how to plan for the protocol.  “It can be a differentiator!” I explained.  Is there a more perfect project for the PDIM cycle? (Plan, Design, Implement, Manage, or feel free to substitute your own consulting methodology)  Much of the information in that book became obsolete, and I’m fairly certain I recycled it years ago.

So why not this year?  In brief, one of the following things must happen for me to push forward with an IPv6 project:

- A compelling application is released that requires or would benefit from IPv6.  Definitely not a reality yet.  I’m holding out hope that VMware will realize that VMotion and Mobile IPv6 are a nice match.  So far, I haven’t heard anything.  (I really want to spend some time developing my thoughts on this into a coherent blog post)

- Our customers start clamoring for it.  We expect this to happen in our ASPAC businesses eventually, but nothing yet.  Maybe I’m naive, but I am confident that IPv6 users will have gateways into the IPv4 world.  After all, who would be willing to sign up with an ISP that can’t reach the ‘real’ Internet?  Even the then-mighty AOL had to abandon that business model in the 90s.

- Our business partners require it.  We’re in a regulated industry, so there is a fair amount of government and quasi-government involvement.  I assume they will be the first business partners to move to IPv6, but I haven’t even heard of any inquiries about our IPv6 status.

To generalize, we’re not a bleeding edge infrastructure company.  We would not derive any benefit from being a first mover.  I like new technology as much as the next guy, but I have a responsibility to make logical, defensible recommendations to my business leaders.  That’s why “Implement IPv6” is moving to the 2011 project list.  Of course, I’m hedging my bet by including a “Verify IPv6 Compatibility” project on the 2010 list, just like last year, and the year before that, etc.  It’s a low-cost project, and if the need for IPv6 comes on suddenly, we’ll be prepared to meet it with code upgrades, not hardware purchases.

If you are planning your own 2010 projects, I encourage you to think through your own situation.  It may be different than mine, or you may have more work to do to prepare for IPv6.

Jeremy

Tuesday, September 8, 2009

Technical Writing: Books or Blogs?

I’ve been thinking lately about whether it still makes sense to write technical books.  Years ago, I had a contract to write a book for Cisco Press.  Due to a job change, I was only able to complete a few chapters, and after handing it off to another author, it seemed to die a quiet death.  I used to be able to find a reference to it on one of the Amazon websites, but a recent search turned up nothing.

About six months ago, the writing bug bit me again, and I created a proposal for “Enterprise Network Designs” and sent it off to Cisco Press for feedback.  In the (long) time between sending that email and receiving a meaningful response, I chose instead to begin writing this blog.  Blogging fits my schedule better, and it provides a more interactive communication platform.  IIRC, when I was working on the earlier Cisco Press book, I felt a good deal of deadline pressure and a bit of nervousness about making mistakes.  Once information is committed to paper, it’s difficult to fix it.

As a result of these events, I’ve been trying to determine if we’ve reached the end of the technical book publishing industry.  I don’t recall the Cisco Press contract being especially lucrative; something like 10 – 15 percent of gross revenues go to the author, minus some expenses.  As several technical authors have mentioned, you don’t get into the field for the money.  Maybe they’re just trying to keep the competition out, but for some reason I doubt that’s the case.

Why wouldn’t technical authors go the blog route, and cut out the publishing middleman?  This would eliminate much of the overhead of publishing, as well as free the author from official deadlines.  Revenue can be generated by monetizing the blog, as well as follow-on contract work.  If the content is well-written and relevant, the professional prestige gained from the effort should be comparable to being a published author.  Technical book readers are by definition technical, so reading a blog should be well within their comfort zone.


What would the author be missing?  A few items are:

Deadlines - Is that good or bad?  Depends on the author, I suppose

Copy Editor – Go with a freelance editor?

Book Signings – Not sure a substitute is available for the blogger

Seeing Name in Print – Ditto; no obvious substitute available

Copyright Protection – There must be a good solution to this, right?

Publisher Credibility – Cisco-related books from Cisco Press probably significantly outsell books from Pearson Education, even though they’re the same publishing house.  The lack of a well-known imprint could make it difficult to build an audience.


What are the advantages?

Full control over content

Easier publishing process (arguable, I suppose)

Infinite ability to revise content after publishing

Better interaction with readers

Better for the environment, if that is meaningful


I would think it would be relatively easy to monetize the content through eBooks, like the Kindle.  Some technical blogs are already syndicated on that platform, such as Jeremy Stretch’s Packet Life blog.  Thirty cents per month per user probably isn’t terribly lucrative, but for hassle-free revenue, why not?

Monetization Strategy

Google AdSense

eBook syndication

Professional consulting

Partnership with training vendor (depending on blog content)


I’d love to get some feedback, especially from authors (books and blogs).  Have you considered something like this, or are you already doing it?  What are the challenges you’ve faced?

Monday, August 31, 2009

Recap of Cisco Certified Architect Update

On Thursday I attended a Cisco meeting on the CCDE and Architect certification programs.  There was not a lot of new information offered, so some of this has already been discussed on the blog.


Timeline

October 2009:  Blueprint available on Cisco Learning Network

November 2009: Application process open on CLN

January 2010: First board review availability

Candidate Application Process

To apply, you need to be an active CCDE.  The next step is to submit a resume, which should demonstrate 10 years of networking experience and some architect/design responsibilities (not necessarily 10 years of these responsibilities).  The candidate also needs to submit a Project Summary for an architect or design project that he/she led.  During Q & A, I asked if there would be a specific format for this.  That hasn’t been determined yet, but I don’t see how the application review team could handle this without a specific format, since we all use unique project documentation.  The resume and project summary will be reviewed, and questions will be developed to be asked during a short (15 minutes?) phone interview.  This will allow the application review team to be certain the candidate is qualified to take the exam.


Test Pre-Work

Assuming the candidate has passed these checkpoints, he/she will submit payment and receive the project documentation / test materials.  The candidate will have a few weeks to develop an RFP-style response, which will consist of a Functional Specification, High-Level Architecture Diagram, an outline of the Board Meeting presentation and a few other documents that I failed to note.   The presentation used during the meeting is not yet available for download, so I can’t look up the missing documents at this time.  This documentation will be submitted in advance of the board review, to allow the board to develop questions based on its contents.


Board Review

The board review is scheduled for six hours.  No mention was made of where it will be, or if Telepresence will be used for the reviews.  There seemed to be a hint or two of using technology during this process, and I believe the initial board is physically located in separate offices, so I’m expecting a Telepresence presence, so to speak.  The timeline for the review is:

1 hour – Present design solution to Executive Team

1 hour – Respond to questions from Design Team

15 minutes – Presented with “What If” change to original architecture challenge

2 hours, 45 minutes – Create solution based on new “What If” requirements

1 hour – Present and defend new solution

Out of the six hours, three hours are spent presenting or defending solutions.
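The agenda arithmetic is easy to double-check.  A trivial Python sketch, using the durations as presented; the activity labels are my paraphrase, and the flag marks which blocks count as presenting or defending.

```python
from datetime import timedelta

# Board review agenda as presented: (activity, duration, presenting/defending?)
AGENDA = [
    ("Present design solution to Executive Team", timedelta(hours=1), True),
    ("Respond to questions from Design Team", timedelta(hours=1), True),
    ("Receive 'What If' change", timedelta(minutes=15), False),
    ("Create solution for new requirements", timedelta(hours=2, minutes=45), False),
    ("Present and defend new solution", timedelta(hours=1), True),
]

total = sum((duration for _, duration, _ in AGENDA), timedelta())
on_stage = sum((duration for _, duration, presenting in AGENDA if presenting), timedelta())

if __name__ == "__main__":
    print(f"total {total}, presenting/defending {on_stage}")
```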


Other Notes

- I don’t believe anyone mentioned when the candidate will receive a pass/fail result, or what sort of feedback is expected.  As a potential candidate, I’d like feedback with either result.  Even successful candidates have room for improvement.  That’s one of the negatives to receiving only a “Pass” result to the CCIE/CCDE exams.

- Someone from Cisco on the call suggested that there would be around 100 successful Cisco Certified Architects in year one.  I’m sure most of the call attendees (and a few of the presenters!) did a double-take at that news.  One of the Q & A questions bluntly asked “If there are 7 current CCDEs, how can you get 100 Architects?”  The response was that the CCDE program is expected to pick up steam over the next year, and many CCDEs are expected to immediately pursue the Architect certification.  I have my doubts; my guess is 15 in 2010.  I don’t think there have been 100 unique CCDE candidates to date.

- There will be a recertification process, and it will likely consist of contributing to the program via content development or serving on future review boards.  It’s a good strategy, as developing this content will be highly time intensive.  It will not be possible to ramp up this certification program to hundreds of candidates without significantly expanding the certification team.  This should be a cost effective method of accomplishing that.

- I was a bit surprised at the lack of questions from the meeting participants.  Perhaps everything is too new at this point.  I was going to also ask about the expected pass rate, but I doubt they have a target, so it didn’t seem worth the time to me.  The only notable question was about the cost, which I don’t believe was addressed in the presentation.  The $15,000 price tag was confirmed, with some mention of it being comparable to Microsoft’s top-level certification.  I have no knowledge of their programs, so I’ll take the Cisco team at their word on it.

- The presentation is supposed to go up on CLN soon.  It went into nice detail on the Candidate Objectives, so it will be nice to see it on my own time.  That part of the presentation went a bit too fast for me to absorb all the info.  If I had known in advance, I would have taken some screenshots!

- The presentation team went into a bit of detail on the differences between Network Designers and Architects.  Most people I know (including me!) use the terms interchangeably.  The basic point being made was that Network Architects take business requirements and turn them into functional network specifications.  Network Designers take the specifications and create a network.  I don’t know of anyone who functions as an architect without also creating the network design.  I certainly know of engineers who only perform the second step.  So who creates the functional network specifications for them?  In most cases, it isn’t really created.  I have been guilty of this.  I’m told of a business problem that needs to be solved, and almost immediately I fire up dynamips and configure a solution.  I would like to think that intuitively I’ve thought through the architecture and concluded that my solution is the best one, but without explicitly going through the process, I can’t truly know that I am correct.  Over the last year, I have made a point of incorporating architecture/design reviews into my project planning, and asking for more input from project team members.

Wednesday, August 19, 2009

CCDE Written and Practical Study Plans

In a recent blog post I made mention of creating a structured study plan and following it. To demonstrate this, I dug up my CCDE written study plan.

Background

I learned about the Cisco Certified Design Expert program on the shuttle ride from the airport to Cisco Live 2007. By chance, I was on the same shuttle as Lee, a former co-worker whom I hadn’t seen in seven years. He mentioned that there was an invitation-only announcement of a new expert level certification taking place at Networkers. As I had not received an invitation, I didn’t know anything about it. While wandering around the show area on Monday, I ran into David Bump, who was involved in the CCDE launch. He kindly extended an invitation to the kick-off meeting so I wouldn’t feel left out.

The attendee list in that meeting was impressive, to say the least. I saw a number of well-regarded network architects, and more than a couple friends and former colleagues. It was clear that Cisco invited the right group! The presentation and handout made me realize that I had a number of holes in my knowledge that would need to be addressed if I wanted a chance at success on the CCDE written beta exam.

Creating My CCDE Written Study Plan

My first step in creating a study plan was to make a duplicate of the handout, as I didn’t know if I would be able to get another copy. It’s now on-line at http://www.cisco.com/web/learning/le3/ccde/ccde_exam_information.html, so this step is no longer necessary. On my copy, I highlighted the areas that I felt were weaknesses. Unfortunately, it seemed like most of the paper had been highlighted. Who is Paul Baran* anyway? ;)

After analyzing the areas, I grouped the technologies into seven major topics, and I assigned a level of confidence in my abilities. I also planned to address some of the other blueprint topics, like Network Management and Security, but these were the topics I felt I should focus on:

OSPF – Medium
EIGRP – Medium-High
IS-IS – Low
BGP – Medium-High
Multicast – Medium-High
MPLS – Low
Quality of Service – High

Next, I figured out what resources were available to me. I purchased several books from the CCDE Written Exam Reading List (Optimal Routing Design, Network Management Fundamentals, BGP Design and Implementation) and pulled a few classics off the shelf (Routing TCP/IP Volume 1, Developing IP Multicast Networks). I also had access to Cisco Live Virtual, and of course the RFCs were readily available. I mapped these resources to my topics, and determined how much time I had available to devote to each one. This is the result:

CCDE Study Plan:

66 hours total

OSPF (10.5 hours)

Routing TCP/IP OSPF Chapter - 2 hours
OSPF Tech Notes - 4 hours
OSPF Handbook - 30 minutes
RFCs - 2 hours
Review - 2 hours

EIGRP (5 hours)

Routing TCP/IP EIGRP Chapter - 1 hour
EIGRP Tech Notes - 2 hours
EIGRP Handbook - 30 minutes
Review - 1.5 hours

IS-IS (10.5 hours)

Routing TCP/IP IS-IS Chapter - 2 hours
IS-IS Tech Notes - 4 hours
IS-IS Handbook - 30 minutes
RFC - 2 hours
Review - 2 hours

BGP (10 hours)

BGP Tech Notes - 4 hours
BGP Handbook - 1 hour
BGP Book - 3 hours
Review - 2 hours

Multicast (5 hours)

PIM RFC - 2 hours
Multicast Book - 1 hour
MBGP on CCO - 2 hours

MPLS (20 hours)

Intro to MPLS presentation - 2 hours
Other presentation? - 2 hours
Books - 10 hours
BGP/MPLS - 2 hours
Cisco website - 4 hours

QoS (5 hours)

RFCs - 3 hours
Translation to Transport Layer - 2 hours
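As a quick sanity check on the arithmetic, here is a small Python sketch that tallies the plan above. The hours are transcribed directly from the plan; the dictionary layout and the function names (`topic_totals`, `plan_total`) are illustrative choices only, not part of any tool I actually used.

```python
# Hours per resource, transcribed from the CCDE written study plan above.
STUDY_PLAN = {
    "OSPF": {"Routing TCP/IP OSPF Chapter": 2, "OSPF Tech Notes": 4,
             "OSPF Handbook": 0.5, "RFCs": 2, "Review": 2},
    "EIGRP": {"Routing TCP/IP EIGRP Chapter": 1, "EIGRP Tech Notes": 2,
              "EIGRP Handbook": 0.5, "Review": 1.5},
    "IS-IS": {"Routing TCP/IP IS-IS Chapter": 2, "IS-IS Tech Notes": 4,
              "IS-IS Handbook": 0.5, "RFC": 2, "Review": 2},
    "BGP": {"BGP Tech Notes": 4, "BGP Handbook": 1, "BGP Book": 3, "Review": 2},
    "Multicast": {"PIM RFC": 2, "Multicast Book": 1, "MBGP on CCO": 2},
    "MPLS": {"Intro to MPLS presentation": 2, "Other presentation?": 2,
             "Books": 10, "BGP/MPLS": 2, "Cisco website": 4},
    "QoS": {"RFCs": 3, "Translation to Transport Layer": 2},
}


def topic_totals(plan):
    """Return a dict mapping each topic to the sum of its resource hours."""
    return {topic: sum(items.values()) for topic, items in plan.items()}


def plan_total(plan):
    """Return the total hours across every topic in the plan."""
    return sum(topic_totals(plan).values())


if __name__ == "__main__":
    for topic, hours in topic_totals(STUDY_PLAN).items():
        print(f"{topic:<10} {hours:>5} hours")
    print(f"{'Total':<10} {plan_total(STUDY_PLAN):>5} hours")
```

Each per-topic subtotal matches the number in its heading, and the grand total comes out to the 66 hours listed at the top of the plan.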

I can’t say that I stuck completely to this schedule, but it served as a guide for my efforts. It allowed me to focus time on my perception of my weaknesses. As is usually the case, I did better on my ‘Low’ topics than I did on my ‘Medium-High’ ones. I try to remind myself to at least brush up a bit on my strengths, but life often gets in the way of studying, and I naturally concentrated on the less familiar topics. It’s the topics that I work on every day that give me the most trouble, because I consider myself an expert, but when I really think about it, I’m only an expert on the portions of the topic that I use regularly. For example, I was certain I knew Quality of Service cold, but did I know how ToS gets mapped into MPLS EXP? No, because I didn’t run MPLS in my network. It’s a trap I fall into regularly when studying.
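For the ToS-to-EXP question, here is a minimal Python sketch of what I understand to be the Cisco IOS default at label imposition when no MPLS QoS policy is configured: the three IP precedence bits (the high-order bits of the ToS byte, which are also the top three bits of the DSCP) are copied into the 3-bit EXP field. The function names are hypothetical, and an explicit MPLS QoS policy on the router would override this default.

```python
def default_exp_from_tos(tos_byte: int) -> int:
    """Copy IP precedence (the top 3 bits of the ToS byte) into a 3-bit EXP value.

    Assumes the default copy-on-imposition behavior; no QoS policy is modeled.
    """
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("ToS byte must be in the range 0-255")
    return (tos_byte >> 5) & 0x7


def exp_from_dscp(dscp: int) -> int:
    """DSCP occupies the top 6 bits of the ToS byte, so shift it into place first."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return default_exp_from_tos(dscp << 2)


if __name__ == "__main__":
    for name, dscp in [("EF", 46), ("AF41", 34), ("AF42", 36), ("CS6", 48), ("BE", 0)]:
        print(f"{name:<5} DSCP {dscp:>2} -> EXP {exp_from_dscp(dscp)}")
```

With this mapping, EF (DSCP 46) lands in EXP 5 and AF41 and AF42 both land in EXP 4, so the AF drop-precedence bits are lost at the label edge. That is exactly the kind of detail that is easy to miss if you don’t run MPLS day to day.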

CCDE Practical Thoughts

To the group of engineers attempting the CCDE Practical on August 26th, good luck! I’d like to see a good number of successful candidates. I hope to hear that someone cracked the 50% mark, which I believe has not yet been breached in this exam. As with the CCIE exam, successful candidates receive only a PASS result, not a graded score, so this is a rumor, not a fact. My opinion is that passing this exam has been too difficult, and I fear that if the exam gets labeled as ‘impossible’, the certification program will lose its candidates, and ultimately its market value. I know a few of the unsuccessful candidates (personally and by reputation), and I can say that they are undoubtedly good network design engineers. If the goal of this certification program is to identify and recognize engineers of their caliber, it must be missing the mark, at least by a small degree. Perhaps they’re not good test takers, but it concerns me when obviously qualified candidates are unsuccessful. That said, we’re still in the very beginning of the life cycle of this certification. I am confident that it will be tuned so that great candidates stop failing.

CCDE Practical Advice

Unfortunately, I didn’t create a good study plan for the CCDE Practical, or I would share that as well. The two books that I think were most helpful in preparing for it were Optimal Routing Design and Definitive MPLS Network Designs. If I felt that Quality of Service was a weakness, I would have concentrated heavily on RFC 4594, Configuration Guidelines for DiffServ Service Classes and perhaps End-to-End QoS Network Design (but skip all the configuration; it’s not important for this exam).

The key to the CCDE Practical is to know the technology! I don’t believe it is possible to pass this exam if you have holes in your tech stack, so to speak. The exam is scenario-based, and if you run into a scenario based on IS-IS, and you don’t know the protocol, it would be practically impossible to make up the points in another scenario. It’s not that the exam is intended to test your technical knowledge; that’s not the primary goal. The issue is that it uses the technical blueprint as a basis for asking network design questions. You need to be able to speak the same technical language as the exam creator. For those with a programming background, it would be like testing application design skills using Java, when you only know C++ and Pascal. Before you ask, yes, my programming knowledge is that out of date; feel free to substitute relevant programming languages if you retell this analogy :)

If you have the technology covered, the rest is pretty much intuition. There’s a certain feel to a good network design, as opposed to inefficient ones. The open-ended questions are very difficult, as the right answer isn’t staring at you from the options list. Also, because you can’t go back and change answers, you will certainly answer a question, click ‘Next’, and immediately learn you got the question wrong, based on the wording of the next question. It happened to me at least once, and probably several times. Don’t let it get you down. As I mentioned above, I don’t think anyone has scored 50% on this exam yet. Answering incorrectly is part of the process. If (when) this happens to you, rethink the previous answer, and change your course of action from here out if you feel you were incorrect. Don’t be stubborn. There are no points available for being consistently wrong!

Again, good luck to all who are attempting this exam, whether it is in August, December or a future date.

Jeremy

* Paul Baran is one of the fathers of packet-switched networking, something I’m almost embarrassed to say I didn’t know prior to studying for the CCDE.

Monday, August 17, 2009

Corporate Versus Consulting Jobs

One question that repeatedly comes up in conversation is whether corporate or consulting jobs are better.  Once I get past the obvious ‘It Depends’ response, there are some clear differences that aren’t necessarily apparent at first glance.  Here is my take on the question.

 

Career Versus Job

I’m definitely not breaking new ground when I say there is a clear distinction between a career and a job.  My chosen career is computer networking (or less specifically, Information Technology), and my current job is network manager / architect for a Fortune 500 insurance company.  I’ve had several jobs in my career, but only one actual career.  Your career is generally the answer to “What do you do?”, while your job is the answer to “Where do you work?”  When I’m asked, my answers are “I build and maintain computer networks” and “I work for an insurance company” respectively.  It is extremely important to put the emphasis on your career.  Often, focusing on your career and focusing on your job go hand-in-hand, but occasionally they will diverge.  For example, after I earned my CCIE certification, I sat down for my first performance review.  My manager, whom I respected very much, was thrilled with my performance, but could only offer a modest increase in compensation on a low base salary.  He mentioned that the plastics industry (my employer’s field) was in bad shape, and he would have offered more if it had been possible.  I understood the predicament, but later I realized that I didn’t work in the plastics industry.  I work in the computer networking industry, and things were going quite well there.

Don’t take this advice to an unwise extreme.  Demonstrating a history of indifference to your employer will undoubtedly decrease your value over the long term.  It would be very difficult to find a suitable job if your resume is long on experience but short on loyalty.  As a manager, I can easily understand early-career applicants listing several employers, especially if the increase in responsibility and experience demonstrates a logical pattern.  If the pattern continues into the applicant’s later career, I’d be inclined to wonder why things weren’t working out with so many different organizations.

 

Categories of Networking Jobs

I broadly separate my employment history into two categories:  Corporate positions and Consulting positions.  Within the networking field, corporate positions entail working in a specific environment to solve problems for your organization.  You don’t necessarily need to be a full-time employee of the organization; I’ve had corporate jobs where I was a contractor.  Consulting positions usually have all the responsibility of corporate positions, plus an extra layer of organizational overhead.  To be truly successful in a consulting position, you’ll need to focus on the profitability and success of your consulting organization first, while also fulfilling the needs of your customer (the corporate entity).  I’m leaving out reseller positions, primarily because I’ve never had one, so I haven’t given much thought to the environment.  I’ll theorize that it is similar to consulting, with the added pressure of pushing products (physical, rather than labor) on the customer, but that’s just a guess.

I spent three years (1998 – 2001) in consulting, and nine years in corporate environments.  My consulting roles were during the heyday (and later the crash) of IT consulting.  Maybe things are different now, so keep that in mind while considering my thoughts below.

 

Points of Comparison

Suitability for Early Career

I’ve frequently questioned the suitability of consulting for early career engineers.  At my local Netigy Corporation office I saw the drawbacks of hiring junior engineers.  They’re often the first to go during tough times, and without suitable assignments, it is very difficult to learn new skills.  Most engineers need meaningful work to build their skills, and at times, it can be a challenge to find that in a consulting environment.  Consulting organizations are hesitant to place an engineer in a role where they haven’t already proven their abilities.  Corporate environments generally provide better opportunities to acquire and utilize new skills.  There is also a great advantage to implementing a solution to a problem and then seeing how it works over the long term.  This is more likely to happen in a corporate environment.

Compensation

Perhaps the best way to understand compensation is to think about what makes an engineer valuable to an organization.  In consulting, I’d say about ninety percent of your value is tied to your revenue (bill rate X utilization).  I was once directly told that my compensation had hit a peak because my bill rate didn’t support an increase.  This was despite my efforts to grow my account from two to seven consultants, often at significantly higher rates than I was billing.  More sophisticated consulting companies will factor this in, plus mentoring / leadership, pre-sales work and other factors, but ultimately profitability comes down to individual revenue versus expense.  Compensation is not as easy to understand in a corporate environment.  You can look at replacement cost (how much it would cost to hire a new engineer for the same role), corporate salary structures, value delivered, etc., but it’s less quantifiable than in consulting.

It has become apparent to me that corporate salary structures are getting much closer to the consulting environment.  In the late 90s, I found it necessary to enter consulting to achieve the full value of my experience.  Even in 2001, when the consulting market was beginning to crumble, I couldn’t justify moving back to the corporate world on the basis of compensation.  I chose to stick with Netigy until they closed their doors in September 2001.  Only then did it make sense for me to move back to the corporate world, this time as a contractor.  In talking with other engineers who have recently changed jobs, I can see that the pay disparity has shrunk considerably, and when you add in the value of non-cash benefits, we may be nearing parity.

Job Security

In consulting, your job security is dependent on your ability to bring in revenue.  There’s always a place for cash-flow-positive consultants.  In the corporate world, again, things are not so clear.  Because there is no direct correlation between job performance and revenue, job security comes down to relationships, replacement value, the organization’s view on the value of IT, and countless other factors.  You also need to look at the viability of the organization.  If your employer (whether consulting or corporate) isn’t able to keep the doors open, it doesn’t really matter how good you are at your job.

I prefer to look at job security more broadly, in terms of employability.  It never much mattered to me if my current job was secure, as long as I could be confident that I would find another job in a reasonable amount of time.  This viewpoint allowed me to find a wonderful opportunity when Netigy went out of business.  If I had been focused on my specific job’s security, I would certainly have left Netigy for another organization before they went bankrupt.  It also served to calm my nerves a bit in March 2009 when the stock market seemed to indicate that my current employer would not make it through the financial crisis.  If this viewpoint appeals to you, be sure to save some money.  I’m sure you can find better financial advice from others, but I suggest having a considerable amount of available cash to handle the inevitable periods of unemployment.

Relationship Building

Without a doubt, I met more people from the networking industry in my three years of consulting than I did in nine years in the corporate world.  Some of the relationships I’ve built have lasted for a decade or more, and several have resulted in jobs for both me and my counterparts.  In fact, I’ve dragged one guy between three different companies, and I wouldn’t hesitate to call him again if the need arose.  In the corporate world, it is easier to create lasting relationships, but I still suggest acquiring the ability to quickly build rapport.  Use social networking (Twitter, LinkedIn, Cisco Learning Network, etc) and get to know people in training classes and seminars.  Don’t think of this as creating a network to use later, but as a way to add a social component to your career.  No one wants to be used, and if that’s your intention, it will be obvious.

Work/Life Balance

Bear in mind, I’m coming at this category as a married father of three boys.  What I want out of this category may be wildly different from what you need.  In my experience, a corporate position provides the better work/life balance.  I am sure there are corporate positions that require travel, and there are consulting positions that allow you to be home every evening.  I’ve done the constant travel thing, and it has a few benefits:  Frequent flyer / hotel points, plenty of study time, seeing new places, etc.  If I weren’t a family man, I would probably find even more great things about the traveling lifestyle.  Even in my corporate positions, I’ve found opportunities to do some travel, and it has been a nice change of pace.  My worst work/life balance issue in the corporate world was the three years of long (90+ mile each way) commuting I signed on for.  And of course I still get to ‘carry the pager’ for on-call work and escalations, as well as work the strange hours that our change windows dictate.  In return, I get to work from home and put my kids on the school bus in the mornings.

These are rather gross generalizations of the two environments.  I know of one consultant who has worked almost exclusively with the same government client for a dozen years.  He lives near his client and seems to have worked out a great work/life balance for himself.  I have also worked for a consulting company that strived to keep everyone local.  Every situation is unique, so don’t rely on these generalizations.  Do your homework if you are interested in changing jobs.

Need for Certification

Consultants need credentials to find work.  Consulting organizations use certifications in their marketing and sales pitches.  It’s a required part of the game, and it benefits the consultants, as their value in the industry increases because of it.  I once pursued my CCIE in WAN Switching to help out my consulting employer.  I got as far as the CCNP and CCIE written, and was scheduled to take the lab when I got sidetracked.  It assisted us in landing a significant engagement and allowed me to broaden my horizons a bit.  I only rarely used the knowledge, and by now, it’s almost completely gone, but it was a great experience.

As I mentioned in a previous post, corporate employees should also acquire certifications.  It makes life significantly easier when you find the need to market yourself via a resume.  Don’t shortchange yourself, especially if you have a supportive employer who is willing to assist you in achieving your goals.  If you are having trouble selling the idea of getting certified to your employer, remind them there are benefits for them too.  My pursuit of the CCDE cost my employer approximately $3000, including travel, books and registration fees.  In return, they got dozens of hours of evening/weekend study time, and in my opinion, a more qualified Network Architect.  All for less than the cost of a one-week training course.  If that doesn’t convince them, remind your employer that CCIEs get priority escalation for TAC cases, which will cut down on MTTR.

 

Additional Thoughts

I certainly don’t want to come across as recommending that you should always chase the money.  A friend commented on my last post that there are factors in job/career satisfaction other than money, and he’s 100% correct.  I would not take or stay at a job that didn’t meet my work/life balance requirements, even if the money was excellent.  As a matter of fact, I worked with David at a previous employer, which I left for this very reason.  He doesn’t remember, but in fairness, I was only there for six weeks before I realized the fit wasn’t right.

David made a second point that bears repeating.  Non-technical skills are very important, and need to be developed alongside technical ones.  It’s a bit more difficult to build that into a study plan.  My recommendation is to emulate successful people at your job.  If you don’t see any, well, you’re probably not with a very good company and you should be considering whether you’re at the right place.  Most people are happy to provide advice and mentoring.  It’s one of the best parts of my current job.

 

Summary

Again, remember that my consulting experience is from about a decade ago.  Things may have changed significantly since then.  If they have, please add your thoughts in the comment section so I can be more accurate in future posts.  As for me, I’m reasonably happy with my current job.  It has given me the flexibility to pursue my career interests, including management.  I have also been able to do a bit of consulting to maintain skills in technical areas that don’t exist in my employer’s environment.

Hopefully these descriptions will help you make the right career decisions.  Keep in mind, if you ever do make an unwise career move, you can always make another move to get back on track.  I don’t know many people in this field who haven’t made a bad career move at least once...  I know I have!

Monday, August 10, 2009

Career Advice From a Networking Veteran

It’s hard to believe I’m a veteran of computer networking, but the facts are indisputable:

[X] Performed an IGRP to EIGRP Conversion on a Production Network

[X] Significant Work on a Token-Ring Network (Not an Ethernet Conversion!)

[X] Passed CCIE Lab When it Was Two Days Long

[X] Holding Shares of CSCO With $60+ Per Share Cost Basis

(Sadly, the Cisco shares are in my son’s account. I guess when he’s ready to spend the money, I’ll owe him a few dollars from my own bank account to make up for that decision!)

While I don’t feel too old, I must have enough life experience to ‘give back’ to the next generation of Networkers. So here’s my attempt to help out.


Early Career

If I could make only one recommendation to early career network engineers, it would be to get certified. CCNA, CCNP, whatever. The job market is too crowded to not have some sort of credentials attached to your resume. I’ve been on both sides of the hiring game. Without some sort of certification, you need an extremely strong social network to get your foot into most doors. There are too many HR-provided ‘resume filters’ in place now.

One of my most recent hires came to us without any certifications. In almost all circumstances I would have dismissed the resume quickly. Fortunately for us, the candidate and I had worked at the same consulting firm a decade ago. Although I didn’t know him personally, I was able to reach out to a former colleague, who contacted the candidate’s former manager. After an excellent recommendation, I was willing to put my trust in the candidate’s resume and after a series of positive interviews, we made a great hire. If it were not for the common former employer, I doubt we would have had the confidence to make the job offer. Our organization would have been worse off for it, but the costs associated with making a bad hire are such that I needed to be extremely confident to make an offer. My assumption is that if you are in your early career, you haven’t had time to build the necessary social network to overcome the lack of a certification.

A second recommendation is to immerse yourself in the subject matter. Create a structured study plan and execute it. When I began working in this industry in 1997, I knew very little about networking. I was put into a ‘sink or swim’ situation, as the two engineers who previously managed the company’s network had both left with little notice. I started in June of 1997, and went to the ICRC and ACRC at Chesapeake Computer Consultants within the first two months. With those courses as a base, I decided on a study plan of reading the (then current) IOS 11.2 Configuration Guides from beginning to end. There weren’t a lot of study materials available at the time, so I made the best of the situation. As a college student finishing up my Senior year, I would print thick sections of the documentation and bring it to my classes to read. I don’t know that this is the best study method today, given the abundance of targeted study materials available from Cisco Press and others, but it worked for me at the time. I ultimately passed the CCIE lab using this method. For a current example of a structured study plan, I suggest you follow Aragoen Celtdra’s Route My World blog. I used a similar method when I pursued the CCDE written and practical exams.

Mid Career and Later

The studying and reading only intensifies once you achieve your CCIE certification. If only someone had mentioned this to me in 1998! Like all successful CCIE candidates, I was on top of the world after passing the lab. I was sure I was an expert in my field; after all, that’s what the ‘E’ stood for! I decided to interview for a position with Chesapeake Computer Consultants. That’s when I found out how much I didn’t know! While I did ultimately receive a job offer, I was astounded by the depth of the technical interviews. It was at that time that I realized my chosen profession required constant study to maintain relevance. The CCIE recertification process reinforces this, and that’s why I don’t have any issues with sitting for a written exam every two years.

My recommendation is to constantly seek positions which challenge you, and hesitate before trading opportunity for money. I spent over three years in a lucrative contracting position, but during this time I didn’t learn very much. I sort of missed the MPLS revolution, and I’ve been catching up ever since. I even took the easy way out on my CCIE recertification by repeatedly taking the CCIE WAN Switching exam, which didn’t appear to change at all from 2000 until 2003. Money is an important thing in life, so while I would never advise someone to turn it down, I would highly recommend weighing a slight increase in current dollars against the impact a position will have over your career. I didn’t do irreparable harm to my career, but I certainly set myself back a bit by narrowly focusing on older technology. On the plus side, I worked with an incredible set of people and built wonderful relationships that eventually delivered me to my current position.

A more controversial suggestion is to strongly consider changing employers as you work your way up the salary ladder. I have seen firsthand that it is extremely difficult for an employer to keep up with high performing employees. Most corporate salary structures are not designed to cope with the rapid increase in value delivered by such employees. I found it necessary to leave my corporate employer after achieving my CCIE certification for this reason. I ultimately decided that although it would be difficult to leave, it was unfair to my wife and child to work for less than market value. This was perhaps the most difficult professional decision I’ve made thus far, but in retrospect, it was clearly correct. This is less true in the consulting world, where it is easier to judge an employee’s value. Under most circumstances, bill rates are a strong proxy for value. Consultants can easily demonstrate their value by increasing the revenue delivered to their employer. For a fun example of this, take a look at this Game Theory view of salary negotiation: http://mindyourdecisions.com/blog/2009/08/04/how-to-negotiate-a-pay-raise-with-game-theory/.

Summary

So, to recap, get certified, study methodically and regularly, chase money and opportunity in your early career, but go for balance; don’t overvalue either. Also, make sure you build lasting relationships along the way. It ultimately makes everything you do more meaningful, and adds a fun social component to trade shows and seminars. My favorite part of Cisco Live is catching up with friends and former colleagues. If you see me in Las Vegas at Cisco Live 2010, please introduce yourself!

Monday, August 3, 2009

When Administrative Distance Doesn’t Count

In troubleshooting a network issue, I came across an interesting situation where EIGRP and OSPF metrics are directly compared. Obviously, this is a flawed design, and it has been updated to fix the issue. I thought this might be a good opportunity to share the experience.

Problem
Traffic is not taking the optimal path from RouterE to 10.34.0.0/16. Here is the topology:


Traffic from RouterE to 10.34.0.0/16 should go to RouterD, then to RouterY, and then to 10.34.0.0/16. What we're actually seeing is this traffic take the less-optimal path of RouterE->RouterC->RouterD->RouterY.

Cause
Here is the output of 'show ip route 10.34.0.0' on RouterE. We should see the next hop as "10.0.3.1, from 10.0.1.11", but here is what we see instead:
RouterE#show ip route 10.34.0.0
Routing entry for 10.34.0.0/16
Known via "bgp 65000", distance 200, metric 4294967294
Tag 65001, type internal
Last update from 10.0.4.1 2w1d ago
Routing Descriptor Blocks:
* 10.0.4.1, from 10.0.1.20, 2w1d ago
Route metric is 4294967294, traffic share count is 1
AS Hops 1
Route tag 65001

So the question is: why would RouterE prefer to go all the way to RouterA to reach 10.34.0.0/16, when there's an equally good (and closer!) path to the network off of RouterD? To find the answer, we need to look at the BGP table entry for 10.34.0.0/16. Here's that output:
RouterE#show ip bgp 10.34.0.0/16
BGP routing table entry for 10.34.0.0/16, version 107467
Paths: (2 available, best #1, table Default-IP-Routing-Table)
Advertised to update-groups:
2
65001, (aggregated by 65001 10.1.34.1), (received & used)
10.0.4.1 (metric 13) from 10.0.1.20 (10.0.1.20)
Origin IGP, metric 4294967295, localpref 100, valid, internal, atomic-aggregate, best
65001, (aggregated by 65001 10.1.34.2), (received & used)
10.0.3.1 (metric 26112) from 10.0.1.11 (10.0.1.11)
Origin IGP, metric 4294967295, localpref 100, valid, internal, atomic-aggregate

The key information is the IGP metric to reach each BGP next hop ("metric 13" and "metric 26112" above). This is the eighth point of comparison in BGP best-path selection, and the seven more important comparison points are all equal here (see Cisco's documentation of the BGP best path selection process). Notice that 10.0.4.1 is only metric=13 away, while 10.0.3.1 is metric=26112 away. Where do these values come from? Here is that output:
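To make the tie-break concrete, here is a minimal Python sketch of the first eight best-path steps as a sort key. It is a simplified model, not IOS's implementation: the class and function names are hypothetical, MED is compared unconditionally, and the later tie-breakers (oldest path, router ID, and so on) are left out. The attribute values come from the output above, and the "after" metric of 2 comes from the confirmation output later in this post.

```python
from dataclasses import dataclass

# Lower origin code is preferred: IGP < EGP < incomplete.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class BgpPath:
    next_hop: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path_len: int = 1
    origin: str = "igp"
    med: int = 0
    ebgp: bool = False
    igp_metric: int = 0  # metric to next_hop, from whichever IGP supplied it


def preference_key(p: BgpPath) -> tuple:
    """Sort key for the first eight best-path steps; the smaller key wins.

    Simplified: MED is compared unconditionally, and the later
    tie-breakers (oldest path, router ID, ...) are omitted.
    """
    return (
        -p.weight,                 # 1. highest weight
        -p.local_pref,             # 2. highest local preference
        not p.locally_originated,  # 3. locally originated first
        p.as_path_len,             # 4. shortest AS path
        ORIGIN_RANK[p.origin],     # 5. lowest origin type
        p.med,                     # 6. lowest MED
        not p.ebgp,                # 7. eBGP over iBGP
        p.igp_metric,              # 8. lowest IGP metric to the next hop
    )


def best_path(paths):
    return min(paths, key=preference_key)


MED = 4294967295  # the BGP "metric" (MED) shown in the output

# Before the fix: 13 is an OSPF cost, 26112 an EIGRP composite metric.
before = [BgpPath("10.0.4.1", med=MED, igp_metric=13),
          BgpPath("10.0.3.1", med=MED, igp_metric=26112)]
# After the fix: both next hops are reached via OSPF.
after = [BgpPath("10.0.4.1", med=MED, igp_metric=13),
         BgpPath("10.0.3.1", med=MED, igp_metric=2)]

print(best_path(before).next_hop)  # 10.0.4.1
print(best_path(after).next_hop)   # 10.0.3.1
```

Step 8 compares the raw numbers with no notion of which protocol produced them, which is exactly the problem this post describes.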
RouterE#show ip route 10.0.4.1
Routing entry for 10.0.4.0/31
  Known via "ospf 80", distance 110, metric 13, type intra area
  Redistributing via eigrp 80
  Last update from 10.0.0.37 on GigabitEthernet1/1, 1w3d ago
  Routing Descriptor Blocks:
  * 10.0.0.37, from 10.0.1.20, 1w3d ago, via GigabitEthernet1/1
      Route metric is 13, traffic share count is 1

RouterE#show ip route 10.0.3.1
Routing entry for 10.0.3.0/31
  Known via "eigrp 80", distance 90, metric 26112, type internal
  Redistributing via eigrp 80
  Last update from 10.0.0.40 on Vlan50, 1w5d ago
  Routing Descriptor Blocks:
  * 10.0.0.40, from 10.0.0.40, 1w5d ago, via Vlan50
      Route metric is 26112, traffic share count is 1
      Total delay is 20 microseconds, minimum bandwidth is 100000 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 1

The route to 10.0.4.1 is learned via OSPF, while the route to 10.0.3.1 is learned via EIGRP. In this environment, OSPF uses fairly small metrics (LAN links have a cost of 1, WAN links have a cost of 10), while EIGRP uses its default metrics, which are much larger numbers. Normally, routes from different routing protocols are compared on Administrative Distance alone: the protocol with the lower Admin Distance is preferred, with no regard to metric values. This is one of the rare 'special cases' where metrics from different protocols are compared directly, because BGP's eighth tie-breaker looks only at the raw IGP metric to each next hop, not at the protocol that produced it.
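To see why the EIGRP number is so much larger, the 26112 can be reproduced from the bandwidth and delay shown in the output. This sketch assumes classic (not wide) EIGRP metrics with the default K values; the function name is hypothetical, and the integer truncation of each term is my approximation of the router's arithmetic.

```python
def eigrp_classic_metric(min_bw_kbps: int, total_delay_usec: int) -> int:
    """Classic EIGRP composite metric with the default K values
    (K1 = K3 = 1, K2 = K4 = K5 = 0):

        metric = 256 * (10^7 / min_bw_kbps + total_delay_usec / 10)

    Both terms are truncated to whole numbers before summing.
    """
    bw_term = 10**7 // min_bw_kbps        # scaled inverse of the slowest link
    delay_term = total_delay_usec // 10   # delay in tens of microseconds
    return 256 * (bw_term + delay_term)


# From 'show ip route 10.0.3.1': minimum bandwidth 100000 Kbit,
# total delay 20 microseconds.
print(eigrp_classic_metric(100_000, 20))  # 26112
```

The factor of 256 alone puts a single-hop EIGRP path orders of magnitude above an OSPF cost of 13.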

Solution
To solve this issue, I removed the RouterD<->RouterE link from EIGRP routing, but more importantly, I also removed all AS65000 network links from the EIGRP topology table. This prevents RouterE from learning the 10.0.3.0/31 route via EIGRP. With both 10.0.3.0/31 and 10.0.4.0/31 learned via OSPF, the metric comparison makes sense, so the path is properly chosen. Here is a diagram of the new environment (notice the subtle movement of the EIGRP 80 domain to exclude the RouterD<->RouterE link; it's the only change):

And here is the show command output confirming the solution worked:
RouterE#show ip route 10.34.0.0 255.255.0.0
Routing entry for 10.34.0.0/16
  Known via "bgp 65000", distance 200, metric 4294967294
  Tag 65001, type internal
  Last update from 10.0.3.1 00:02:17 ago
  Routing Descriptor Blocks:
  * 10.0.3.1, from 10.0.1.11, 00:02:17 ago
      Route metric is 4294967294, traffic share count is 1
      AS Hops 1
      Route tag 65001

RouterE#show ip bgp 10.34.0.0/16
BGP routing table entry for 10.34.0.0/16, version 118687
Paths: (2 available, best #2, table Default-IP-Routing-Table)
  Advertised to update-groups:
     2        
  65001, (aggregated by 65001 10.1.34.1), (received & used)
    10.0.4.1 (metric 13) from 10.0.1.20 (10.0.1.20)
      Origin IGP, metric 4294967295, localpref 100, valid, internal, atomic-aggregate
  65001, (aggregated by 65001 10.1.34.2), (received & used)
    10.0.3.1 (metric 2) from 10.0.1.11 (10.0.1.11)
      Origin IGP, metric 4294967295, localpref 100, valid, internal, atomic-aggregate, best

RouterE#show ip route 10.0.3.1
Routing entry for 10.0.3.0/31
  Known via "ospf 80", distance 110, metric 2, type intra area
  Redistributing via eigrp 80
  Last update from 10.0.0.40 on Vlan50, 00:02:34 ago
  Routing Descriptor Blocks:
  * 10.0.0.40, from 10.0.1.11, 00:02:34 ago, via Vlan50
      Route metric is 2, traffic share count is 1

RouterE#show ip route 10.0.4.1
Routing entry for 10.0.4.0/31
  Known via "ospf 80", distance 110, metric 13, type intra area
  Redistributing via eigrp 80
  Last update from 10.0.0.37 on GigabitEthernet1/1, 2w0d ago
  Routing Descriptor Blocks:
  * 10.0.0.37, from 10.0.1.20, 2w0d ago, via GigabitEthernet1/1
      Route metric is 13, traffic share count is 1
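Why remove the link from EIGRP entirely instead of tuning metrics? When the same prefix is offered by two protocols, the routing table picks by Administrative Distance and never looks at the metric. The sketch below models that choice, assuming (hypothetically) that RouterE had heard 10.0.3.0/31 from both EIGRP and OSPF before the change; the names are illustrative, and the distances are the Cisco IOS defaults.

```python
from typing import NamedTuple

# Cisco IOS default administrative distances (internal EIGRP, OSPF, iBGP).
ADMIN_DISTANCE = {"eigrp": 90, "ospf": 110, "ibgp": 200}


class Offer(NamedTuple):
    protocol: str
    metric: int


def installed_route(offers):
    """Choose among routes to the same prefix from different protocols:
    the lowest administrative distance wins; metrics are ignored."""
    return min(offers, key=lambda o: ADMIN_DISTANCE[o.protocol])


# Hypothetical: RouterE hears 10.0.3.0/31 from both protocols.
before = [Offer("ospf", 2), Offer("eigrp", 26112)]
after = [Offer("ospf", 2)]  # link removed from EIGRP

print(installed_route(before))  # Offer(protocol='eigrp', metric=26112)
print(installed_route(after))   # Offer(protocol='ospf', metric=2)
```

As long as EIGRP carries the prefix, its distance of 90 beats OSPF's 110, and BGP inherits the large EIGRP metric for that next hop.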



Summary

In the grand scheme of things, a single additional gigabit Ethernet hop is inconsequential. So why does this matter enough to open a change request and wake up at 6am on a Sunday to fix? For me, it’s a sense of completeness. There was something ‘not right’ about the way things were working, and fixing it pushes the needle a little closer to ‘done’. I equate this to a love of symmetry. For some people, it’s important for things to be balanced. When I take two eggs out of the carton, I take one from each row. It isn’t more correct, but it makes me feel better. Same goes for the network. If it is working 100% optimally, then any small issue is readily apparent. Terry Slattery said something similar in his blog entry on network hygiene a few weeks ago:
If you don't keep things clean, interactions between otherwise minor problems can create a larger problem
He’s absolutely right: eliminating the small issues makes the big ones far less complicated.

Monday, July 27, 2009

Preparing for Performance Routing

In my previous blog entry, I explained the role Performance Routing was expected to play in my network. This time around, I'd like to show the steps we've taken to prepare for its deployment.

My network core consists of three major locations. All three sites house significant user populations, and two of the locations also host our major North American data centers. The locations are interconnected with OC-3 and Ethernet circuits, and each site is connected to one of our two MPLS providers. The previous iteration of our network design used EIGRP to route between the three core locations, with filtering and summarization to prevent an unnecessarily large routing table, especially at the edges.

Our MPLS-based WAN uses BGP. With careful route injection at the core locations, and a small amount of AS prepending for the default route, we were able to achieve a rudimentary but effective load-balancing scheme. More effective load balancing could have been achieved by sending the same routing table through both providers and using "bgp bestpath as-path multipath-relax", but two significant limiting factors made our providers less than equal. First, only one of our providers could route our multicast traffic. Second, the provider without multicast support also charged an onerous fee for Quality of Service, so we treated its network as a best-effort path.
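The prepending trick works because, with weight and local preference tied, BGP prefers the shorter AS path. Here is a minimal sketch of that effect as seen by a remote site; every AS number below is hypothetical, and the post doesn't say which direction we prepended, so this only illustrates the mechanism.

```python
def prepend(as_path, asn, extra):
    """Return a copy of as_path with `asn` prepended `extra` more times."""
    return [asn] * extra + list(as_path)


def preferred_exit(paths_by_exit):
    """Pick the exit whose AS path is shortest (best-path step 4),
    assuming weight and local preference are tied."""
    return min(paths_by_exit, key=lambda name: len(paths_by_exit[name]))


CORE_AS = 65010                       # hypothetical core-site ASN
CARRIER_A, CARRIER_B = 64512, 64513   # hypothetical carrier ASNs

# The core site prepends twice toward MPLS A only; each carrier then
# adds its own ASN before passing the default route on.
via_a = [CARRIER_A] + prepend([CORE_AS], CORE_AS, 2)
via_b = [CARRIER_B, CORE_AS]

print(via_a)                                                # [64512, 65010, 65010, 65010]
print(preferred_exit({"MPLS A": via_a, "MPLS B": via_b}))  # MPLS B
```

A remote site that hears the default route from both carriers will send traffic toward the unprepended path, while still holding the longer path as a backup.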

Here's a diagram of the previous network design:



"MPLS B" supported QoS and multicast, so we connected our "Core B" location to it, as that site housed the majority of our voice and video equipment.


Rethinking Assumptions

In my view, one of the most intriguing features of Performance Routing is the ability to inject new routes into a routing domain based on network performance. This can be done via BGP or static routes. The alternative to injection-based path manipulation is policy-based routing (PBR), which makes me uncomfortable from a troubleshooting perspective. I can track changes to my routing table relatively easily, but how would I reconstruct the forwarding state of a PBR-based network if I need to document a transient issue? Static route injection has promise, as these routes can be redistributed into any IGP, but for us, BGP made the most sense.

As we considered how best to implement PfR, I kept coming back to the design decision to assign individual AS numbers to our core locations. When PfR injects a BGP route, it sets the 'no-export' community, which prevents that route from being advertised to an eBGP peer. With each core location in its own AS, a route PfR injected at one core site would stop at the first eBGP boundary and never reach the other core sites. The right choice during our initial MPLS implementation was now a stumbling block in our potential PfR deployment. It was also clear that we needed matching QoS policies with our MPLS providers. For this reason, and several others, we chose to migrate to a new provider for "MPLS A".
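The no-export rule itself is simple: a route carrying the community may go to iBGP peers but not across an AS boundary. This sketch models that check (confederations are ignored, the function name is hypothetical, and the AS numbers are illustrative rather than ours) to show why separate per-site ASes block PfR-injected routes while a shared core AS does not.

```python
NO_EXPORT = "no-export"


def may_advertise(communities, local_asn, peer_asn):
    """NO_EXPORT (RFC 1997): never advertise the route to a peer in
    another AS. Confederations are ignored in this sketch."""
    is_ebgp = peer_asn != local_asn
    return not (NO_EXPORT in communities and is_ebgp)


injected = {NO_EXPORT}  # a PfR-injected route carries no-export

# Old design (hypothetical ASNs): each core site has its own AS.
print(may_advertise(injected, local_asn=65010, peer_asn=65020))  # False
# New design: core routers share one core AS, so the peering is iBGP.
print(may_advertise(injected, local_asn=65100, peer_asn=65100))  # True
```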


Transitioning to a Real Core


We decided to take this opportunity to create an independent Core AS. While site-level BGP AS assignments worked well for our remote locations, the core is a special case. By carving out a small network consisting of core WAN devices and interconnect circuits, we have compartmentalized our core attachments. The following diagram provides a view of the new topology. As before, redundant links have been omitted for clarity.





In the previous topology, AS-path length dictated that packets would enter the nearest MPLS network when exiting a core location. In the new design, packets are free to enter and exit the core at the most appropriate point. Overlaying BGP-based PfR on this topology is relatively straightforward, and is almost the next step in the project. First, we have to deal with the placement of our Cisco WAAS WAN accelerators, which will be the topic of a later blog entry. Once WAN accelerator placement is settled, we'll be ready to start our PfR pilot.

Sunday, July 19, 2009

The Case for Performance Routing

The migration to layer 3 MPLS networks has reintroduced an old network convergence issue. How do we detect the failure of an end-to-end path? This issue has existed in the LAN since the advent of routing protocols. In the LAN, the question is “How do I detect the loss of my neighbor, when my interface does not fail?”

In the following diagram, Router B's link to Switch Z has failed.




How does Router A know that it no longer has a path to Router B? Depending on the routing protocol and platform, the answer could be loss of hellos or lack of BFD responses. Point-to-point LAN connections also solve this problem, which is why they are highly recommended wherever possible. With a point-to-point link, the physical interface drops, which gives the router an immediately actionable signal.
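The gap between hello-based detection and BFD is easiest to see as arithmetic: a neighbor is declared down after roughly a fixed number of intervals pass with nothing heard. This sketch uses the default OSPF (broadcast) and EIGRP (LAN) timers; the 50 ms BFD interval is only an example value, and the function name is hypothetical.

```python
def detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Approximate time to declare a neighbor down when its interface stays
    up: `multiplier` intervals pass with no hello (or BFD packet) received."""
    return interval_ms * multiplier


# Defaults: OSPF hello 10 s / dead 40 s on broadcast links,
# EIGRP hello 5 s / hold 15 s on LAN links.
ospf = detection_time_ms(10_000, 4)   # 40000 ms
eigrp = detection_time_ms(5_000, 3)   # 15000 ms
# BFD with an example 50 ms interval and a multiplier of 3.
bfd = detection_time_ms(50, 3)        # 150 ms
print(ospf, eigrp, bfd)               # 40000 15000 150
```

With a point-to-point link, none of these timers matter, because the interface-down event arrives immediately.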

How does this apply to L3 MPLS VPNs and wide area networks? Picture the carrier's MPLS network as a big switch. When the remote site router at the top of the following diagram loses its connection to MPLS A, it takes several seconds (at best) for the Core A router to lose its BGP routes. Running EIGRP or OSPF with the carrier doesn't help much either, as it is the carrier's BGP propagation delay that governs the downtime. EIGRP/OSPF is only the carrier's edge protocol; BGP is still the core protocol.



One solution to this issue is to build an overlay network based on static GRE tunnels or DMVPN and run an optimized IGP over it. My opinion is that these options add considerable complexity to what is a relatively simple topology. I try my best to avoid overlay networks, as they tend to increase troubleshooting time. Preserving the any-to-any connectivity that L3 VPNs provide requires a full mesh of GRE tunnels, or in the case of DMVPN, a minor delay while the dynamic tunnels are built. This solution also does not scale as well as a pure BGP-based MPLS environment.
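The scaling concern with static GRE is quadratic: any-to-any reachability needs one tunnel per pair of sites. A quick sketch of the counts, using hypothetical site counts:

```python
def full_mesh_tunnels(sites: int) -> int:
    """Point-to-point tunnels needed for any-to-any reachability: n(n-1)/2."""
    return sites * (sites - 1) // 2


def tunnels_per_router(sites: int) -> int:
    """Each site terminates a tunnel to every other site."""
    return sites - 1


for n in (10, 50, 100):  # hypothetical site counts
    print(n, full_mesh_tunnels(n), tunnels_per_router(n))
# 10 45 9
# 50 1225 49
# 100 4950 99
```

Every one of those tunnels also carries its own IGP adjacency, which is where much of the added troubleshooting time comes from.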

Another solution to this issue is Performance Routing (PfR). This feature enables your Core A and Core B routers to monitor traffic flows through their respective MPLS providers. If packet loss or delay is detected, outbound traffic is dynamically redirected to the functioning path. A corresponding policy on the remote site router handles return traffic. Performance Routing currently has the ability to re-route traffic within three seconds, with plans to reduce that to one second. Performance Routing also detects and reacts to degraded paths, such as intermittent packet loss and high delay or jitter.
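Conceptually, the exit decision works like a policy check on measured performance. The toy model below is not Cisco's PfR implementation or configuration; the thresholds, names, and measurements are all hypothetical, and it only shows the idea of staying on an exit while it meets policy and moving when it doesn't.

```python
from dataclasses import dataclass


@dataclass
class ExitStats:
    name: str
    loss_pct: float
    delay_ms: float


# Hypothetical voice policy thresholds, not PfR defaults.
MAX_LOSS_PCT = 1.0
MAX_DELAY_MS = 150.0


def in_policy(e: ExitStats) -> bool:
    return e.loss_pct <= MAX_LOSS_PCT and e.delay_ms <= MAX_DELAY_MS


def choose_exit(current: str, exits: list) -> str:
    """Keep the current exit while it meets policy; otherwise move to the
    in-policy exit with the lowest delay. Stay put if none qualifies."""
    by_name = {e.name: e for e in exits}
    if in_policy(by_name[current]):
        return current
    candidates = [e for e in exits if in_policy(e)]
    if not candidates:
        return current
    return min(candidates, key=lambda e: e.delay_ms).name


measured = [ExitStats("MPLS A", loss_pct=5.0, delay_ms=40.0),
            ExitStats("MPLS B", loss_pct=0.0, delay_ms=55.0)]
print(choose_exit("MPLS A", measured))  # MPLS B
```

Note that the lower-delay exit loses here because its loss is out of policy; reacting to degraded paths, not just dead ones, is the point of the feature.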

Performance Routing (and its predecessor/component, Optimized Edge Routing) has been around for 4-5 years. Cisco is still working to add new features and options. The engineer I spoke with at Cisco Live was quite eager to hear how I wanted to use it in my network, and took several of my suggestions on how to improve the technology. In fact, I owe him an email with the details… I’ll get on that right away!

Over the next few months, we’ll be implementing Performance Routing in our network. Our biggest need is to protect our voice paths. Like most organizations, we’ve practically eliminated dedicated voice trunks in our network. We have call center voice that flows over our MPLS backbone. When our carriers have issues, our first call is from the voice team. My expectation is that while PfR won’t prevent the first few seconds of degradation, it will be able to re-route our voice traffic much faster than our current methods. I’ll post the details of our implementation and results in subsequent blog entries.

Monday, July 13, 2009

Cisco Live 2009 Recap


Two weeks ago, I attended Cisco Live (Networkers) 2009 in San Francisco, CA. This is my third consecutive year attending Cisco's annual user conference. I was a two-time attendee a while back (1999 in New Orleans and 2000 in Orlando), when attendance was a CCIE recertification requirement. Then I dropped off the social networking map for a half dozen years while I worked 12-hour days, including my commute. I didn't have time to attend training, or a company that would pay for it!

General Thoughts

This year's event seemed much smaller than the last two years, despite Cisco's attempts to make it feel otherwise. A simple view of the dining area made that pretty clear. It was maybe half the size of the one in Orlando. Anecdotally, I ran into far fewer former colleagues this year as well. In Anaheim, I couldn't walk from one session to another without getting caught up in conversation with a former co-worker. Not so this year.

Perhaps because of the lower attendance, I learned more this year than in Orlando. My personal preference is to have Cisco locate the event outside of a major tourist area. Anaheim, New Orleans and San Francisco were great; Orlando was tougher, probably because my family was nearby. Las Vegas should be interesting as well, as there are many distractions, or so I've heard!

Highlights

My favorite sessions this year were LISP and L2MP. Both gave glimpses into what network design is going to look like in the next 12 - 24 months, assuming ship dates don't slip! For me, L2MP is especially interesting, as my employer is currently building out a pair of Nexus-based Data Centers. In fact, I'm writing this on my way to Raleigh for a Nexus CPOC event. Like many, we'll be using vPC to take advantage of parallel links, but Spanning Tree is still going to block a number of uplinks. L2MP / TRILL will allow for much higher overall utilization of redundant DC links.

LISP is the future of Internet routing. Its primary goal is to significantly reduce the size of the Internet routing table, with a secondary benefit of Internet tail circuit load balancing for everyone who signs up. There is some work to do on the pricing model, and the end user education component could be tricky, but once those are worked out, the solution should take off. It certainly appears that the basic technology has been hammered out.

Both LISP and TRILL deserve individual blog postings, which I hope to get to soon.

Keynotes

I was surprised to see that Guy Kawasaki was scheduled to give the closing keynote address. After two consecutive comedians, it felt like Cisco was getting a bit too practical. Fortunately I was way off-base. Guy Kawasaki was as funny as Ben Stein or John Cleese, and perhaps significantly more relevant to the tech audience. I still chuckle when thinking of his "Unique" and "Value" quadrant, especially the upper-left corner.

The two Cisco-based keynotes were interesting, but not especially 'new.' Most of the key points have been covered in previous events (30 market adjacencies, etc), and all of the technology was at least familiar. I guess that's the drawback of attending annually.

Customer Appreciation Event

This was probably the most disappointing part of the conference. I have to say that the Customer Appreciation Event seemed a little flat. Perhaps the 80s theme didn't quite appeal to me, although I am a fan of the decade. Devo was surprisingly energetic and fun, even if I only knew a couple of songs. The 70-minute, 4-mile bus ride might have warped my view of the event as well, so I might not be the best person to offer an opinion of the entertainment.

Takeaways

The biggest technology takeaway for me was that Performance Routing is definitely doable in our environment. I attended a pair of PfR sessions, and had a great one-hour Meet the Engineer discussion where we covered our current L3 topology and figured out the best way to implement the technology. We're a typical two-MPLS-provider environment, and we've had more than our share of real-time traffic pain when our providers have issues. In its current implementation, Performance Routing can take our outage times down to 3 seconds, with plans to dial that down to 1 second in most cases. I will cover our implementation in future blog posts.

Other takeaways include:

- IPv6 is still getting tons of hype. I'm keeping abreast of the basic technology, but with no plans to implement or even pilot it in the next 12 months, I didn't attend any IPv6 sessions. It's difficult to justify spending time on IPv6 without a business case for implementation.

- CCDE interest appears to be ramping up. That's a good thing, as I don't want the certification to 'die on the vine'. I was concerned that the low pass rate would drive interest down, but that doesn't seem to be the case.

- I'm looking forward to next year's event. The next time I attend, I'll have NetVet status, which should be interesting. I don't have anything specific to say to the Cisco CEO, but I've heard that the CCIE NetVet reception is worth attending, so I'll certainly make it part of my schedule. #cllv on twitter, or so I've heard.