I have been having long discussions this week and today. Many have the benefit of hindsight.
For the kit we have in Maidenhead, we have two possible ways to connect to the world! One is via the local transit, a single-point-of-failure link. The other is via multiple (well, soon to be) diverse fibre lines to different London data centres, where we have multiple transit and peering links.
Even before the second leg of our ring is in place, one option is a single transit link and the other is several transit and peering links via multiple (pairs of) routers. And even that allows fallback via the single transit link, just in case.
The problem, as ever, is a partly ill link: one that seems to be valid for traffic but is not. We had that today.
Announcing their routes primarily via local transit could work, but making traffic back out to the world go via local transit as well is more complex. We would be offering a less redundant, and somewhat specialised, solution...
So we have the issue of hindsight versus reality. Going forward, a link with more redundancy is better. The last few days, it was not...
So do we offer knee-jerk services that are technically worse looking forward? Or do we say no, this is shit that happened, and was only wrong in hindsight?