We have route-based VPN tunnels running from a pair of Cisco routers at each of two sites (four routers total) to two different VPN gateways in the same Shared VPC (XPN). Routes are shared through BGP with a single Cloud Router. Each of our two sites has its own AS. The routers within a pair peer with each other in that same AS, but the peering between the pairs is EIGRP. I'm redistributing EIGRP into BGP (and vice versa) with a route-map and a specific metric. At one site I'm also redistributing to and from OSPF, and at the other I'm redistributing in from static. Each site's "local" routes come from OSPF or static, and its "remote" routes come from EIGRP.

So as an example...

AS 65001 is announcing 10.1.1.0/24 with a metric of 50 (it's a local route, redistributed from OSPF), and 172.16.1.0/24 with a metric of 100 (it's a remote route, learned from EIGRP).

AS 65002 is announcing 172.16.1.0/24 with a metric of 50 (it's a local route, redistributed from static), and 10.1.1.0/24 with a metric of 100 (it's a remote route, learned from EIGRP).

There are actually 62 routes announced, but you get the picture. Both sites announce the same 62 routes; only the metrics differ.
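
The intended outcome of the example above can be sketched in Python. This is only an illustration of the expectation "lowest metric wins per prefix", not of how Cloud Router actually selects routes. The data structure and the function name `expected_best` are hypothetical, and the second prefix is written with its network address, 172.16.1.0/24.

```python
# Hypothetical model of the four announcements from the example.
announcements = [
    {"prefix": "10.1.1.0/24", "from_as": 65001, "metric": 50},     # local to the 65001 site
    {"prefix": "172.16.1.0/24", "from_as": 65001, "metric": 100},  # remote, learned via EIGRP
    {"prefix": "172.16.1.0/24", "from_as": 65002, "metric": 50},   # local to the 65002 site
    {"prefix": "10.1.1.0/24", "from_as": 65002, "metric": 100},    # remote, learned via EIGRP
]


def expected_best(routes):
    """For each prefix, return the AS whose announcement has the lowest metric."""
    best = {}
    for route in routes:
        current = best.get(route["prefix"])
        if current is None or route["metric"] < current["metric"]:
            best[route["prefix"]] = route
    return {prefix: route["from_as"] for prefix, route in best.items()}


expected = expected_best(announcements)
print(expected)
```

Under that expectation, each prefix would be reached through the site where it is local.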


Now my problem: the Cloud Router takes all the routes from 65001 and makes them primary/active regardless of metric, and ignores the lower-metric routes from 65002 unless I drop the tunnels/peering to 65001.

So my traffic always gets where it needs to go, but is taking sub-optimal routes.

This was working fine about two weeks ago; it stopped working as expected at some point since then.

I changed one of the AS 65002 routers over to 65001 (using just local-as on the neighbor statement and updating the GCP side to match, since I also use AS 65002 to peer with Azure from those routers), but it didn't seem to change this behavior.

  • Cloud Router adds a regional cost to the MED value of learned routes (the MED itself is specified by your on-premises router). As per the [documentation](https://cloud.google.com/router/docs/concepts/overview#determining_best_path_for_egress_traffic): "Regional costs can periodically change based on factors such as network performance. These changes can affect how packets are routed. If you notice routing changes, it might be due to updated regional costs." That may be the reason for the behavior you are experiencing. – N Singh Oct 30 '17 at 17:41
  • This is all in a single region, so there's no additional regional cost. – Bryan Vukich Oct 30 '17 at 18:24
  • Confirm whether the route taken is the one with the shortest AS path length. If it is not and this looks like an issue with the Cloud Router, file an issue report at the [Google Cloud public issue tracker](https://issuetracker.google.com/issues/new?component=1871640), providing any additional details. – N Singh Oct 31 '17 at 20:55
  • All the paths will be the same length. I was relying on the IGP metric to give priority (which works just fine in a Cisco-only world, but apparently not here). Your mention of the AS paths pointed out the obvious solution, though: I'll just prepend to the routes I want de-prioritized. I'll test that tonight. Either way, I'll post on the issue tracker to find out whether this is the expected behavior going forward or a legitimate issue. – Bryan Vukich Nov 01 '17 at 22:07

1 Answer

I worked around this issue with just a simple AS prepend to the "remote" routes on the outbound route map, approximately this:

router bgp 65001
 neighbor 169.254.x.y route-map DISTtoBGP out
!
route-map DISTtoBGP permit 10
 match ip address prefix-list LOCtoBGP
!
route-map DISTtoBGP permit 20
 match ip address prefix-list REMtoBGP
 set as-path prepend 65001
!
ip prefix-list LOCtoBGP seq 10 permit 10.1.0.0/16 le 24
!ip prefix-list LOCtoBGP ...
!
ip prefix-list REMtoBGP seq 10 permit 172.16.0.0/16 le 24
!ip prefix-list REMtoBGP ...

It's kludgy, and will need to be manually maintained, but it works.
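
Why the prepend can take effect where the metric did not can be sketched with a simplified model of two standard BGP decision steps. This is an assumption for illustration, not a description of Cloud Router's actual implementation: it assumes AS-path length is compared first, and that MED is only compared between paths learned from the same neighboring AS (the RFC 4271 default). The `Path` class, the `prefer` function, and the sample paths are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Path:
    neighbor_as: int          # AS of the peer that sent the route
    as_path: Tuple[int, ...]  # AS_PATH as received by the cloud side
    med: int                  # metric set by the outbound route-map


def prefer(a: Path, b: Path) -> Optional[Path]:
    """Return the preferred path, or None if these two steps cannot decide."""
    # Step 1: the shorter AS path wins.
    if len(a.as_path) != len(b.as_path):
        return a if len(a.as_path) < len(b.as_path) else b
    # Step 2 (assumed default): MED is only comparable when both paths
    # come from the same neighboring AS.
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    # Later tie-breakers are not modeled here.
    return None


# A prefix that is local to the 65002 site and remote for the 65001 site.
via_65002 = Path(neighbor_as=65002, as_path=(65002,), med=50)
via_65001_plain = Path(neighbor_as=65001, as_path=(65001,), med=100)
via_65001_prepended = Path(neighbor_as=65001, as_path=(65001, 65001), med=100)

undecided = prefer(via_65001_plain, via_65002)
decided = prefer(via_65001_prepended, via_65002)
print(undecided is None, decided is via_65002)
```

In this model the metric alone never separates the two sites because they are different neighboring ASes, while one extra AS in the path settles the choice at the first step.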

Thank you, Navi. Your question about the AS path length pointed out the easy workaround.