Ubuntu 22.04.03 LTS FRR BGP ECMP issue

Asked by Damian Zwiazek

I have configured FRR BGP ECMP (two eBGP sessions to AS 500) on Ubuntu Server 22.04.03 LTS.
When it works, the Ubuntu CLI shows this output:

ip route
default nhid 31 proto bgp metric 20
        nexthop via 10.122.18.24 dev vlan.100 weight 1
        nexthop via 10.122.18.26 dev vlan.101 weight 1
10.122.18.24/31 dev vlan.100 proto kernel scope link src 10.122.18.25
10.122.18.26/31 dev vlan.101 proto kernel scope link src 10.122.18.27

The issue occurs when a physical interface on the computer goes down and then comes back up.
After that, for reasons I do not understand, Ubuntu does not put the default gateways back into the routing table, and it looks like the output below until I restart FRR or run the command
clear ip bgp *

ip route
10.122.18.24/31 dev vlan.100 proto kernel scope link src 10.122.18.25
10.122.18.26/31 dev vlan.101 proto kernel scope link src 10.122.18.27
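The only difference between the two ip route outputs above is the missing default entry. The following is a hedged sketch that assumes Python 3.9+ and uses a hypothetical helper name (default_route_nexthops is not part of FRR or iproute2). A monitoring script could use it to parse ip route output and report which ECMP gateways the kernel currently has:

```python
import re


def default_route_nexthops(ip_route_output: str) -> list[str]:
    """Return the gateways of the kernel default route, or [] if there is none.

    Handles the single-path form ("default via GW dev IF ...") and the
    multipath form, where a "default ..." line is followed by indented
    "nexthop via GW dev IF ..." lines.
    """
    gateways: list[str] = []
    in_default = False
    for line in ip_route_output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # A non-indented line starts a new route entry.
            in_default = line.split()[0] == "default"
            if in_default:
                match = re.search(r"\bvia (\S+)", line)
                if match:
                    gateways.append(match.group(1))
        elif in_default:
            # Indented continuation line belonging to the default route.
            match = re.search(r"\bnexthop via (\S+)", line)
            if match:
                gateways.append(match.group(1))
    return gateways


WORKING = """\
default nhid 31 proto bgp metric 20
        nexthop via 10.122.18.24 dev vlan.100 weight 1
        nexthop via 10.122.18.26 dev vlan.101 weight 1
10.122.18.24/31 dev vlan.100 proto kernel scope link src 10.122.18.25
10.122.18.26/31 dev vlan.101 proto kernel scope link src 10.122.18.27
"""

BROKEN = """\
10.122.18.24/31 dev vlan.100 proto kernel scope link src 10.122.18.25
10.122.18.26/31 dev vlan.101 proto kernel scope link src 10.122.18.27
"""

print(default_route_nexthops(WORKING))  # ['10.122.18.24', '10.122.18.26']
print(default_route_nexthops(BROKEN))   # []
```

On a live system, the input could come from subprocess.run(["ip", "route"], capture_output=True, text=True).stdout. An empty list then corresponds to the broken state described above.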

In FRR, the output of show ip route is the same whether the issue is occurring or everything is working:
show ip route

B>* 0.0.0.0/0 [20/0] via 10.122.18.24, vlan.100, weight 1, 00:06:48
  * via 10.122.18.26, vlan.101, weight 1, 00:06:48
C>* 10.122.18.24/31 is directly connected, vlan.100, 00:28:52
C>* 10.122.18.26/31 is directly connected, vlan.101, 00:28:52

When I log in to FRR, the output looks as it should. I suspect a communication problem between FRR and the kernel, but
I am not sure.

My FRR configuration is below:
mm-bgp-nat-obo# show running-config
Building configuration...

Current configuration:
!
frr version 8.1
frr defaults traditional
hostname frr-host
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 64513
 no bgp ebgp-requires-policy
 neighbor 10.122.18.24 remote-as 500
 neighbor 10.122.18.26 remote-as 500
 !
 address-family ipv4 unicast
  maximum-paths 2
  maximum-paths ibgp 2
 exit-address-family
exit
!
end

Does anybody know how to fix this?
I applied the same configuration with Quagga on Ubuntu 20.04 LTS and it works.

Question information

Language:
English
Status:
Expired
For:
Ubuntu
Assignee:
No assignee
Bernard Stafford (bernard010) said :
#2
Damian Zwiazek (damian-zw) said :
#3

Thanks for the response.

I am testing on a fresh install of Ubuntu 22.04.03 LTS with FRR 8.1 installed through the command apt install frr.

Today I unplugged and plugged back in the active network interface on this computer, and the logs show the following
(the last line describes the potential issue):

Dec 22 08:21:02 X staticd[1013]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Dec 22 08:21:02 X watchfrr[973]: [QDG3Y-BY5TN] zebra state -> up : connect succeeded
Dec 22 08:21:02 X watchfrr[973]: [QDG3Y-BY5TN] bgpd state -> up : connect succeeded
Dec 22 08:21:02 X watchfrr[973]: [QDG3Y-BY5TN] staticd state -> up : connect succeeded
Dec 22 08:21:02 X watchfrr[973]: [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
Dec 22 08:21:02 X frrinit.sh[910]: * Started watchfrr
Dec 22 08:21:02 X systemd[1]: Started FRRouting.
Dec 22 08:21:04 X bgpd[1001]: [M59KS-A3ZXZ] bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 10.122.18.24 in vrf default
Dec 22 08:21:04 X bgpd[1001]: [M59KS-A3ZXZ] bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 10.122.18.26 in vrf default
Dec 22 08:27:26 X zebra[989]: [N2AGF-TSFB9][EC 4043309102] Kernel deleted a nexthop group with ID (18) that we are still using for a route, sending it back down
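The last zebra line stands out. Zebra reports that the kernel deleted nexthop group 18 while a route still used it. (The working kernel default route at the top of the question is also bound to a nexthop group, nhid 31.) A hedged sketch can pull such events out of the log so they line up with interface flaps. The function name is hypothetical, and the pattern is copied from the message above, so other FRR versions may word it differently:

```python
import re

# Pattern built from the zebra message quoted in the log above; other FRR
# versions may word it differently.
NHG_DELETED = re.compile(
    r"zebra\[\d+\]:.*Kernel deleted a nexthop group with ID \((\d+)\)"
)


def deleted_nexthop_groups(log_text: str) -> list[tuple[str, int]]:
    """Return (timestamp, nexthop-group ID) pairs for each matching log line.

    Assumes syslog-style lines whose first three fields are the timestamp,
    e.g. "Dec 22 08:27:26 X zebra[989]: ...".
    """
    events = []
    for line in log_text.splitlines():
        match = NHG_DELETED.search(line)
        if match:
            timestamp = " ".join(line.split()[:3])
            events.append((timestamp, int(match.group(1))))
    return events


LOG = """\
Dec 22 08:21:04 X bgpd[1001]: [M59KS-A3ZXZ] bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 10.122.18.26 in vrf default
Dec 22 08:27:26 X zebra[989]: [N2AGF-TSFB9][EC 4043309102] Kernel deleted a nexthop group with ID (18) that we are still using for a route, sending it back down
"""

print(deleted_nexthop_groups(LOG))  # [('Dec 22 08:27:26', 18)]
```

Assuming the systemd unit is named frr, you could pass in the text of journalctl -u frr.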

Mya Brownn (myabrownn) said :
#4

I appreciate the detailed information about your FRR iBGP ECMP configuration on Ubuntu Server 22.04.03 LTS. It seems like you're encountering an issue where the default gateways do not come back up in the routing table after a physical interface goes down and then up.

Based on the provided configuration and symptoms, it appears that there might be a synchronization issue between FRR and the Linux kernel. One solution you can try is to explicitly trigger a route refresh from FRR to update the kernel routing table. You can achieve this by configuring the BGP neighbor with the "soft-reconfiguration" option:

FRR configuration (vtysh):
router bgp 64513
  neighbor 10.122.18.24 soft-reconfiguration inbound
  neighbor 10.122.18.26 soft-reconfiguration inbound

This option allows FRR to reapply inbound route policies without tearing down the BGP session. After making this change, observe if the issue persists when a physical interface goes down and up again.

If the problem persists, you can also try enabling the "bgp scan-time" option to reduce the time between periodic scans of BGP routes by the FRR daemon:

FRR configuration (vtysh):
router bgp 64513
  bgp scan-time 5

This option sets the minimum time interval between consecutive scans for changes in the BGP routing table. Adjust the value as needed.

Finally, ensure that your FRR version is up to date, as newer versions may include bug fixes and improvements.

bash:
sudo apt update
sudo apt upgrade frr

Remember to test these changes in a controlled environment or during a maintenance window to avoid disrupting production traffic.

Damian Zwiazek (damian-zw) said :
#5

Thanks for the reply.

Unfortunately, the steps

router bgp 64513
  neighbor 10.122.18.24 soft-reconfiguration inbound
  neighbor 10.122.18.26 soft-reconfiguration inbound

and

sudo apt update
sudo apt upgrade frr

did not help.

The command
bgp scan-time 5
is not available in my FRR.

Bernard Stafford (bernard010) said (last edit ):
#6

There is a Snap package of FRRouting 8.4.2, a fork of Quagga, available:
https://snapcraft.io/frr

https://docs.frrouting.org/en/latest/index.html

Damian Zwiazek (damian-zw) said :
#7

Thanks for the reply.

I installed FRR 8.4.2 from Snap.

Now the default gateway is never present in the routing table.
Restarting the FRR services, or even the whole machine, no longer puts the default gateway paths into the routing table.

Below are the logs:

2023/12/22 15:38:24 BABELD: [WXJ8P-YNMM9] Terminating on signal
2023/12/22 15:38:24 BGP: [ZW1GY-R46JE] Terminating on signal
2023/12/22 15:38:24 EIGRP: [PAVNT-SE69J] Terminating on signal
2023/12/22 15:38:24 OPEN_FABRIC: [W899R-SKATM] Terminating on signal SIGTERM
2023/12/22 15:38:24 ISIS: [W899R-SKATM] Terminating on signal SIGTERM
2023/12/22 15:38:24 OSPF6: [K9AEX-91680] Terminating on signal SIGTERM
2023/12/22 15:38:24 PBR: [R2SSF-NKV5H] Terminating on signal
2023/12/22 15:38:24 LDP: SIGINT received
2023/12/22 15:38:24 VRRP: [N50WA-0KKX6] Terminating on signal
2023/12/22 15:38:24 LDP: waiting for children to terminate
2023/12/22 15:38:24 PATH: [TCME6-224AD] Terminating on signal
2023/12/22 15:38:24 PATH: [PKKEV-3K0XV] Unregisterfrom opaque,etc
2023/12/22 15:38:24 BABELD: [TXV3K-GQWTV][EC 100663303] creat(babel-state): Permission denied
2023/12/22 15:38:24 STATIC: [MRN6F-AYZC4] Terminating on signal
2023/12/22 15:38:24 OSPF: [W9T04-QWK6B] Terminating on signal
2023/12/22 15:38:24 RIP: [N4CEB-XCAK5] Terminating on signal
2023/12/22 15:38:24 RIPNG: [P9MRZ-AG6RD] Terminating on signal
2023/12/22 15:38:24 LDP: terminating
2023/12/22 15:38:24 LDP: [YAF85-253AP][EC 100663299] buffer_write: write error on fd 12: Broken pipe
2023/12/22 15:38:24 LDP: [X6B3Y-6W42R][EC 100663302] zclient_send_message: buffer_write failed to zclient fd 12, closing
2023/12/22 15:41:25 LDP: accept_add: accepting on fd 9
2023/12/22 15:41:25 LDP: Error connecting synchronous zclient!
2023/12/22 15:41:26 PATH: [NN4XW-E4M3V] IPv4 Router Id updated for VRF 0: 2.2.2.2
2023/12/22 15:41:28 BGP: [M59KS-A3ZXZ] bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 10.122.18.24 in vrf default
2023/12/22 15:41:28 BGP: [M59KS-A3ZXZ] bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 10.122.18.26 in vrf default

 show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

B>r 0.0.0.0/0 [20/0] via 10.122.18.24, vlan.100, weight 1, 00:02:33
  r via 10.122.18.26, vlan.101, weight 1, 00:02:33
C>* 2.2.2.2/32 is directly connected, lo, 00:02:35
C>* 10.122.18.24/31 is directly connected, vlan.100, 00:02:35
C>* 10.122.18.26/31 is directly connected, vlan.101, 00:02:35
C>* 192.168.0.0/24 is directly connected, enp2s0, 00:02:35
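According to the legend above, > marks the selected route, * a FIB route, and r a rejected one. So in this output FRR selected the default route, but both of its nexthops are flagged r, whereas the FRR 8.1 output earlier in the question shows * for both. The following hedged sketch extracts this per nexthop. parse_frr_routes and the status labels are hypothetical, and the regular expressions assume exactly the vtysh layout shown in this thread:

```python
import re

# First line of a route entry, e.g. "B>r 0.0.0.0/0 [20/0] via 10.122.18.24, ..."
ROUTE_RE = re.compile(
    r"^(?P<code>[A-Za-z])(?P<flags>[^\s\d]*)\s+(?P<prefix>\d+(?:\.\d+){3}/\d+)(?P<rest>.*)$"
)
# Extra ECMP nexthop line, e.g. "  r via 10.122.18.26, vlan.101, ..."
NEXTHOP_RE = re.compile(r"^\s+(?P<flags>\S*?)\s*via (?P<gw>[^,\s]+)")
VIA_RE = re.compile(r"\bvia (?P<gw>[^,\s]+)")


def nexthop_status(flags: str) -> str:
    """Translate the flag characters documented in the vtysh legend."""
    if "*" in flags:
        return "fib"          # '*' - FIB route
    if "r" in flags:
        return "rejected"     # 'r' - rejected
    return "not-installed"


def parse_frr_routes(output: str) -> dict[str, list[tuple[str, str]]]:
    """Map each IPv4 prefix to (gateway, status) pairs, one per nexthop.

    Directly connected routes have no "via" and get an empty gateway.
    """
    routes: dict[str, list[tuple[str, str]]] = {}
    current = None
    for line in output.splitlines():
        route = ROUTE_RE.match(line)
        if route:
            current = route.group("prefix")
            via = VIA_RE.search(route.group("rest"))
            gateway = via.group("gw") if via else ""
            routes.setdefault(current, []).append(
                (gateway, nexthop_status(route.group("flags")))
            )
            continue
        nexthop = NEXTHOP_RE.match(line)
        if nexthop and current is not None:
            routes[current].append(
                (nexthop.group("gw"), nexthop_status(nexthop.group("flags")))
            )
    return routes


SAMPLE = """\
B>r 0.0.0.0/0 [20/0] via 10.122.18.24, vlan.100, weight 1, 00:02:33
  r via 10.122.18.26, vlan.101, weight 1, 00:02:33
C>* 2.2.2.2/32 is directly connected, lo, 00:02:35
"""

print(parse_frr_routes(SAMPLE)["0.0.0.0/0"])
# [('10.122.18.24', 'rejected'), ('10.122.18.26', 'rejected')]
```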

Damian Zwiazek (damian-zw) said :
#8

I installed and tested FRR 7.2.1 on Ubuntu 20.04 LTS,
with the same result: the default gateways are not inserted into the routing table and are only present in FRR.

Below is the output from the logs and from commands run in Linux and FRR.

Maybe I am doing something wrong in the configuration (pasted below)?
When I install Quagga instead of FRR, everything works perfectly.

Dec 22 17:49:11 x bgpd[984]: %NOTIFICATION: rcvd End-of-RIB for IPv4 Unicast from 10.122.18.24 in vrf default
Dec 22 17:49:11 x bgpd[984]: %NOTIFICATION: rcvd End-of-RIB for IPv4 Unicast from 10.122.18.26 in vrf default
Dec 22 17:50:48 x bgpd[984]: %NOTIFICATION: rcvd End-of-RIB for IPv4 Unicast from 10.122.18.24 in vrf default
Dec 22 17:50:49 x bgpd[984]: %NOTIFICATION: rcvd End-of-RIB for IPv4 Unicast from 10.122.18.26 in vrf default

Current configuration:
!
frr version 7.2.1
frr defaults traditional
hostname x
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 64513
 neighbor 10.122.18.24 remote-as 500
 neighbor 10.122.18.26 remote-as 500
 !
 address-family ipv4 unicast
  neighbor 10.122.18.24 next-hop-self
  neighbor 10.122.18.26 next-hop-self
  maximum-paths 2
 exit-address-family

# show ip bgp
BGP table version is 7, local router ID is 192.168.0.1, vrf id 0
Default local pref 100, local AS 64513
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*= 0.0.0.0/0        10.122.18.24                           0 500 i
*>                  10.122.18.26                           0 500 i
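In this table, = marks a multipath entry and > the best path. So bgpd has selected both next hops for ECMP, even though neither reaches the kernel routing table. The following hedged sketch lists the selected next hops per prefix. selected_paths is a hypothetical name. It assumes that the status codes and the network are separate whitespace-delimited columns, as in the output above, and it does not handle other row layouts:

```python
import ipaddress

STATUS_CHARS = set("sdh*>=irSR")  # status codes from the "show ip bgp" legend


def is_ip(text: str) -> bool:
    """True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def selected_paths(show_ip_bgp: str) -> dict[str, list[str]]:
    """Map each prefix to the next hops flagged best ('>') or multipath ('=').

    Assumes the status codes and the network are separate whitespace-delimited
    columns, and that a row without a network repeats the previous prefix.
    """
    paths: dict[str, list[str]] = {}
    prefix = None
    for line in show_ip_bgp.splitlines():
        tokens = line.split()
        if len(tokens) < 2 or not set(tokens[0]) <= STATUS_CHARS:
            continue  # headers, legend text and blank lines
        status = tokens[0]
        if "/" in tokens[1]:
            if len(tokens) < 3:
                continue
            prefix, nexthop = tokens[1], tokens[2]
        else:
            nexthop = tokens[1]
        if prefix is not None and is_ip(nexthop) and ("=" in status or ">" in status):
            paths.setdefault(prefix, []).append(nexthop)
    return paths


TABLE = """\
   Network          Next Hop            Metric LocPrf Weight Path
*= 0.0.0.0/0        10.122.18.24                           0 500 i
*>                  10.122.18.26                           0 500 i
"""

print(selected_paths(TABLE))  # {'0.0.0.0/0': ['10.122.18.24', '10.122.18.26']}
```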

ip route
10.122.18.24/31 dev vlan.100 proto kernel scope link src 10.122.18.25
10.122.18.26/31 dev vlan.101 proto kernel scope link src 10.122.18.27

Bernard Stafford (bernard010) said :
#9

https://docs.frrouting.org/en/latest/babeld.html
"Babel is an interior gateway protocol that is suitable both for wired networks and for wireless mesh networks."
Try configuring Babel: enable it with router babel, and on each interface with network IFNAME.
This might give you your default gateway.

Damian Zwiazek (damian-zw) said :
#10

Thanks for the reply.
Unfortunately, I am connected to my ISP through BGP over two paths, so I cannot use Babel.

I found a very similar issue, described here:
https://github.com/FRRouting/frr/issues/14160

It looks like this issue is known across many FRR versions.

I would like to use FRR with Ubuntu 22.04, but since this issue is not fixed yet,
I have to go back to the great Quagga and Ubuntu 20.04.

Launchpad Janitor (janitor) said :
#12

This question was expired because it remained in the 'Open' state without activity for the last 15 days.