Dual WAN Debian Router Part One: “Static” Configuration

Background

My homelab is both quite overkill and very useful. Everything headless, including my router (which exists behind both ISP routers at this point, including double NAT), is some version of Debian (now bookworm). My router, a Qotom mini PC with 4 Intel GbE ports and 8 GiB of RAM (currently using 616 MiB and 1.3 GiB cache), is the subject of this series and currently my oldest device still in service. It currently hosts Caddy, a VPN home, optional privacy VPN outbound (both WireGuard), private DNS, an Yggdrasil node (which I mainly use, and have used, as a backup VPN), and some other miscellanea. It’s served me very well, even if mdadm over USB turned out to be a bad idea. (I now use a much bigger computer with real SATA for RAID, one perhaps even more overkill for its tasks, and which boots off of software RAID1 NVMe, which was a doozy to set up.)

My memory tells me I started using a Linux-based router for one major reason: I had a very, very slow uplink at the time (DSL) and CAKE promised to ameliorate that. Since upgrading my ISP, CAKE has honestly become less necessary, but I still use it (including “background”/CS1 QoS). However, using my own router has been great for controlling my internal network to a degree consumer ISP routers don’t generally allow—including combining two different ISPs (which became very appealing one recent night at about 8 p.m. when a salesman told me a new, unlimited 5G home service was available in the area, even though my main ISP is great… except when it’s down and I go searching for new services on my phone).

Since this is just Linux, you could do this on basically anything that can run Linux. Protectli boxes look interesting and, if you don’t have large storage or other requirements, look like enough to host most things by themselves (I have never used one myself). I personally may end up replacing the Qotom with a 4x2.5 or 10GbE PCIe card in my primary NAS, which definitely has the room and power.

This guide/“report” will assume familiarity with networking, especially Linux networking, and general Linux CLI knowledge. Linux policy routing, which is instrumental to this approach, is built on a tower of other concepts. Links are provided throughout, and most terms should be searchable. It took me a lot of learning (and staring at man pages) to get to this point.

Goals

Required Software

At least, this is how I did it. Most of this should be in Debian and may even be included by default.

General Design

I have found bridges to be invaluable for network configurations. They basically act as an unmanaged switch (and are closer to managed switches in newer kernel versions) and you can put anything layer 2 in them (veth devices, real devices, vlan devices, some VPNs…). These instructions will use and create three main bridges: brlan, brwan1, and brwan2. You can definitely create and use more bridges if you need them!

We will use 3 connection marks: 0x1 (1) for “ISP1” (our primary ISP), 0x2 (2) for “ISP2” (our backup/secondary ISP), and 0x10 (16) for “movable”.

We will use three routing tables, which can be defined in /etc/iproute2/rt_tables:

99  dummy
101 wanisp1
102 wanisp2

Also tell systemd-networkd the names for the new routing tables:

# /etc/systemd/networkd.conf
# rest of file excluded for brevity
[Network]
RouteTable=dummy:99 wanisp1:101 wanisp2:102

wanisp1 and wanisp2 are where we will place the routes for ISP1 and ISP2. dummy is an odd hack, which I will explain later. (You can also call wanisp1/etc something more memorable, like wanispname.)
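
A quick sanity check can catch a typo in the table names before anything depends on them. This is a minimal sketch: `missing_tables` is a hypothetical helper (not part of iproute2) that reads an rt_tables-style file and prints whichever of the three names from this guide is not defined there.

```shell
# Hypothetical helper: print each expected routing table name that is
# missing from an rt_tables-style file (default: /etc/iproute2/rt_tables).
missing_tables() {
    local file="${1:-/etc/iproute2/rt_tables}"
    local name
    for name in dummy wanisp1 wanisp2; do
        # A definition is "<number> <name>" on a non-comment line.
        if ! awk -v n="$name" \
            '$1 ~ /^[0-9]+$/ && $2 == n { found = 1 } END { exit !found }' \
            "$file"; then
            echo "$name"
        fi
    done
}

# Example: missing_tables    # prints nothing when all three names exist
```

Note that this only checks the iproute2 side; the RouteTable= line in networkd.conf still has to be kept in sync by hand.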

Most configuration in this guide could be used for more than 2 ISPs, if desired (including things like outbound VPNs).

LAN Configuration

systemd-networkd makes it pretty easy to make a bridge. First you need to create a bridge with a .netdev file in /etc/systemd/network/:

# /etc/systemd/network/brlan.netdev
[NetDev]
Name=brlan
Kind=bridge

You’ll then want to configure some kind of network for it (with a .network file in the same directory):

# /etc/systemd/network/brlan.network
[Match]
Name=brlan
[Network]
Address=10.100.0.1/24
IPForward=yes

And then you’ll want to attach at least one real device to it (eth0 is an example, use ip link or ip address to find the real name of your ethernet ports):

# /etc/systemd/network/eth0.network
[Match]
Name=eth0
[Network]
Bridge=brlan

You will probably want at least DHCP and DNS for your internal network. Here’s a quick dnsmasq example:

# /etc/dnsmasq.conf
interface=brlan
bind-dynamic
no-resolv

server=1.1.1.1
server=1.0.0.1

dhcp-range=10.100.0.100,10.100.0.200,5m

WAN Bridge Configuration

Consumer ISP routers are… inflexible. This section is going to assume that you are going to be able to get a static DHCP allocation, port forwarding if you want to expose services, and not much more than that (and even then, workarounds are possible). If you have the ability to set custom routes you might be able to avoid an additional NAT (though I personally haven’t had many issues with double NAT). If you have CGNAT, you are going to have to figure out another way (e.g., VPN to VPS) to get incoming connections working with that ISP.

brwan1.netdev and brwan2.netdev will be basically the same as the setup for brlan.netdev (with an optional MAC address, if you followed the optional section above). brwan1.network will be a bit more, however:

[Match]
Name=brwan1
[Network]
DHCP=ipv4
IPv6AcceptRA=no
[DHCPv4]
UseDNS=no
RouteTable=wanisp1
[DHCPv6]
UseDNS=no
[IPv6AcceptRA]
UseDNS=no
[RoutingPolicyRule]
Table=wanisp1
Priority=40001
FirewallMark=1
[RoutingPolicyRule]
Table=wanisp1
Priority=60001
FirewallMark=16

In summary, we only want the IP address and routes from the ISP. We don’t want their DNS. We also don’t want IPv6 (for this article). We want to put these routes in their own routing table, and not in the main routing table. We also want two new routing policy rules. After enabling brwan1.network, the output of ip rule will look like this (comments added for explanation):

# first 3 rules here are default to Linux
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
# rules are executed from lower to higher number
# after the default rules, we say “anything with a fwmark of 0x1 (ISP1 only)
# goes to routing table wanisp1”
# I set the priorities to be after the default rules so that I
# don’t have to be super strict about not setting fwmarks for local traffic
40001:  from all fwmark 0x1 lookup wanisp1 proto static
# “anything with fwmark 0x10 (anything ‘movable’) goes to wanisp1”
60001:  from all fwmark 0x10 lookup wanisp1 proto static

brwan2.network/etc will be very similar, with a couple of changes:

  1. use table wanisp2, not wanisp1
  2. increment the priority for the first policy rule. It’s allowed to have multiple policy rules at the same priority, but it’s easier if you don’t.
  3. don’t include the second rule (the priority 60001 rule), we only want it once. (When it comes to dynamic failover, we will create a dynamic rule at priority 50000 that sends traffic to the table we want.)
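
Put together, brwan2.network might look like the sketch below. Priority=40002 and FirewallMark=2 are assumptions that simply continue the numbering used for ISP1, and `write_brwan2_network` is a hypothetical helper; it takes the target directory as an argument so the file can be inspected somewhere harmless first.

```shell
# Sketch: write brwan2.network by applying the three changes above to
# brwan1.network. Priority 40002 and mark 2 are assumed values.
write_brwan2_network() {
    local dir="${1:-/etc/systemd/network}"
    cat > "$dir/brwan2.network" << 'EOF'
[Match]
Name=brwan2
[Network]
DHCP=ipv4
IPv6AcceptRA=no
[DHCPv4]
UseDNS=no
RouteTable=wanisp2
[DHCPv6]
UseDNS=no
[IPv6AcceptRA]
UseDNS=no
[RoutingPolicyRule]
Table=wanisp2
Priority=40002
FirewallMark=2
EOF
}

# Example: write_brwan2_network /tmp && cat /tmp/brwan2.network
```

There is deliberately no second [RoutingPolicyRule] section here: the priority 60001 rule for "movable" traffic lives only in brwan1.network.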

nftables Configuration

Currently, nothing works yet. We need something to set the fwmarks or they won’t do anything useful. (If you try to ping 1.1.1.1, it’ll just error out or do nothing.) If you are unfamiliar with nftables, both Gentoo and Arch have good examples. Instead of trying to go back and forth, I’ll explain this section with a heavily commented /etc/nftables.conf file:

#!/usr/sbin/nft -f

# this deletes the entire firewall. One of the nice improvements
# of nftables over iptables is we get to load the firewall atomically, so
# if your changes fail to load, you’ll just be stuck with your old firewall.
# If, however, you need to share your firewall with another program, you may
# need to be more specific about what you delete/recreate.
flush ruleset

# we can define “variables” in nftables.conf. These won’t show when
# you run `nft list ruleset` because they’ll be replaced by that point.
# If your local IPs are not constant (because your ISP router won’t let
# you have a static DHCP reservation or similar), you can use an nftables
# named set and a script to update that set, but otherwise use a similar
# structure (same for the local routes)
define isp1_local_ip = 192.168.1.10
define isp1_local_subnet = 192.168.1.0/24
define isp2_local_ip = 192.168.2.10
define isp2_local_subnet = 192.168.2.0/24
define isp_local_ips = {$isp1_local_ip, $isp2_local_ip}
# NOTE: if you are unlucky enough that both ISP subnets
# are the same, and they refuse to budge, it can still be handled,
# but you might have some work ahead of you.

# we can also define our connection marks
define CMARK_ISP1 = 0x1
define CMARK_ISP2 = 0x2
define CMARK_MOVABLE = 0x10

# we can also define our interfaces
define wan_isp1 = "brwan1"
define wan_isp2 = "brwan2"
# and also define a set of interfaces
define wan_iifnames = {$wan_isp1,$wan_isp2}

# the “mangle” table, here we can set connection marks
table inet mangle {
    # forwarded traffic is sent here when incoming on an interface,
    # such as “brlan” or “brwan2”
    chain prerouting {
        type filter hook prerouting priority mangle; policy accept;

        ct state new jump set_connmark

        # Policy routing (aka `ip rule`) only works on packet marks (fwmarks),
        # so we need to copy the connmark (“connection mark”) to the packet
        meta mark set ct mark
    }

    # this processes traffic from the local machine, before routing
    chain output {
        type route hook output priority mangle; policy accept;

        ct state new jump set_connmark

        meta mark set ct mark
    }

    # sweet! we get our own chain, here we set policy
    chain set_connmark {
        # if traffic comes in ISP1 (such as HTTPS traffic), send it back
        iifname $wan_isp1 ct mark set $CMARK_ISP1 return
        # ditto for ISP2
        iifname $wan_isp2 ct mark set $CMARK_ISP2 return

        # traffic bound to brwan1 needs to go out brwan1, this will let
        # stuff like `ping -I brwan1` and `curl --interface brwan1` work
        # note that iifname doesn’t work here, because Linux is weird and IP
        # addresses both belong and don’t belong to interfaces
        ip saddr $isp1_local_ip ct mark set $CMARK_ISP1 return
        # let traffic go to the ISP1 subnet, maybe you want to access
        # the router page
        ip daddr $isp1_local_subnet ct mark set $CMARK_ISP1 return
        # ditto for ISP2
        ip saddr $isp2_local_ip ct mark set $CMARK_ISP2 return
        ip daddr $isp2_local_subnet ct mark set $CMARK_ISP2 return

        # maybe you have a device that uses a lot of traffic
        # that you don’t want to failover to ISP2 because data is expensive,
        # you can force it to stay with a rule like this
        ip saddr 10.100.0.53 ct mark set $CMARK_ISP1 return

        # Set all UDP traffic as “movable”. This is probably unideal,
        # but it does match all WireGuard and QUIC traffic.
        # UDP traffic needs to handle this case anyway (e.g.,
        # a phone moving networks), so at worst maybe some UDP protocol
        # gets confused and retries.
        ip protocol udp ct mark set $CMARK_MOVABLE return

        jump set_connmark_dynamic
    }

    # use a different chain so we can
    # replace it in a script
    chain set_connmark_dynamic {
        # set all traffic to go down ISP1, when we fail over
        # we will replace this with CMARK_ISP2
        ct mark set $CMARK_ISP1
    }
}

# Example of pretty standard filter tables, put your desired rules here
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state invalid drop

        iif lo accept
        iif != lo ip daddr 127.0.0.0/8 drop
        iif != lo ip6 daddr ::1/128 drop

        ct state {established, related} accept

        ip protocol icmp accept
        ip6 nexthdr icmpv6 accept

        # Not needed if you configure your WAN IPs with static IPs and don’t
        # use DHCP
        iifname $wan_iifnames udp dport 68 accept comment "accept DHCP"

        iifname "brlan" udp dport {53,67,123} accept \
            comment "DNS, DHCP, NTP"
        iifname "brlan" tcp dport {22,53} accept comment "SSH, DNS"

        # If you have an HTTP reverse proxy, you may want these
        udp dport {443} accept comment "https (QUIC)"
        tcp dport {80, 443} accept comment "http (TCP)"

        # Make nmap’s job harder
        iifname $wan_iifnames drop comment "drop unmatched public"
        # Make your job easier
        reject
    }

    chain forward {
        type filter hook forward priority 0; policy drop;
        ct state invalid drop
        ct state {established, related} accept

        # Let everything out outbound, woo-hoo!
        iifname "brlan" oifname $wan_iifnames accept comment "outbound traffic"

        # port forwards (in the nat prerouting chain) also need a rule here
        iifname $wan_iifnames oifname "brlan" \
            ip daddr 10.100.0.53 tcp dport 22000 accept

        reject
    }

    chain output {
        type filter hook output priority 0; policy accept;
    }
}

table inet nat {
    chain prerouting {
        type nat hook prerouting priority 0; policy accept;

        # This chain is where you would put in port forwarding rules,
        # remember to also allow it in the inet filter forward chain
        tcp dport 22000 ip daddr $isp_local_ips dnat ip to 10.100.0.53:22000
    }
    chain postrouting {
        type nat hook postrouting priority srcnat; policy accept;

        # NAT everything going to an ISP
        oifname $wan_iifnames masquerade
    }
}
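
Because the file starts with `flush ruleset`, it is worth validating it before loading. A minimal sketch, assuming the file lives at /etc/nftables.conf: `nft -c` (`--check`) parses and validates the rules without applying them, and `reload_firewall` is a hypothetical wrapper name.

```shell
# Sketch: syntax-check the ruleset, and only load it if the check passes.
reload_firewall() {
    local conf="${1:-/etc/nftables.conf}"
    # -c / --check: validate the file without changing the running ruleset
    if ! nft -c -f "$conf"; then
        echo "refusing to load $conf: check failed" >&2
        return 1
    fi
    nft -f "$conf"
}

# Example (as root): reload_firewall
```

The atomic load already protects against a half-applied firewall; the check step just gives the error message without attempting the load at all.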

Getting Traffic from Localhost to Work

So at this point the following should be working:

However, a plain curl myip.wtf/json or ping 1.1.1.1 is probably giving you a very strange error:

$ ping 1.1.1.1
ping: connect: Network is unreachable

Ugh, the network’s right there! We are even handling you in the mangle output chain! So it turns out that traffic from the local machine has a consideration that forwarded traffic does not: source address selection. I don’t understand this entirely myself, but the process goes something like this:

  1. A process does a connect or similar without specifying a source IP address, so the kernel needs to find one.
  2. The kernel goes through the policy and routing table rules, but notably, does not go through nftables or fwmarks at all, until it finds an appropriate rule matching the destination.
  3. It then either takes the source address straight from the rule, or takes an address from the interface the rule is sending traffic down, and gives this source address to the application.

You can kind of simulate this process with a plain ip route get:

$ ip route get 1.1.1.1
RTNETLINK answers: Network is unreachable
$ ip route get 1.1.1.1 mark 1
1.1.1.1 via 192.168.1.1 dev brwan1 table wanisp1 src 192.168.1.10 mark 1 uid 1000
    cache

And yup, the kernel’s got nothing. But we want to use fwmarks! Well, I have a solution for you:

# /etc/systemd/network/dummy0.netdev
[NetDev]
Name=dummy0
Kind=dummy

# /etc/systemd/network/dummy0.network
[Match]
Name=dummy0

[Network]
# this address actually doesn’t matter much, as long as it doesn’t conflict
Address=172.16.7.3/32

[Route]
# we accept all traffic!
Destination=0.0.0.0/0
# remember that dummy table earlier?
Table=dummy

[RoutingPolicyRule]
Table=dummy
# but we have lower priority than anything else,
# so nothing should actually send traffic here
Priority=1000000

So a dummy device is… kinda like “a second localhost” actually. It only exists on the local machine (though if the firewall allows it traffic can be sent to it from outside, so it’s not quite like localhost) and doesn’t actually send traffic anywhere unless someone is listening to an IP address attached to it (listening on 0.0.0.0 counts). What’s important is we can “send” routes to it, so now the kernel can finally get a source address, which doesn’t actually matter because we generally NAT it away anyway:

$ ip route get 1.1.1.1
1.1.1.1 dev dummy0 table dummy src 172.16.7.3 uid 1000
    cache
$ ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=55 time=50.0 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=55 time=50.0 ms
...
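
To compare what the kernel picks with and without a mark, it helps to pull single fields out of `ip route get`. A minimal sketch: `route_field` is a hypothetical helper that reads that output on stdin and prints the value following a keyword such as dev, src, or table.

```shell
# Hypothetical helper: print the value that follows a keyword (dev, src,
# table, ...) on the first line of `ip route get` output read from stdin.
route_field() {
    awk -v key="$1" '
        NR == 1 {
            for (i = 1; i < NF; i++)
                if ($i == key) { print $(i + 1); exit }
        }'
}

# Examples (run on the router):
#   ip route get 1.1.1.1 | route_field dev          # unmarked lookup
#   ip route get 1.1.1.1 mark 1 | route_field dev   # lookup with fwmark 0x1
```

With the dummy device in place, the unmarked lookup should land on dummy0, while the marked lookup should still resolve through the ISP's table.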

WireGuard Inbound Is Odd

QUIC inbound should be working at this point. I think most UDP inbound protocols should work too, even if they are listening to 0.0.0.0 (Caddy works). However, if you have a WireGuard “server” on the router itself, it does something very odd: it seems that instead of responding with the address that the external device attempted to connect to, it instead responds with a source address of the dummy device we made earlier, gets NAT’ed improperly, does not get matched up properly with connmark, and is never able to properly respond. Even dnat’ing to a different port on localhost didn’t fix this. I was able to solve it by dnat’ing to “a different machine”, and the nice thing about WireGuard is we can put the listening port in a different network namespace than the actual interface. The solution is something like this:

# /etc/systemd/system/wghome.service
[Unit]
Description=Manages wghome
Requires=network.target
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/setup_wghome.sh

[Install]
WantedBy=multi-user.target

# /usr/local/bin/setup_wghome.sh
#!/bin/bash

set -e

ip link del wghome || true
# netns deletions seem to not finish immediately after
# `ip netns del`, so wait a second for it to fully delete
(ip netns del wghome && sleep 1) || true

ip netns add wghome
ip -netns wghome link set lo up
ip link add p-wghome type veth peer name host0 netns wghome
ip -netns wghome link set host0 up
ip -netns wghome address add 172.16.8.2/24 dev host0
ip address add 172.16.8.1/24 dev p-wghome
ip link set p-wghome up
ip -netns wghome route add default via 172.16.8.1 dev host0
# the magic line: we create the wghome interface in the
# default network namespace (aka the network namespace of PID 1/init),
# while ensuring its socket (*:51820) is in the wghome network namespace
ip -netns wghome link add wghome netns 1 type wireguard
ip address add 192.168.4.1/24 dev wghome
# I set the MTU to the lowest allowed because of issues I’ve had
# with hotel Wi-Fi
ip link set dev wghome mtu 1280
# you can’t set everything you would with wg-quick here,
# make sure to comment those lines out and set them elsewhere
wg setconf wghome /etc/wireguard/wghome.conf
# make sure to turn it on or it won’t work
ip link set wghome up

And then you can forward 51820 to 172.16.8.2 (the address inside the wghome namespace, where the socket lives) and the connection tracking now works properly. If the client moves ISPs (either by script or just manually updating the peer IP), the tunnel moves to the new ISP.
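
The forwarding rules themselves could look like the sketch below. It assumes UDP port 51820, the table and chain names from nftables.conf above, and that the socket is reached at 172.16.8.2, the address setup_wghome.sh assigns to host0 inside the namespace. `forward_wireguard` is a hypothetical helper name.

```shell
# Sketch: forward inbound WireGuard to the socket inside the wghome netns.
forward_wireguard() {
    # DNAT packets arriving on either WAN bridge to the namespace address.
    nft 'add rule inet nat prerouting iifname { "brwan1", "brwan2" } udp dport 51820 dnat ip to 172.16.8.2:51820'
    # `insert` puts the rule at the top of the chain; `add` would append it
    # after the final `reject`, where it would never match.
    nft 'insert rule inet filter forward iifname { "brwan1", "brwan2" } oifname "p-wghome" udp dport 51820 accept'
}

# Example (as root): forward_wireguard
```

Rules added this way disappear the next time nftables.conf is loaded (it begins with `flush ruleset`), so for a permanent setup the equivalent lines belong in the nat prerouting and filter forward chains of that file.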

Thoughts on DNS

As is, if you have some automated method of detecting and executing failover that doesn’t rely on DNS, the current method is okay. Supporting DNS during failovers would enhance network resilience (for example, Yggdrasil’s interface-bound connections use DNS to find their public peer). I haven’t implemented these yet, but I have some thoughts:

  1. Add a bunch of servers to dnsmasq.conf, and policy route DNS servers to specific ISPs (e.g., send 1.1.1.1 to brwan1, and 1.0.0.1 to brwan2). This doesn’t really “failover”— dnsmasq will just use whatever—but it gets the job done. You can also have dnsmasq bind to specific interfaces as well (though I don’t think this implements bind-dynamic behavior, so you may need to test this more thoroughly and/or use statically configured IPs instead of DHCP for brwan*).
  2. If you truly want failover, you could use strict-order. I’m not the biggest fan of this option.
  3. I would like DNS over HTTPS anyway, maybe there’s a converter that could be made dual-WAN aware. (A dual-WAN aware SOCKS5 proxy would be nice too.)
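
For the first idea, pinning each upstream resolver to one ISP could be sketched with two connmark rules. This is an untested assumption rather than a configuration from this setup: it reuses the mark values 0x1 and 0x2 and the set_connmark chain from the nftables section, and `pin_dns_servers` is a hypothetical helper name.

```shell
# Sketch of option 1: pin each upstream resolver to one ISP by connmark.
pin_dns_servers() {
    # `insert` places the rules at the top of set_connmark, ahead of the
    # catch-all UDP "movable" rule, so DNS over UDP is pinned as well.
    nft 'insert rule inet mangle set_connmark ip daddr 1.1.1.1 ct mark set 0x1 return'
    nft 'insert rule inet mangle set_connmark ip daddr 1.0.0.1 ct mark set 0x2 return'
}

# Example (as root): pin_dns_servers
```

As with any rule added at runtime, these are lost when nftables.conf is reloaded; the same two statements could instead be written directly into the set_connmark chain, above the UDP rule.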

Manual Failover

We have almost everything we need to execute a failover, at least manually. I’ll explain in the following script; if you used a different configuration edit it accordingly.

#!/bin/bash

set -e

if [ "$EUID" -ne 0 ]; then
        echo "This script must be run as root"
        exit 1
fi

if [ "$#" -ne 1 ]; then
        echo "Usage: $0 <isp1|isp2>"
        exit 1
fi

# We want our dynamic rule’s priority to be before
# the one set in brwan1.network, which is our “default”
PRIORITY=50000

case "$1" in
        isp1)
                echo "Switching to ISP1..."
                # remove the rule, if it exists
                ip rule del priority "$PRIORITY" || true
                # “migrate” our migratory connections
                ip rule add priority "$PRIORITY" fwmark 0x10 lookup wanisp1
                # set how new connections are to be handled
                nft -f - << 'EOF'
flush chain inet mangle set_connmark_dynamic
add rule inet mangle set_connmark_dynamic ct mark set 0x1
EOF
                ;;
        isp2)
                echo "Switching to ISP2..."
                ip rule del priority "$PRIORITY" || true
                ip rule add priority "$PRIORITY" fwmark 0x10 lookup wanisp2
                nft -f - << 'EOF'
flush chain inet mangle set_connmark_dynamic
add rule inet mangle set_connmark_dynamic ct mark set 0x2
EOF
                ;;
        *)
                echo "Invalid argument. Please use 'isp1' or 'isp2'."
                exit 1
                ;;
esac

Save this as /usr/local/bin/switch_isp.sh (remember to chmod +x) and then you should be able to run sudo switch_isp.sh isp1 and sudo switch_isp.sh isp2 to automatically migrate existing UDP connections (such as QUIC or WireGuard) to the new ISP and set new TCP connections to go out the newly selected ISP, leaving old TCP connections alone.
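
After switching, it is handy to confirm where "movable" traffic is actually going. A minimal sketch: `movable_table` is a hypothetical helper that reads `ip rule` output on stdin and assumes the priorities used in this guide (50000 for the dynamic rule, 60001 for the static default).

```shell
# Hypothetical helper: print the routing table that "movable" (fwmark 0x10)
# traffic currently uses. The dynamic rule at priority 50000 wins if it
# exists; otherwise the static 60001 rule from brwan1.network applies.
movable_table() {
    awk '
        $1 == "50000:" || $1 == "60001:" {
            for (i = 2; i < NF; i++)
                if ($i == "lookup") table[$1] = $(i + 1)
        }
        END {
            if ("50000:" in table) print table["50000:"]
            else if ("60001:" in table) print table["60001:"]
        }'
}

# Example (run on the router): ip rule show | movable_table
```

This only reports the policy rule; how new TCP connections are marked is decided by the set_connmark_dynamic chain, which `nft list chain inet mangle set_connmark_dynamic` will show.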

Conclusions

So I think that was enough for part one. We got inbound connections to work (though admittedly ignored dynamic DNS). We have a setup that dual-WAN aware software can use (like Yggdrasil). We even have manually-initiated failover that doesn’t break existing connections if, for instance, the failover was premature. We also have a lot of flexibility here:

Some further improvements I plan on exploring (no timelines expressed or implied, but this blog supports RSS):