Dual WAN Debian Router Part One: “Static” Configuration
Background
My homelab is both quite overkill and very useful. Everything headless, including my router (which exists behind both ISP routers at this point, including double NAT), is some version of Debian (now bookworm). My router, a Qotom mini PC with 4 Intel GbE ports and 8 GiB of RAM (currently using 616 MiB and 1.3 GiB cache), is the subject of this series and currently my oldest device still in service. It currently hosts Caddy, a VPN home, optional privacy VPN outbound (both WireGuard), private DNS, a Yggdrasil node (which I mainly use, and have used, as a backup VPN), and some other miscellanea. It’s served me very well, even if mdadm over USB turned out to be a bad idea. (I now use a much bigger computer with real SATA for RAID, one perhaps even more overkill for its tasks, and which boots off of software RAID1 NVMe, which was a doozy to set up.)
My memory tells me I started using a Linux-based router for one major reason: I had a very, very slow uplink at the time (DSL) and CAKE promised to ameliorate that. Since upgrading my ISP, CAKE has honestly become less necessary, but I still use it (including “background”/CS1 QoS). However, using my own router has been great for controlling my internal network to a degree consumer ISP routers don’t generally allow—including combining two different ISPs (which became very appealing one recent night at about 8 p.m. when a salesman told me a new, unlimited 5G home service was available in the area, even though my main ISP is great… except when it’s down and I go searching for new services on my phone).
Since this is just Linux, you could do this on basically anything that can run Linux. Protectli boxes look interesting and, if you don’t have large storage or other requirements, look capable enough to host most things by themselves (I have never used one myself). I personally may end up replacing the Qotom with a 4x2.5GbE or 10GbE PCIe card in my primary NAS, which definitely has the room and power.
This guide/“report” will assume familiarity with networking, especially Linux networking, and general Linux CLI knowledge. Linux policy routing, which is instrumental to this approach, is built on a tower of other concepts. Links are provided throughout, and most terms should be searchable. It took me a lot of learning (and staring at man pages) to get to this point.
Goals
- Traffic incoming on an ISP should go back out that ISP. This will let us expose services to the wider world (such as a VPN or reverse proxy setup to access self-hosted applications).
- It should be possible to mark traffic as going out a specific ISP. This will be useful for software that supports dual WAN (for example Yggdrasil InterfacePeers), but also (in a future post) be useful for measuring the state of each individual ISP and updating a dynamic DNS record for each ISP and for the “best” ISP.
- “False” failovers should cause minimal disruption, allowing for quicker recovery. (We can accomplish this by marking connections as “belonging” to a specific ISP if they don’t support migrating IP addresses, like most TCP—currently ignoring Multipath TCP, which doesn’t have a lot of server support at this time.)
- Some traffic that can handle it (including WireGuard and QUIC/HTTP3) should be allowed to migrate, despite breaking the normal network 4-tuple (source IP, source port, destination IP, destination port) in the process. We especially need this because of the last point—if we mark an “outbound” WireGuard connection as belonging to a specific ISP, it will never attempt reconnection in a way that breaks the 4-tuple and end up forever destined to sit on a broken route (at least until that ISP comes back online).
- It should be simple… not necessarily in a “fancy package” way, but in a “I understand the pieces and can rearrange them to my pleasing” sort of way.
- It should work with general consumer ISP routers as they exist today, which are not known for great configurability. In my current setup, although my secondary ISP is a 5G service, both ISPs at least have their own public IPs and port forwarding for IPv4. I won’t be avoiding double NAT (triple NAT if you have CGNAT). In theory there are approaches where you can use a VPN (like OpenMPTCProuter) to a VPS to allow failover without breaking TCP connections and even combine the bandwidth of the ISPs, but I will be avoiding this because so many websites block data center IPs.
- Non goal (for this article): actually doing any sort of automated failover. That will (hopefully) be in part two.
- Non goal: IPv6. I need to figure this out more, and replace microsocks with something that, at minimum, supports Happy Eyeballs. (I’ve had to interact with at least one government(!) website that had broken AAAA records, and that was definitely not fun to figure out.)
Required Software
At least, this is how I did it. Most of this should be in Debian and may even be included by default.
- nftables: the Linux firewall, also known as the modern, nicer alternative to iptables. (In modern distributions iptables ends up a wrapper for nftables anyway.)
- iproute2 (a nice cheatsheet): the modern way to investigate and runtime-configure most Linux networking, though note that nothing you configure with it survives a reboot.
- conntrack: invaluable. Will show you your current connections, their marks, how NAT is affecting them, and let you clear (parts of) your current connection table.
- tcpdump: a super useful tool to let you watch traffic and see what it’s doing.
- systemd-networkd: part of the systemd suite. In this guide I will be using this (and sometimes systemd itself) to persist network configurations. I also use systemd-resolved for local DNS service, though you could definitely replace that.
- dnsmasq: in simple cases you might even be able to replace this with systemd-networkd, but it’s generally useful.
- Yggdrasil: not needed at all, but a cool research project and an easy, free, dual-WAN-aware way into your network that needs no port forwarding. Do be aware that the IPv6 address it gives you is practically a public IP: configure your firewall accordingly.
General Design
I have found bridges to be invaluable for network configurations. They basically act as an unmanaged switch (and are closer to managed switches in newer kernel versions) and you can put anything layer 2 in them (veth devices, real devices, vlan devices, some VPNs…).
These instructions will use and create three main bridges: brlan, brwan1, and brwan2. You can definitely create and use more bridges if you need them!
We will use 3 connection marks: 0x1 (1) for “ISP1” (our primary ISP), 0x2 (2) for “ISP2” (our backup/secondary ISP), and 0x10 (16) for “movable”.
We will use three routing tables, which can be defined in /etc:
99 dummy
101 wanisp1
102 wanisp2
Also tell systemd-networkd the names for the new routing tables:
# /etc/systemd/networkd.conf
# rest of file excluded for brevity
[Network]
RouteTable=dummy:99 wanisp1:101 wanisp2:102
wanisp1 and wanisp2 are where we will place the routes for ISP1 and ISP2. dummy is an odd hack, which I will explain later. (You can also call wanisp1/etc something more memorable, like wanispname.)
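Since the same three table IDs have to appear in two places, it can help to generate both from one list. The following is only a sketch: the `TABLES` variable and both function names are made up for illustration, and it prints to stdout rather than touching any file.

```shell
#!/bin/bash
# One list of "id:name" pairs drives both outputs (names follow this article).
TABLES="99:dummy 101:wanisp1 102:wanisp2"

# Prints "<id> <name>" lines, the format of the routing table list above.
rt_tables_lines() {
    local entry
    for entry in $TABLES; do
        printf '%s %s\n' "${entry%%:*}" "${entry#*:}"
    done
}

# Prints the matching RouteTable= line for networkd.conf ("name:id" pairs).
networkd_route_table_line() {
    local entry out=""
    for entry in $TABLES; do
        out+="${entry#*:}:${entry%%:*} "
    done
    printf 'RouteTable=%s\n' "${out% }"
}

rt_tables_lines
networkd_route_table_line
```

If you rename a table or add a third ISP, you then only have to change the list in one spot and copy the output into place.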
Most configuration in this guide could be used for more than 2 ISPs, if desired (including things like outbound VPNs).
LAN Configuration
systemd-networkd makes it pretty easy to make a bridge. First you need to create a bridge with a .netdev file in /etc:

# /etc/systemd/network/brlan.netdev
[NetDev]
Name=brlan
Kind=bridge

You’ll then want to configure some kind of network for it (with a .network file in the same directory):

# /etc/systemd/network/brlan.network
[Match]
Name=brlan

[Network]
Address=10.100.0.1/24
IPForward=yes

And then you’ll want to attach at least one real device to it (eth0 is an example, use ip link or ip address to find the real name of your ethernet ports):

# /etc/systemd/network/eth0.network
[Match]
Name=eth0

[Network]
Bridge=brlan
You will probably want at least DHCP and DNS for your internal network. Here’s a quick dnsmasq example:
# /etc/dnsmasq.conf
interface=brlan
bind-dynamic
no-resolv
server=1.1.1.1
server=1.0.0.1
dhcp-range=10.100.0.100,10.100.0.200,5m
WAN Bridge Configuration
Consumer ISP routers are… inflexible. This section is going to assume that you are going to be able to get a static DHCP allocation, port forwarding if you want to expose services, and not much more than that (and even then, workarounds are possible). If you have the ability to set custom routes you might be able to avoid an additional NAT (though I personally haven’t had many issues with double NAT). If you have CGNAT, you are going to have to figure out another way (e.g., VPN to VPS) to get incoming connections working with that ISP.
brwan1.netdev and brwan2.netdev will be basically the same as the setup for brlan.netdev (with an optional MAC address, if you followed the optional section above). brwan1.network will be a bit more, however:

[Match]
Name=brwan1

[Network]
DHCP=ipv4
IPv6AcceptRA=no

[DHCPv4]
UseDNS=no
RouteTable=wanisp1

[DHCPv6]
UseDNS=no

[IPv6AcceptRA]
UseDNS=no

[RoutingPolicyRule]
Table=wanisp1
Priority=40001
FirewallMark=1

[RoutingPolicyRule]
Table=wanisp1
Priority=60001
FirewallMark=16
In summary, we only want the IP address and routes from the ISP. We don’t want their DNS. We also don’t want IPv6 (for this article). We want to put these routes in their own routing table, and not in the main routing table. We also want two new routing policy rules. After enabling brwan1.network, the output of ip rule will look like this (comments added for explanation):

# first 3 rules here are default to Linux
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
# rules are executed from lower to higher number
# after the default rules, we say “anything with a fwmark of 0x1 (ISP1 only)
# goes to routing table wanisp1”
# I set the priorities to be after the default rules so that I
# don’t have to be super strict about not setting fwmarks for local traffic
40001: from all fwmark 0x1 lookup wanisp1 proto static
# “anything with fwmark 0x10 (anything ‘movable’) goes to wanisp1”
60001: from all fwmark 0x10 lookup wanisp1 proto static
brwan2.network/etc will be very similar, with a couple of changes:
- use table wanisp2, not wanisp1
- increment the priority for the first policy rule. It’s allowed to have multiple policy rules at the same priority, but it’s easier if you don’t.
- don’t include the second rule (the priority 60001 rule), we only want it once. (When it comes to dynamic failover, we will create a dynamic rule at priority 50000 that sends traffic to the table we want.)
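The three differences above can be captured in a small generator, so both files come from one template. This is a sketch and not how I actually wrote my files: `gen_wan_network` is a made-up helper, and 40002 is just my reading of “increment the priority”.

```shell
#!/bin/bash
# gen_wan_network IFACE TABLE MARK PRIORITY [MOVABLE_PRIORITY]
# Prints a .network file body for one WAN bridge (hypothetical helper).
gen_wan_network() {
    local iface=$1 table=$2 mark=$3 prio=$4 movable_prio=${5:-}
    cat <<EOF
[Match]
Name=$iface

[Network]
DHCP=ipv4
IPv6AcceptRA=no

[DHCPv4]
UseDNS=no
RouteTable=$table

[DHCPv6]
UseDNS=no

[IPv6AcceptRA]
UseDNS=no

[RoutingPolicyRule]
Table=$table
Priority=$prio
FirewallMark=$mark
EOF
    # Only one WAN (the default one) carries the catch-all "movable" rule.
    if [ -n "$movable_prio" ]; then
        cat <<EOF

[RoutingPolicyRule]
Table=$table
Priority=$movable_prio
FirewallMark=16
EOF
    fi
}

# ISP2: its own table, mark 2, next priority, and no "movable" rule.
gen_wan_network brwan2 wanisp2 2 40002
```

Calling `gen_wan_network brwan1 wanisp1 1 40001 60001` should reproduce the brwan1.network shown earlier; redirect the output to the file you want.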
nftables Configuration
At this point, nothing works yet. We need something to set the fwmarks or they won’t do anything useful. (If you try to ping 1.1.1.1, it’ll just error out or do nothing.) If you are unfamiliar with nftables, both Gentoo and Arch have good examples.
Instead of trying to go back and forth, I’ll explain this section with a heavily commented /etc/nftables.conf file:
#!/usr/sbin/nft -f

# this deletes the entire firewall. One of the nice improvements
# of nftables over iptables is we get to load the firewall atomically, so
# if your changes fail to load, you’ll just be stuck with your old firewall.
# If, however, you need to share your firewall with another program, you may
# need to be more specific about what you delete/recreate.
flush ruleset

# we can define “variables” in nftables.conf. These won’t show when
# you run `nft list ruleset` because they’ll be replaced by that point.
# If your local IPs are not constant (because your ISP router won’t let
# you have a static DHCP reservation or similar), you can use an nftables
# named set and a script to update that set, but otherwise use a similar
# structure (same for the local routes)
define isp1_local_ip = 192.168.1.10
define isp1_local_subnet = 192.168.1.0/24
define isp2_local_ip = 192.168.2.10
define isp2_local_subnet = 192.168.2.0/24
define isp_local_ips = {$isp1_local_ip, $isp2_local_ip}
# NOTE: if you are unlucky enough that both ISP subnets
# are the same, and they refuse to budge, it can still be handled,
# but you might have some work ahead of you.

# we can also define our connection marks
define CMARK_ISP1 = 0x1
define CMARK_ISP2 = 0x2
define CMARK_MOVABLE = 0x10

# we can also define our interfaces
define wan_isp1 = "brwan1"
define wan_isp2 = "brwan2"
# and also define a set of interfaces
define wan_iifnames = {$wan_isp1,$wan_isp2}

# the “mangle” table, here we can set connection marks
table inet mangle {
    # forwarded traffic is sent here when incoming on an interface,
    # such as “brlan” or “brwan2”
    chain prerouting {
        type filter hook prerouting priority mangle; policy accept;
        ct state new jump set_connmark
        # Policy routing (aka `ip rule`) only works on packet marks (fwmarks),
        # so we need to copy the connmark (“connection mark”) to the packet
        meta mark set ct mark
    }

    # this processes traffic from the local machine, before routing
    chain output {
        type route hook output priority mangle; policy accept;
        ct state new jump set_connmark
        meta mark set ct mark
    }

    # sweet! we get our own chain, here we set policy
    chain set_connmark {
        # if traffic comes in ISP1 (such as HTTPS traffic), send it back
        iifname $wan_isp1 ct mark set $CMARK_ISP1 return
        # ditto for ISP2
        iifname $wan_isp2 ct mark set $CMARK_ISP2 return

        # traffic bound to brwan1 needs to go out brwan1, this will let
        # stuff like `ping -I brwan1` and `curl --interface brwan1` work
        # note that iifname doesn’t work here, because Linux is weird and IP
        # addresses both belong and don’t belong to interfaces
        ip saddr $isp1_local_ip ct mark set $CMARK_ISP1 return
        # let traffic go to the ISP1 subnet, maybe you want to access
        # the router page
        ip daddr $isp1_local_subnet ct mark set $CMARK_ISP1 return
        # ditto for ISP2
        ip saddr $isp2_local_ip ct mark set $CMARK_ISP2 return
        ip daddr $isp2_local_subnet ct mark set $CMARK_ISP2 return

        # maybe you have a device that uses a lot of traffic
        # that you don’t want to failover to ISP2 because data is expensive,
        # you can force it to stay with a rule like this
        ip saddr 10.100.0.53 ct mark set $CMARK_ISP1 return

        # Set all UDP traffic as “movable”. This is probably unideal,
        # but it does match all WireGuard and QUIC traffic.
        # UDP traffic needs to handle this case anyway (e.g.,
        # a phone moving networks), so at worst maybe some UDP protocol
        # gets confused and retries.
        ip protocol udp ct mark set $CMARK_MOVABLE return

        jump set_connmark_dynamic
    }

    # use a different chain so we can
    # replace it in a script
    chain set_connmark_dynamic {
        # set all traffic to go down ISP1, when we fail over
        # we will replace this with CMARK_ISP2
        ct mark set $CMARK_ISP1
    }
}

# Example of pretty standard filter tables, put your desired rules here
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state invalid drop
        iif lo accept
        iif != lo ip daddr 127.0.0.0/8 drop
        iif != lo ip6 daddr ::1/128 drop
        ct state {established, related} accept
        ip protocol icmp accept
        ip6 nexthdr icmpv6 accept

        # Not needed if you configure your WAN IPs with static IPs and don’t
        # use DHCP
        iifname $wan_iifnames udp dport 68 accept comment "accept DHCP"

        iifname "brlan" udp dport {53,67,123} accept \
            comment "DNS, DHCP, NTP"
        iifname "brlan" tcp dport {22,53} accept comment "SSH, DNS"

        # If you have an HTTP reverse proxy, you may want these
        udp dport {443} accept comment "https (QUIC)"
        tcp dport {80, 443} accept comment "http (TCP)"

        # Make nmap’s job harder
        iifname $wan_iifnames drop comment "drop unmatched public"
        # Make your job easier
        reject
    }

    chain forward {
        type filter hook forward priority 0; policy drop;
        ct state invalid drop
        ct state {established, related} accept

        # Let everything out outbound, woo-hoo!
        iifname "brlan" oifname $wan_iifnames accept comment "outbound traffic"

        # port forwards (in the nat prerouting table) also need a rule here
        iifname $wan_iifnames oifname "brlan" \
            ip daddr 10.100.0.53 tcp dport 22000 accept

        reject
    }

    chain output {
        type filter hook output priority 0; policy accept;
    }
}

table inet nat {
    chain prerouting {
        type nat hook prerouting priority 0; policy accept;
        # This table is where you would put in port forwarding rules,
        # remember to also allow it in the inet filter forward table
        tcp dport 22000 ip daddr $isp_local_ips dnat ip to 10.100.0.53:22000
    }

    chain postrouting {
        type nat hook postrouting priority srcnat; policy accept;
        # NAT everything going to an ISP
        oifname $wan_iifnames masquerade
    }
}
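The order of rules in set_connmark matters, because the first match wins. To make that order easier to reason about, here is a toy model of the chain in plain bash. It is only a paper exercise under my own assumptions (the function names are invented, the addresses mirror the defines above, and nothing here talks to nftables), but it lets you ask “which mark would a new connection get?” without touching the firewall.

```shell
#!/bin/bash
# Values mirror the defines in nftables.conf (example addresses).
ISP1_IP=192.168.1.10; ISP1_NET=192.168.1.0/24
ISP2_IP=192.168.2.10; ISP2_NET=192.168.2.0/24

# Converts a dotted-quad IPv4 address to an integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# in_subnet ADDR NET/PREFIX: succeeds if ADDR is inside the subnet.
in_subnet() {
    local net=${2%/*} bits=${2#*/}
    local mask=$(( (0xffffffff << (32 - bits)) & 0xffffffff ))
    (( ($(ip_to_int "$1") & mask) == ($(ip_to_int "$net") & mask) ))
}

# classify IIF SADDR DADDR PROTO [DEFAULT_MARK]
# Prints the connmark a NEW connection would get; same rule order as the chain.
# IIF is empty for traffic generated on the router itself.
classify() {
    local iif=$1 saddr=$2 daddr=$3 proto=$4 default=${5:-0x1}
    if   [ "$iif" = brwan1 ];            then echo 0x1   # came in ISP1
    elif [ "$iif" = brwan2 ];            then echo 0x2   # came in ISP2
    elif [ "$saddr" = "$ISP1_IP" ];      then echo 0x1   # bound to ISP1 address
    elif in_subnet "$daddr" "$ISP1_NET"; then echo 0x1   # ISP1 router page etc.
    elif [ "$saddr" = "$ISP2_IP" ];      then echo 0x2
    elif in_subnet "$daddr" "$ISP2_NET"; then echo 0x2
    elif [ "$saddr" = 10.100.0.53 ];     then echo 0x1   # device pinned to ISP1
    elif [ "$proto" = udp ];             then echo 0x10  # movable
    else echo "$default"                                 # set_connmark_dynamic
    fi
}

classify brlan 10.100.0.20 1.1.1.1 tcp       # ordinary LAN TCP
classify brlan 10.100.0.20 1.1.1.1 udp       # LAN UDP (QUIC, WireGuard, ...)
classify ""    192.168.2.10 1.1.1.1 icmp     # ping -I brwan2 from the router
```

Passing `0x2` as the fifth argument models the state after a failover to ISP2. Note how the pinned device stays on ISP1 even for UDP, because its rule sits above the “movable” rule.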
Getting Traffic from Localhost to Work
So at this point the following should be working:
- Traffic from brlan to WAN, such as ping 1.1.1.1
- Traffic from the router that is bound to a WAN IP, like curl --interface brwan2 --resolve myip.wtf:80:65.108.75.112 myip.wtf/json or ping -I brwan2 1.1.1.1 (or the same, with the interface name replaced with the interface IP address). You may need to update the --resolve address if myip.wtf has moved since I wrote this article.

However, a plain curl myip.wtf/json or ping 1.1.1.1 is probably giving you a very strange error:

$ ping 1.1.1.1
ping: connect: Network is unreachable
Ugh, the network’s right there! We are even handling you in the mangle output table! So it turns out that traffic from the local machine has a consideration that forwarded traffic does not: source address selection. I don’t understand this entirely myself, but the process goes something like this:
- A process does a connect or similar without specifying a source IP address, so the kernel needs to find one.
- The kernel walks the policy rules and routing tables until it finds a route matching the destination. Notably, it does not go through nftables at this point, so no fwmark has been set yet.
- It then either takes the source address straight from the route, or takes an address from the interface the route is sending traffic down, and gives this source address to the application.
You can kind of simulate this process with a plain ip route get:

$ ip route get 1.1.1.1
RTNETLINK answers: Network is unreachable
$ ip route get 1.1.1.1 mark 1
1.1.1.1 via 192.168.1.1 dev brwan1 table wanisp1 src 192.168.1.10 mark 1 uid 1000
    cache
And yup, the kernel’s got nothing. But we want to use fwmarks! Well, I have a solution for you:
# /etc/systemd/network/dummy0.netdev
[NetDev]
Name=dummy0
Kind=dummy

# /etc/systemd/network/dummy0.network
[Match]
Name=dummy0

[Network]
# this address actually doesn’t matter much, as long as it doesn’t conflict
Address=172.16.7.3/32

[Route]
# we accept all traffic!
Destination=0.0.0.0/0
# remember that dummy table earlier?
Table=dummy

[RoutingPolicyRule]
Table=dummy
# but we have lower priority than anything else,
# so nothing should actually send traffic here
Priority=1000000
So a dummy device is… kinda like “a second localhost” actually. It only exists on the local machine (though if the firewall allows it traffic can be sent to it from outside, so it’s not quite like localhost) and doesn’t actually send traffic anywhere unless someone is listening to an IP address attached to it (listening on 0.0.0.0 counts). What’s important is we can “send” routes to it, so now the kernel can finally get a source address, which doesn’t actually matter because we generally NAT it away anyway:
$ ip route get 1.1.1.1
1.1.1.1 dev dummy0 table dummy src 172.16.7.3 uid 1000
    cache
$ ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=55 time=50.0 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=55 time=50.0 ms
...
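If you want to check source address selection from a script (say, to confirm the dummy route is what an unmarked lookup hits), you can pull single fields out of the ip route get output. A minimal sketch, with `route_field` being a name I made up; the sample lines are the outputs shown in this section:

```shell
#!/bin/bash
# route_field FIELD: reads `ip route get` output on stdin and prints the
# value that follows FIELD (e.g. "src", "dev", "table", "via").
route_field() {
    awk -v key="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == key) { print $(i + 1); exit }
    }'
}

unmarked='1.1.1.1 dev dummy0 table dummy src 172.16.7.3 uid 1000'
marked='1.1.1.1 via 192.168.1.1 dev brwan1 table wanisp1 src 192.168.1.10 mark 1 uid 1000'

echo "$unmarked" | route_field src     # source picked without any fwmark
echo "$marked"   | route_field dev     # device picked with mark 1
```

On the router itself you would feed it live output instead, for example `ip route get 1.1.1.1 mark 1 | route_field src`.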
WireGuard Inbound Is Odd
QUIC inbound should be working at this point. I think most UDP inbound protocols should work too, even if they are listening to 0.0.0.0 (Caddy works).
However, if you have a WireGuard “server” on the router itself, it does something very odd: it seems that instead of responding with the address that the external device attempted to connect to, it instead responds with a source address of the dummy device we made earlier, gets NAT’ed improperly, does not get matched up properly with connmark, and is never able to properly respond.
Even dnat’ing to a different port on localhost didn’t fix this. I was able to solve it by dnat’ing to “a different machine”, and the nice thing about WireGuard is we can put the listening port in a different network namespace than the actual interface. The solution is something like this:
# /etc/systemd/system/wghome.service
[Unit]
Description=Manages wghome
Requires=network.target
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/setup_wghome.sh

[Install]
WantedBy=multi-user.target

# /usr/local/bin/setup_wghome.sh
#!/bin/bash
set -e

ip link del wghome || true
# netns deletions seem to not finish immediately after
# `ip netns del`, so wait a second for it to fully delete
(ip netns del wghome && sleep 1) || true

ip netns add wghome
ip -netns wghome link set lo up
ip link add p-wghome type veth peer name host0 netns wghome
ip -netns wghome link set host0 up
ip -netns wghome address add 172.16.8.2/24 dev host0
ip address add 172.16.8.1/24 dev p-wghome
ip link set p-wghome up
ip -netns wghome route add default via 172.16.8.1 dev host0

# the magic line: we create the wghome interface in the
# default network namespace (aka the network namespace of PID 1/init),
# while ensuring its socket (*:51820) is in the wghome network namespace
ip -netns wghome link add wghome netns 1 type wireguard
ip address add 192.168.4.1/24 dev wghome
# I set the MTU to the lowest allowed because of issues I’ve had
# with hotel Wi-Fi
ip link set dev wghome mtu 1280
# you can’t set everything you would with wg-quick here,
# make sure to comment those lines out and set them elsewhere
wg setconf wghome /etc/wireguard/wghome.conf
# make sure to turn it on or it won’t work
ip link set wghome up
And then you can forward 51820 to 172.16.8.2 (the address of host0 inside the wghome namespace, which is where the WireGuard socket lives) and the connection tracking now works properly, and if the client moves ISPs (either by script or just manually updating the peer IP) it moves to the new ISP.
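For reference, the forward needs the same two pieces as any other port forward in the nftables.conf above: a dnat rule and a matching accept in the forward chain. The sketch below just prints those two rule lines so you can paste them into the right chains. It assumes the forward targets 172.16.8.2 (the host0 address the setup script assigns inside the namespace) and that `$isp_local_ips` and `$wan_iifnames` are the defines from my nftables.conf; `wg_forward_rules` is a made-up name.

```shell
#!/bin/bash
# Assumed values, taken from the setup script above; adjust if yours differ.
NS_ADDR=172.16.8.2   # host0, inside the wghome namespace
PORT=51820

# Prints nftables rule lines (not shell commands). The $isp_local_ips and
# $wan_iifnames in the output are nftables variables, so they are printed
# literally on purpose.
wg_forward_rules() {
    # goes in: table inet nat, chain prerouting
    printf 'udp dport %s ip daddr $isp_local_ips dnat ip to %s:%s\n' \
        "$PORT" "$NS_ADDR" "$PORT"
    # goes in: table inet filter, chain forward, before the final reject
    printf 'iifname $wan_iifnames oifname "p-wghome" ip daddr %s udp dport %s accept\n' \
        "$NS_ADDR" "$PORT"
}

wg_forward_rules
```

The accept rule has to sit above the final reject in the forward chain, which is why I would edit the file and reload rather than append rules at runtime.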
Thoughts on DNS
As is, if you have some automated method of detecting and executing failover that doesn’t rely on DNS, the current method is okay. Supporting DNS during failovers would enhance network resilience (for example, Yggdrasil’s interface bound connections use DNS to find their public peer). I haven’t implemented these yet, but I have some thoughts:
- Add a bunch of servers to dnsmasq.conf, and policy route DNS servers to specific ISPs (e.g., send 1.1.1.1 to brwan1, and 1.0.0.1 to brwan2). This doesn’t really “failover” (dnsmasq will just use whatever), but it gets the job done. You can also have dnsmasq bind to specific interfaces as well (though I don’t think this implements bind-dynamic behavior, so you may need to test this more thoroughly and/or use statically configured IPs instead of DHCP for brwan*).
- If you truly want failover, you could use strict-order. I’m not the biggest fan of this option.
- I would like DNS over HTTPS anyway, maybe there’s a converter that could be made dual-WAN aware. (A dual-WAN aware SOCKS5 proxy would be nice too.)
Manual Failover
We have almost everything we need to execute a failover, at least manually. I’ll explain in the following script; if you used a different configuration edit it accordingly.
#!/bin/bash
set -e

if [ "$EUID" -ne 0 ]; then
    echo "This script must be run as root"
    exit 1
fi

if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <isp1|isp2>"
    exit 1
fi

# We want our dynamic rule’s priority to be before
# the one set in brwan1.network, which is our “default”
PRIORITY=50000

case "$1" in
    isp1)
        echo "Switching to ISP1..."
        # remove the rule, if it exists
        ip rule del priority "$PRIORITY" || true
        # “migrate” our migratory connections
        ip rule add priority "$PRIORITY" fwmark 0x10 lookup wanisp1
        # set how new connections are to be handled
        # (the heredoc terminator must start at column 0)
        nft -f - << 'EOF'
flush chain inet mangle set_connmark_dynamic
add rule inet mangle set_connmark_dynamic ct mark set 0x1
EOF
        ;;
    isp2)
        echo "Switching to ISP2..."
        ip rule del priority "$PRIORITY" || true
        ip rule add priority "$PRIORITY" fwmark 0x10 lookup wanisp2
        nft -f - << 'EOF'
flush chain inet mangle set_connmark_dynamic
add rule inet mangle set_connmark_dynamic ct mark set 0x2
EOF
        ;;
    *)
        echo "Invalid argument. Please use 'isp1' or 'isp2'."
        exit 1
        ;;
esac
Save this as /usr/local/bin/switch_isp.sh (remember to chmod +x) and then you should be able to run sudo switch_isp.sh isp1 and sudo switch_isp.sh isp2 to automatically migrate existing UDP connections (such as QUIC or WireGuard) to the new ISP and set new TCP connections to go out the newly selected ISP, leaving old TCP connections alone.
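To see which ISP “movable” traffic is currently using, you can look for the first rule in ip rule that matches fwmark 0x10, since rules are evaluated from the lowest priority number up. Here is a small sketch that does this on the text output; `movable_table` is a made-up helper and the sample is what I would expect ip rule to print after switching to ISP2.

```shell
#!/bin/bash
# movable_table: reads `ip rule show` output on stdin and prints the table
# used by the first (lowest-numbered) rule that matches fwmark 0x10 exactly.
movable_table() {
    awk '/fwmark 0x10( |$)/ {
        for (i = 1; i < NF; i++)
            if ($i == "lookup") { print $(i + 1); exit }
    }'
}

# Sample output after `switch_isp.sh isp2` (illustrative, not captured live).
sample='0: from all lookup local
32766: from all lookup main
32767: from all lookup default
40001: from all fwmark 0x1 lookup wanisp1 proto static
50000: from all fwmark 0x10 lookup wanisp2
60001: from all fwmark 0x10 lookup wanisp1 proto static'

printf '%s\n' "$sample" | movable_table
```

On the router, `ip rule show | movable_table` does the same with live data. Without the priority 50000 rule it falls through to the static 60001 rule and reports wanisp1.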
Conclusions
So I think that was enough for part one. We got inbound connections to work (though admittedly ignored dynamic DNS). We have a setup that dual-WAN aware software can use (like Yggdrasil). We even have manually-initiated failover that doesn’t break existing connections if, for instance, the failover was premature. We also have a lot of flexibility here:
- Want to load balance? ct mark set jhash ip saddr mod 2 map {0:$CMARK_ISP1,1:$CMARK_ISP2} will split the computers in your LAN between the two ISPs, based on a hash of each source address. (jhash is pretty flexible.)
- You could load balance UDP/WG connections as well, using connmarks like 0x12 for “prefers ISP2”.
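One way to read a mark like 0x12 is as two fields: the 0x10 bit for “movable” and the low bits for the preferred ISP. That layout is my assumption for illustration, not something the configuration in this article implements yet, and `describe_mark` is a made-up name.

```shell
#!/bin/bash
# describe_mark MARK: decodes a connmark under the assumed layout
# (bit 0x10 = movable, low nibble = ISP number).
describe_mark() {
    local mark=$(( $1 ))
    local isp=$(( mark & 0x0f ))
    if (( mark & 0x10 )); then
        if (( isp )); then
            echo "movable, prefers ISP$isp"
        else
            echo "movable"
        fi
    else
        echo "pinned to ISP$isp"
    fi
}

describe_mark 0x12
describe_mark 0x10
describe_mark 0x1
```

Keep in mind that the policy rules from earlier match the fwmark exactly, so a scheme like this would also need rules for the new values (ip rule accepts a mask, as in fwmark 0x10/0x10).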
Some further improvements I plan on exploring (no timelines expressed or implied, but this blog supports RSS):
- Automated failover. I have a lot of thoughts on this at different parts of the stack. (“Has this network link failed?” is not an easy question to answer, and may not even be a yes/no question. The better question is something like “Which network link is better to use?”)
- Dynamic DNS! Inbound connections aren’t that useful if you don’t have a way of telling clients where you are!
- Some nice way of alerting. Probably with ntfy.sh at first. (I have many thoughts about how XMPP would be good for alerts, especially for self-hosting setups, but that’s worth an article of its own.)
- IPv6: it’s The Right Thing To Do™.