Container Networking Under The Hood: Network Namespaces

Networking is one of the first things that comes to mind when people talk about containers.

Most container technologies rely on the network namespaces feature of the Linux kernel. A network namespace provides an isolated network stack within the operating system.

Each network namespace has its own virtualized network stack, with its own interfaces, IP addresses, routing table, and so on. You can run your applications in different network stacks.

For example, Docker by default creates a virtual bridge interface (docker0) with the IP range 172.17.0.0/16.


Virtual Ethernet Device

Let’s assume that you have two computers. You probably know you can create a network link between those two computers via an ethernet cable, and you don’t need a network switch for that.

You can have a similar setup on your Linux server with veth (virtual Ethernet) devices. A veth device works like a real Ethernet adapter, and veth devices are always created in connected pairs: whatever is sent into one end comes out of the other, like the two ends of a cable. The pair will act as a tunnel between the network namespaces that we will create.

We will create two different network namespaces and then connect them to each other like two computers.

ip netns add pc-one
ip netns add pc-two

We have created two different network namespaces. It is as if we now had two separate computers.

We will create a veth pair to connect the two namespaces (it is like buying an ethernet cable):

ip link add veth-pc-one type veth peer name veth-pc-two

You can list the network interfaces: ip link list

So, we created a veth pair (we bought a cable). Neither end has been moved into one of our network namespaces yet (the cable isn’t plugged in).

We will assign one end of the pair to pc-one and the other end to pc-two:

ip link set veth-pc-one netns pc-one
ip link set veth-pc-two netns pc-two

The veth devices have been moved into the namespaces. A network device can belong to only one network namespace at a time. Let’s have a look at the pc-one network namespace:

root@adil:~# ip netns exec pc-one ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: veth-pc-one@if3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 8e:ef:69:a1:a3:93 brd ff:ff:ff:ff:ff:ff link-netns pc-two

To run a command inside a network namespace, use ip netns exec <net ns name> <command>. For ip subcommands there is also a shorthand: ip -n <net ns name> <subcommand>.

The pc-one network namespace has two network interfaces: lo and veth-pc-one.
They are not enabled (UP) yet.

Let’s enable those interfaces:

ip netns exec pc-one ip link set dev veth-pc-one up
ip netns exec pc-two ip link set dev veth-pc-two up

And don’t forget to enable the loopback devices:

ip netns exec pc-one ip link set dev lo up
ip netns exec pc-two ip link set dev lo up

So, we enabled the virtual interfaces.

They don’t have an IP address, though. Let’s assign some IP addresses:

ip -n pc-one addr add 10.0.0.1/24 dev veth-pc-one
ip -n pc-two addr add 10.0.0.2/24 dev veth-pc-two
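
The steps so far can be collected into one small script. This is a minimal sketch rather than a definitive implementation: the run helper and the DRY_RUN variable are hypothetical names introduced here for illustration, and by default the script only prints the commands so it can be read and checked without root privileges.

```shell
#!/bin/sh
# Sketch: the two-namespace veth setup from this section as one script.
# DRY_RUN and run() are illustrative helpers, not part of iproute2.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_veth_pair() {
    # Two namespaces ("two computers")
    run ip netns add pc-one
    run ip netns add pc-two
    # One veth pair ("the cable")
    run ip link add veth-pc-one type veth peer name veth-pc-two
    # Plug one end into each namespace
    run ip link set veth-pc-one netns pc-one
    run ip link set veth-pc-two netns pc-two
    # Bring up the veth ends and the loopback devices
    run ip -n pc-one link set dev veth-pc-one up
    run ip -n pc-two link set dev veth-pc-two up
    run ip -n pc-one link set dev lo up
    run ip -n pc-two link set dev lo up
    # Assign the addresses
    run ip -n pc-one addr add 10.0.0.1/24 dev veth-pc-one
    run ip -n pc-two addr add 10.0.0.2/24 dev veth-pc-two
}

setup_veth_pair
```

Running it as root with DRY_RUN=0 executes the commands for real; the sketch assumes that the namespaces and interfaces do not exist yet.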

Let’s test it:

root@adil:~# ip netns exec pc-one ping 10.0.0.2 -c1
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.020 ms
root@adil:~# ip netns exec pc-one ping 10.0.0.1 -c1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.022 ms
root@adil:~# ip netns exec pc-two ping 10.0.0.1 -c1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.037 ms
root@adil:~# ip netns exec pc-two ping 10.0.0.2 -c1
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.018 ms

It works based on the IP routing table:

root@adil:~# ip netns exec pc-one ip route
10.0.0.0/24 dev veth-pc-one proto kernel scope link src 10.0.0.1
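
The route above says that any destination inside 10.0.0.0/24 is reachable directly through veth-pc-one. As a rough illustration of that prefix match (a sketch of the idea, not of how the kernel implements its route lookup), the check can be written in plain shell arithmetic. The ip_to_int and in_subnet helpers are hypothetical names introduced here.

```shell
#!/bin/sh
# ip_to_int: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_subnet ADDRESS NETWORK/PREFIX: succeed if ADDRESS lies inside the subnet.
in_subnet() {
    ip_int=$(ip_to_int "$1")
    net_int=$(ip_to_int "${2%/*}")
    prefix=${2#*/}
    # Build the netmask from the prefix length, e.g. /24 -> 0xFFFFFF00
    mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    [ $(( ip_int & mask )) -eq $(( net_int & mask )) ]
}

# 10.0.0.2 matches the connected route, 8.8.8.8 does not.
in_subnet 10.0.0.2 10.0.0.0/24 && echo "10.0.0.2: on-link via veth-pc-one"
in_subnet 8.8.8.8 10.0.0.0/24 || echo "8.8.8.8: no matching route"
```

With only this one route in the table, a destination that fails the match has nowhere to go, which is why a namespace cannot reach anything outside its own subnet at this point.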

We created a virtualized network stack.

This isn’t going to work:

root@adil:~# ping 10.0.0.1 -c1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
--- 10.0.0.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

That is because I tried to ping 10.0.0.1 from the default network namespace (the main host). The host has no interface or route that leads into the pc-one and pc-two network namespaces.

Virtual Bridge

We have created a network link between two network namespaces. What if you need to connect three or more network namespaces?

You will need to create a virtual network bridge. We will create three network namespaces and one veth pair for each network namespace:

ip netns add one
ip netns add two
ip netns add three
ip link add veth-one-in type veth peer name veth-one-out
ip link add veth-two-in type veth peer name veth-two-out
ip link add veth-three-in type veth peer name veth-three-out

We will assign the virtual ethernets to the network namespaces, and we will set the IP addresses:

ip link set veth-one-in netns one
ip link set veth-two-in netns two
ip link set veth-three-in netns three
ip netns exec one ip addr add 10.0.0.10/24 dev veth-one-in
ip netns exec two ip addr add 10.0.0.20/24 dev veth-two-in
ip netns exec three ip addr add 10.0.0.30/24 dev veth-three-in

There are three different virtual network devices. We are going to create a virtual bridge device (it is going to act like a network switch):

ip link add name virtual-bridge type bridge

We will attach the other ends of the veth pairs to the virtual bridge device:

ip link set veth-one-out master virtual-bridge
ip link set veth-two-out master virtual-bridge
ip link set veth-three-out master virtual-bridge

Let’s bring up all the virtual network devices in our setup:

ip link set virtual-bridge up
ip link set veth-one-out up
ip link set veth-two-out up
ip link set veth-three-out up
ip netns exec one ip link set dev veth-one-in up
ip netns exec two ip link set dev veth-two-in up
ip netns exec three ip link set dev veth-three-in up
ip netns exec one ip link set dev lo up
ip netns exec two ip link set dev lo up
ip netns exec three ip link set dev lo up
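
The bridge setup repeats the same steps for every namespace, so it generalizes naturally to a loop. The following is a minimal sketch under stated assumptions: connect_namespaces, run, and DRY_RUN are hypothetical names, the addresses 10.0.0.10, 10.0.0.20, ... simply mirror the ones used above, and by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch: attach any number of namespaces to one bridge.
# DRY_RUN and run() are illustrative helpers, not part of iproute2.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# connect_namespaces BRIDGE NAMESPACE...
# Note: Linux interface names are limited to 15 characters,
# so keep namespace names short (veth-three-out has 14).
connect_namespaces() {
    bridge=$1
    shift
    run ip link add name "$bridge" type bridge
    run ip link set "$bridge" up
    i=0
    for ns in "$@"; do
        i=$((i + 1))
        run ip netns add "$ns"
        run ip link add "veth-$ns-in" type veth peer name "veth-$ns-out"
        # Inner end goes into the namespace and gets an address
        run ip link set "veth-$ns-in" netns "$ns"
        run ip -n "$ns" addr add "10.0.0.$((i * 10))/24" dev "veth-$ns-in"
        # Outer end is plugged into the bridge
        run ip link set "veth-$ns-out" master "$bridge"
        run ip link set "veth-$ns-out" up
        run ip -n "$ns" link set dev "veth-$ns-in" up
        run ip -n "$ns" link set dev lo up
    done
}

connect_namespaces virtual-bridge one two three
```

Adding a fourth namespace is then a matter of appending one more name to the last line, which is the practical advantage of a bridge over point-to-point veth links.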

Let’s test it:

root@adil:~# ip netns exec one ping 10.0.0.10 -c1
PING 10.0.0.10 (10.0.0.10) 56(84) bytes of data.
64 bytes from 10.0.0.10: icmp_seq=1 ttl=64 time=0.017 ms
root@adil:~# ip netns exec one ping 10.0.0.20 -c1
PING 10.0.0.20 (10.0.0.20) 56(84) bytes of data.
64 bytes from 10.0.0.20: icmp_seq=1 ttl=64 time=0.052 ms
root@adil:~# ip netns exec one ping 10.0.0.30 -c1
PING 10.0.0.30 (10.0.0.30) 56(84) bytes of data.
64 bytes from 10.0.0.30: icmp_seq=1 ttl=64 time=0.051 ms

For the sake of simplicity, I only show the ping results from one network namespace.

Let’s try to ping 10.0.0.10 from the default network namespace (main host):

root@adil:~# ping 10.0.0.10 -c1
PING 10.0.0.10 (10.0.0.10) 56(84) bytes of data.
--- 10.0.0.10 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

It didn’t work. The main host has no address or route on the 10.0.0.0/24 network, so it cannot reach the network namespaces.

Let’s assign an IP address to the virtual bridge (the kernel also adds a matching route for 10.0.0.0/24 to the host’s routing table):

ip addr add 10.0.0.1/24 dev virtual-bridge

Let’s test it again:

root@adil:~# ping 10.0.0.10 -c1
PING 10.0.0.10 (10.0.0.10) 56(84) bytes of data.
64 bytes from 10.0.0.10: icmp_seq=1 ttl=64 time=0.065 ms

Can the network namespaces connect to the Internet?

root@adil:~# ip netns exec three ping 8.8.8.8
ping: connect: Network is unreachable

Since the network namespaces don’t have a default gateway, they can’t reach addresses outside their own 10.0.0.0/24 subnet.

Let’s add a default gateway for each network namespace:

ip netns exec one ip route add default via 10.0.0.1
ip netns exec two ip route add default via 10.0.0.1
ip netns exec three ip route add default via 10.0.0.1

It still doesn’t work:

root@adil:~# ip netns exec three ping 8.8.8.8
ping: connect: Network is unreachable

That is because the main host isn’t forwarding IP packets yet. Besides, the main host must translate the private source addresses (masquerade / SNAT) so that replies can find their way back:

iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -j MASQUERADE
sysctl -w net.ipv4.ip_forward=1
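
Before testing, it can help to confirm that forwarding is really switched on. The following is a minimal sketch, assuming a Linux host that exposes the setting at /proc/sys/net/ipv4/ip_forward; the forwarding_enabled helper is a hypothetical name, and its optional argument exists only so the check can be pointed at another file.

```shell
#!/bin/sh
# forwarding_enabled [FILE]: succeed if the given sysctl file contains "1".
# FILE defaults to the IPv4 forwarding switch under /proc.
forwarding_enabled() {
    file="${1:-/proc/sys/net/ipv4/ip_forward}"
    [ "$(cat "$file" 2>/dev/null)" = "1" ]
}

if forwarding_enabled; then
    echo "IPv4 forwarding is enabled"
else
    echo "IPv4 forwarding is disabled: run sysctl -w net.ipv4.ip_forward=1"
fi
```

Keep in mind that sysctl -w changes the value only until the next reboot, and the iptables rule added above is not persistent either.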

Let’s test it:

root@adil:~# ip netns exec three ping 8.8.8.8 -c1
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=111 time=0.818 ms

I’ve created a bash script that creates a virtual network stack:

P.S.:
You can observe the changes in another terminal via
ip monitor link
You can list the network namespaces: ip netns list