Container Filesystem Under The Hood: OverlayFS

The other post about containerization (Container Networking Under The Hood: Network Namespaces) is here.

The underlying filesystem is one of the mysterious parts of containerization.

It has to handle lots of OS images, folders, mount points, and so on.


Before diving into OverlayFS, let’s talk about a simple scenario.

Let’s assume that you have two different disk volumes and you would like to mount them into one folder, so that you can list the files and folders from both disks in the same place.

Union mounts [1, 2] have made this possible in the *nix family for a long time. However, administering a filesystem with a union mount is tough.

Virtual File Systems provide reliable and relatively easy filesystem management. Aufs, UnionFS, OverlayFS, mhddfs, and MergerFS are popular virtual file systems.

Docker deprecated Aufs in favor of OverlayFS a couple of years ago.

Why does Docker need a virtual file system?

  • Because the containers must be isolated from each other. One container shouldn’t interfere with another container’s filesystem.
  • The OS images should be downloaded once and used multiple times. Thanks to OverlayFS’ lower/upper directory mechanism, the images don’t need to be duplicated.

The directory structure in OverlayFS

There are four directory types in OverlayFS: lower directory, upper directory, work directory, and overlay directory (a.k.a. merged directory).

Lower directory: read-only directory

Upper directory: writable directory

Overlay directory: Combined lower/upper directories (OverlayFS will combine the lower directory and the upper directory)

Work directory: Used by the Linux kernel for atomic file operations (it needs to be an empty directory on the same filesystem as the upper directory)

Let’s get our hands dirty

mkdir lower upper workdir overlay
mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=workdir none overlay
echo "from lower" >> lower/file1.txt
echo "from upper" >> upper/file2.txt

We created an overlay filesystem between two folders. Let’s have a look at the inside of those directories:

root@adil:~# ls -li overlay/
549319 -rw-r--r-- 1 root root 11 Oct 16 18:45 file1.txt
549320 -rw-r--r-- 1 root root 11 Oct 16 18:45 file2.txt
root@adil:~# ls -li upper/file2.txt
549320 -rw-r--r-- 1 root root 11 Oct 16 18:45 upper/file2.txt
root@adil:~# ls -li lower/file1.txt
549319 -rw-r--r-- 1 root root 11 Oct 16 18:45 lower/file1.txt

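I want to draw your attention to the inodes of the files. Each file in the overlay directory reports the same inode number as the corresponding file in the lower or upper directory. These are not copies, and they are not hard links either (the link count stays at 1): the overlay directory simply exposes the underlying files, which is what you see when all the layers live on the same filesystem.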
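To compare inode numbers from a script rather than by reading ls -li, a minimal Node.js sketch could look like the following. The helper names inodeInfo and sameInodeNumber are hypothetical, and the sketch compares only the inode number (as the listing above does), not the device number.

```javascript
const fs = require('fs');

// Return the inode number, device number, and hard-link count of a path.
// Hypothetical helper for illustration; not part of any OverlayFS tooling.
function inodeInfo(filePath) {
  const { ino, dev, nlink } = fs.statSync(filePath);
  return { ino, dev, nlink };
}

// True when both paths report the same inode number.
// The device number is deliberately not compared here.
function sameInodeNumber(pathA, pathB) {
  return inodeInfo(pathA).ino === inodeInfo(pathB).ino;
}
```

With the directories from the example, sameInodeNumber('overlay/file1.txt', 'lower/file1.txt') would be the call to try, and inodeInfo('lower/file1.txt').nlink shows the link count.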

Let’s make it a bit complicated

root@adil:~# echo "from lower" >> lower/file2.txt
root@adil:~# cat overlay/file2.txt
from upper
root@adil:~# cat overlay/file1.txt
from lower

If the lower directory and the upper directory have a file with the same name, then OverlayFS will access the file in the upper directory.
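The lookup order can be mimicked in user space. The sketch below is only a model of the rule above, not how the kernel implements it: resolveBackingFile is a hypothetical helper that checks the upper directory first and the lower directory second, and it ignores deleted files and directory merging to stay short.

```javascript
const fs = require('fs');
const path = require('path');

// Return the path that would back `name` in the merged view, or null.
// Upper is checked before lower, so an upper file shadows a lower one.
function resolveBackingFile(name, upperDir, lowerDir) {
  for (const layer of [upperDir, lowerDir]) {
    const candidate = path.join(layer, name);
    if (fs.existsSync(candidate)) return candidate;
  }
  return null;
}
```

For the directories above, resolveBackingFile('file2.txt', 'upper', 'lower') would point at the copy in upper, while file1.txt would resolve to lower.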

Let’s delete the files from the overlay directory:

root@adil:~# rm -rf overlay/file1.txt overlay/file2.txt
root@adil:~# ls -la overlay/
#empty
root@adil:~# ls -la lower/
-rw-r--r-- 1 root root 11 Oct 16 18:45 file1.txt
-rw-r--r-- 1 root root 11 Oct 16 19:03 file2.txt
root@adil:~# ls -la upper/
c--------- 3 root root 0, 0 Oct 16 19:08 file1.txt
c--------- 3 root root 0, 0 Oct 16 19:08 file2.txt
root@adil:~# cat upper/file1.txt
cat: upper/file1.txt: No such device or address
root@adil:~# cat upper/file2.txt
cat: upper/file2.txt: No such device or address
root@adil:~# cat lower/file1.txt
from lower
root@adil:~# cat lower/file2.txt
from lower

The files are still in the lower directory, and entries with the same names now appear in the upper directory. However, those entries are not regular files: they are character devices with device number 0, 0, which OverlayFS calls whiteouts, and they cannot be read.

We didn’t create file1.txt in the upper directory initially. However, deleting it in the overlay directory created a file1.txt whiteout in the upper directory. It is mind-blowing.

The lower directory is read-only, and the upper directory is writable. OverlayFS therefore never deletes anything from the lower directory; it records the deletion in the upper directory as a whiteout, which hides the lower file from the overlay directory.
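The unreadable entries in upper/ are character devices with device number 0, 0 (the "c" and the "0, 0" in the ls output). OverlayFS uses such entries, called whiteouts, as deletion markers. A small sketch for spotting them from Node.js might look like this; isWhiteout and listWhiteouts are hypothetical helper names.

```javascript
const fs = require('fs');

// True when a stat result describes a whiteout:
// a character device whose device number (rdev) is 0, i.e. major 0, minor 0.
function isWhiteout(stats) {
  return stats.isCharacterDevice() && stats.rdev === 0;
}

// Names of the whiteout entries directly inside `dir` (for example, upper/).
// lstat is used so that symbolic links are not followed.
function listWhiteouts(dir) {
  return fs.readdirSync(dir).filter((name) =>
    isWhiteout(fs.lstatSync(`${dir}/${name}`)));
}
```

Run against the upper directory right after the rm above, listWhiteouts('upper') would be expected to name both deleted files.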

Let’s run the unlink command in the upper directory

root@adil:~# unlink upper/file1.txt
root@adil:~# unlink upper/file2.txt
root@adil:~# ls -la upper/
#empty
root@adil:~# cat overlay/file1.txt
from lower
root@adil:~# cat overlay/file2.txt
from lower

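Oops! The files are back in the overlay directory. Why? Because the entries we just unlinked from the upper directory were the only thing hiding the lower files. Once they are gone, nothing masks file1.txt and file2.txt, so OverlayFS serves them from the lower directory again. Note that the kernel documentation describes changing the upper or lower directory directly while the overlay is mounted as undefined behavior, so treat this as an experiment only.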
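The delete-then-reappear behavior can be reproduced with a toy in-memory model. This is a sketch of the idea only, assuming one lower and one upper layer; OverlayModel and its method names are made up for illustration and say nothing about the kernel's data structures.

```javascript
// Toy model: a read-only lower layer, a writable upper layer,
// and a set of deletion markers that live alongside the upper layer.
class OverlayModel {
  constructor(lowerFiles) {
    this.lower = new Map(Object.entries(lowerFiles));
    this.upper = new Map();
    this.markers = new Set();
  }

  // What the merged view returns for `name`.
  read(name) {
    if (this.markers.has(name)) return undefined; // hidden by a marker
    if (this.upper.has(name)) return this.upper.get(name);
    return this.lower.get(name);
  }

  // Deleting through the merged view never touches the lower layer.
  remove(name) {
    this.upper.delete(name);
    if (this.lower.has(name)) this.markers.add(name);
  }

  // Counterpart of unlinking the marker entry in the upper directory.
  dropMarker(name) {
    this.markers.delete(name);
  }
}

const model = new OverlayModel({ 'file1.txt': 'from lower' });
model.remove('file1.txt');
console.log(model.read('file1.txt')); // undefined
model.dropMarker('file1.txt');
console.log(model.read('file1.txt')); // from lower
```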

Let’s create a Docker container

root@adil:~# docker run -dit --name=mycontainer -v /root/test:/root/test -v /run/lock/example:/root/example ubuntu

PS: The filesystem type of /run/lock/example is tmpfs. The filesystem type of /root/test is ext4 on my Ubuntu host.

I have installed Node.js in my container:

root@a8cab50df83c:/# node --version
v16.11.1

I have created a dummy file in /root/test in my container:

root@a8cab50df83c:/# touch /root/test/one.txt

I will run a Node.js script inside the container:

var fs = require('fs');

fs.rename('/root/test/one.txt', '/root/example/two.txt', (err) => {
  if (err) throw err;
  console.log('Succeed');
});

I want to move /root/test/one.txt to /root/example/two.txt through the rename function of Node.js:

root@a8cab50df83c:~# node test.js
/root/test.js:4
  if (err) throw err;
           ^
[Error: EXDEV: cross-device link not permitted, rename '/root/test/one.txt' -> '/root/example/two.txt'] {
  errno: -18,
  code: 'EXDEV',
  syscall: 'rename',
  path: '/root/test/one.txt',
  dest: '/root/example/two.txt'
}

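This is a known issue. The rename syscall cannot move a file across mount points: /root/test is on ext4 and /root/example is on tmpfs, so the kernel returns EXDEV. OverlayFS can return the same error on its own, depending on its configuration, for example when you rename a directory that lives in a lower layer. In both cases the workaround is the same. You must copy the file from the source directory to the destination directory. After that, you must remove the link of the file from the source directory: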

root@a8cab50df83c:~# cat test.js
var fs = require('fs');
fs.copyFile('/root/test/one.txt', '/root/example/two.txt', (err) => {
  if (err) throw err;
  console.log('Copy — Success');
  fs.unlink('/root/test/one.txt', (err) => {
    if (err) throw err;
    console.log('Unlink — Success');
  });
});

Let’s test it:

root@a8cab50df83c:~# ls -l test/one.txt
-rw-r--r-- 1 root root 0 Oct 16 20:04 test/one.txt
root@a8cab50df83c:~# ls -l example/two.txt
ls: cannot access 'example/two.txt': No such file or directory
root@a8cab50df83c:~# node test.js
Copy — Success
Unlink — Success
root@a8cab50df83c:~# ls -l test/one.txt
ls: cannot access 'test/one.txt': No such file or directory
root@a8cab50df83c:~# ls -l example/two.txt
-rw-r--r-- 1 root root 0 Oct 16 20:04 example/two.txt
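The two approaches can be combined so that the cheap rename is tried first and the copy is used only when it is needed. The following is a minimal sketch, assuming a synchronous script is acceptable; moveFile is a hypothetical helper name, not a Node.js API.

```javascript
const fs = require('fs');

// Move a file. Try rename first; if the kernel reports EXDEV,
// fall back to copying the file and unlinking the source.
// Returns which strategy was used.
function moveFile(src, dest) {
  try {
    fs.renameSync(src, dest);
    return 'renamed';
  } catch (err) {
    if (err.code !== 'EXDEV') throw err; // unrelated errors are not swallowed
    fs.copyFileSync(src, dest);
    fs.unlinkSync(src);
    return 'copied';
  }
}
```

Unlike rename, the fallback path is not atomic: if the script dies between the copy and the unlink, the file exists in both places.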

Some additional info:

You can use multiple lower directories:

mount -t overlay -o lowerdir=lower1:lower2:lower3,upperdir=upper,workdir=workdir none overlay

The lower directories are stacked from right to left: lower1 is the top layer and lower3 is the bottom layer. If you create a file with an identical name in every lower directory, you will see the file from the lower1 directory in the overlay directory.
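To make the ordering concrete, here is a small sketch that turns the option string from the mount command into a list of layers from top to bottom. layerOrder is a hypothetical helper, and it assumes the paths contain no commas or colons.

```javascript
// Split an overlay option string and return the layers, top-most first.
function layerOrder(options) {
  const opts = {};
  for (const part of options.split(',')) {
    const idx = part.indexOf('=');
    if (idx > 0) opts[part.slice(0, idx)] = part.slice(idx + 1);
  }
  // lowerdir entries are already written top-most first.
  const lowers = opts.lowerdir ? opts.lowerdir.split(':') : [];
  // The upper directory, when present, sits above every lower directory.
  return opts.upperdir ? [opts.upperdir, ...lowers] : lowers;
}

console.log(layerOrder('lowerdir=lower1:lower2:lower3,upperdir=upper,workdir=workdir'));
// [ 'upper', 'lower1', 'lower2', 'lower3' ]
```

A lookup walks this list from the first entry to the last and stops at the first layer that has the file.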

What does none stand for in the mount command?

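The mount command expects a device argument, but an overlay mount has no physical disk partition behind it. The none keyword is just a placeholder for that argument; OverlayFS ignores it, which is why you will also see other placeholder strings, such as overlay, used in the same position.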
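One way to see the placeholder is to look at the device column of /proc/mounts. The sketch below parses text in that format and keeps the overlay entries; overlayMounts is a hypothetical helper, and the sample lines are illustrative assumptions rather than captured output.

```javascript
// Parse text in /proc/mounts format and return the overlay entries.
// Each line has the form: device mountpoint fstype options dump pass
function overlayMounts(procMountsText) {
  return procMountsText
    .split('\n')
    .map((line) => line.trim().split(/\s+/))
    .filter((fields) => fields.length >= 4 && fields[2] === 'overlay')
    .map(([device, mountPoint, , options]) => ({ device, mountPoint, options }));
}

// Illustrative input only; real entries will differ from host to host.
const sample = [
  'none /root/overlay overlay rw,relatime,lowerdir=lower,upperdir=upper,workdir=workdir 0 0',
  '/dev/sda1 / ext4 rw,relatime 0 0',
].join('\n');

console.log(overlayMounts(sample).map((m) => m.device)); // [ 'none' ]
```

On a real host the input would come from reading /proc/mounts, for example with fs.readFileSync('/proc/mounts', 'utf8').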