Ext4 filesystem: data blocks, super blocks, inode structure
Part 2 — Ext4 filesystem: extent, flex_bg, sparse_super
Part 3 — Ext4 file system: Delayed allocation, dirty data blocks
ext4 was proposed for inclusion in the Linux kernel in 2006 and was added to the Linux kernel in 2008. Since then, ext4 has become the default file system of many Linux distributions.
I like to talk about real outputs. Therefore, I will attach a disk to my Ubuntu VM running on my computer (a MacBook) and create an ext4 file system on it.
I attached a new disk to the VM:
This output is slightly different on AWS:
My computer advertises the physical sector size as 4096 bytes (4 KB), while AWS advertises it as 512 bytes (can be confirmed here).
These values can also be read in the following paths:
You can usually see three different sector sizes:
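The sector sizes can also be read programmatically. The sketch below assumes the standard sysfs attributes under /sys/block/<device>/queue/ (logical_block_size, physical_block_size, hw_sector_size). These may not be exactly the paths or the three sizes from the original output. The device name sdb is hypothetical, and `read_sector_sizes` is an illustrative helper.

```python
from pathlib import Path

# Assumed sysfs attribute files that expose sector sizes for a block device.
SECTOR_SIZE_FILES = ("logical_block_size", "physical_block_size", "hw_sector_size")

def read_sector_sizes(device: str, sysfs_root: str = "/sys/block") -> dict[str, int]:
    """Return the sector sizes (in bytes) the kernel reports for a block device."""
    queue_dir = Path(sysfs_root) / device / "queue"
    sizes = {}
    for name in SECTOR_SIZE_FILES:
        path = queue_dir / name
        if path.exists():  # skip attributes this kernel or device does not expose
            sizes[name] = int(path.read_text().strip())
    return sizes

if __name__ == "__main__":
    # "sdb" is a hypothetical device name; use the one lsblk shows for your disk.
    print(read_sector_sizes("sdb"))
```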
Let’s format the disk with the default parameters of mkfs.ext4:
Let’s examine the output:
Discarding device blocks: done
mkfs.ext4 sends a TRIM (discard) command to the disk to discard unused blocks. If you don’t want mkfs.ext4 to discard blocks, run it with the nodiscard extended option (mkfs.ext4 -E nodiscard).
What are blocks and inodes?
Creating filesystem with 16777216 4k blocks and 4194304 inodes
This is the second line in the output.
What are blocks?
Each disk has one or more partitions. The file system divides a partition into block groups, and each block group is divided into blocks. A block is the smallest unit of disk space that the file system can allocate to a file.
When you create an ext4 file system with the default parameters, the block size is 4 KB (4096 bytes) (see the default values in /etc/mke2fs.conf).
Therefore, a file whose size is anywhere from 1 byte to 4096 bytes is allocated one block.
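The rule is a ceiling division. The sketch below assumes a 4096-byte block size and ignores features such as inline_data, sparse files, and extent-tree blocks. `blocks_needed` is an illustrative name.

```python
def blocks_needed(file_size: int, block_size: int = 4096) -> int:
    """Number of whole blocks needed to hold file_size bytes (ceiling division)."""
    if file_size < 0:
        raise ValueError("file size cannot be negative")
    return -(-file_size // block_size)  # ceil without floating point

for size in (0, 1, 6, 4096, 4097):
    print(f"{size:>5} bytes -> {blocks_needed(size)} block(s)")
```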
I mounted the disk at /disk:
I created a 0-byte file:
The filesystem didn’t allocate a data block to the file because its size is 0.
I added 6 characters to the file:
The size is 6, as expected, and a block is now allocated to the file.
Why are 8 blocks allocated to the file?
The Linux kernel reports file metadata through a structure called stat.
The stat structure has a member called st_blocks.
st_blocks counts allocated disk space in 512-byte units, regardless of the file system’s block size (see). Therefore, stat and any other command that prints st_blocks as-is reports 512-byte blocks. ls -ls reads the same value but, by default, converts it to 1 KB units.
We know that 4096 bytes (1 block) of disk space is actually allocated to our file.
4096 bytes / 512 bytes = 8, which is why 8 blocks are reported.
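The conversion can be sketched in Python. os.stat exposes st_blocks directly. The sketch assumes that the file system block size can be taken from os.statvfs(path).f_frsize. The helper names are hypothetical.

```python
import os
import sys

ST_BLOCKS_UNIT = 512  # Linux counts st_blocks in 512-byte units

def allocated_bytes(st_blocks: int) -> int:
    """Disk space allocated to a file, in bytes, from its st_blocks value."""
    return st_blocks * ST_BLOCKS_UNIT

def fs_blocks(st_blocks: int, fs_block_size: int = 4096) -> int:
    """Convert st_blocks into file system blocks of fs_block_size bytes."""
    return allocated_bytes(st_blocks) // fs_block_size

if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]                      # e.g. /disk/a.txt
    st = os.stat(path)
    block_size = os.statvfs(path).f_frsize  # fundamental block size of the file system
    print(f"st_blocks={st.st_blocks} -> {allocated_bytes(st.st_blocks)} bytes "
          f"-> {fs_blocks(st.st_blocks, block_size)} block(s) of {block_size} bytes")
```

For the 6-byte file above, st_blocks is 8, which converts to 4096 bytes, or one 4 KB block.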
Let’s run filefrag:
This looks right: filefrag reports that 1 block is allocated.
stat is a commonly used command for getting information about a file. However, its block count can mislead you.
How do the blocks look?
This is the general structure of block groups. However, ext4 adds another layer called Flexible Block Groups (flex_bg), which combines several block groups for efficiency. We’ll talk about it in Part 2.
What is a Block Group?
In simple terms, a block group is a set of contiguous blocks. Each group’s block bitmap occupies exactly one block and uses one bit per block. So the number of blocks per group can be calculated as follows:
8 * block size in bytes
The block size is 4 KB (4096 bytes), so 8 * 4096 = 32768 blocks per group.
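The same arithmetic extends to the whole disk. This sketch uses the numbers from the mkfs.ext4 output (16777216 4k blocks and 4194304 inodes). It assumes every group gets the same number of inodes, which holds here because the numbers divide evenly; mkfs.ext4 may round in other cases. `block_group_layout` is an illustrative helper.

```python
def block_group_layout(total_blocks: int, total_inodes: int, block_size: int = 4096):
    """Return (blocks per group, number of groups, inodes per group)."""
    blocks_per_group = 8 * block_size                     # one bitmap bit per block
    group_count = -(-total_blocks // blocks_per_group)    # ceiling: the last group may be partial
    inodes_per_group = total_inodes // group_count        # assumes an even split
    return blocks_per_group, group_count, inodes_per_group

# 16777216 4k blocks and 4194304 inodes, as printed by mkfs.ext4
print(block_group_layout(16_777_216, 4_194_304))  # (32768, 512, 8192)
```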
Let’s make sure it’s correct:
What are inodes?
In simple terms, an inode is the identity card of a file. On Linux, every file must have an inode, because the inode records which blocks hold the file’s data. As a reminder, the data in the file is actually stored in those blocks.
One inode can be shared by multiple file names (hard links).
An inode is a data structure that holds metadata about the file. Examples include the access, modification, change, and creation times, access rights (chmod), uid, gid, the number of blocks allocated to the file, and the number of hard links (not symbolic links: every symbolic link has its own inode).
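Most of these inode fields can be read with os.stat. The sketch below is illustrative, and `describe_inode` is a hypothetical helper. On Linux, os.stat does not expose the creation time. GNU coreutils’ stat command can show it as "Birth" where the kernel supports statx.

```python
import os
import stat
import sys
import time

def describe_inode(path: str) -> dict:
    """Collect the inode metadata that os.stat exposes for a path."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,
        "mode": stat.filemode(st.st_mode),  # permission string, like ls -l
        "uid": st.st_uid,
        "gid": st.st_gid,
        "links": st.st_nlink,               # number of hard links
        "size": st.st_size,
        "blocks_512": st.st_blocks,         # allocated space in 512-byte units
        "atime": time.ctime(st.st_atime),
        "mtime": time.ctime(st.st_mtime),
        "ctime": time.ctime(st.st_ctime),   # inode change time, not creation time
    }

if __name__ == "__main__" and len(sys.argv) > 1:
    for key, value in describe_inode(sys.argv[1]).items():
        print(f"{key:>10}: {value}")
```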
An inode doesn’t hold the file name, because one inode can be shared by multiple names through hard links. Therefore, if you have created a hard link, deleting the first file doesn’t release the inode, since the other name still uses it. You can see how many hard links there are with ls:
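The same behaviour can be reproduced from Python in a temporary directory. This is a minimal sketch, and `hard_link_demo` is a hypothetical helper. Both names report the same st_ino. Removing the first name only decrements st_nlink, and the data stays reachable through the second name.

```python
import os
import tempfile

def hard_link_demo(directory: str):
    """Create a file and a hard link to it, then delete the original name."""
    original = os.path.join(directory, "a.txt")
    link = os.path.join(directory, "b.txt")
    with open(original, "w") as f:
        f.write("a1b2c3")
    os.link(original, link)                       # a second name for the same inode
    same_inode = os.stat(original).st_ino == os.stat(link).st_ino
    links_before = os.stat(link).st_nlink         # two directory entries point at the inode
    os.remove(original)                           # removes one name, not the inode
    links_after = os.stat(link).st_nlink          # the inode is still in use
    with open(link) as f:
        content = f.read()                        # the data is still reachable
    return same_inode, links_before, links_after, content

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(hard_link_demo(d))
```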
What are special inodes?
Inode numbers start from 1. Inode 1 is reserved for bad blocks, and the root directory / has inode 2. Some special paths (/proc, /sys, /dev) also show inode 1. These paths are mount points of pseudo file systems rather than real on-disk file systems (see: procfs, sysfs, devfs). Linux must still report an inode number for every path, and these pseudo file systems give their root directory inode 1. You can find the other special inodes here.
When you create a new file system, your first file gets inode 12, because inode 11 is taken by the lost+found directory. If you delete this directory and then create a file, the new file gets inode 11.
The Linux kernel tends to allocate low inode numbers to files. If you delete a file and create another one, the inode of the deleted file is allocated to the new file:
When you create a new directory, its inode number is usually higher, since ext4 tends to spread new directories across block groups. When you create a new file in that directory, the file’s inode number is higher than the directory’s. If you move a file into the directory within the same file system, the file keeps its inode:
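A move within one file system is a rename of the directory entry, so the inode number does not change. The sketch below is illustrative, and `inode_after_move` is a hypothetical helper. Pass it a directory on the ext4 disk (such as /disk) to see real ext4 numbers. os.rename raises OSError if source and destination are on different file systems.

```python
import os
import tempfile

def inode_after_move(directory: str):
    """Create a file, move it into a new subdirectory, and report inode numbers."""
    src = os.path.join(directory, "a.txt")
    subdir = os.path.join(directory, "subdir")
    with open(src, "w"):
        pass                                   # create an empty file
    os.mkdir(subdir)
    before = os.stat(src).st_ino
    dst = os.path.join(subdir, "a.txt")
    os.rename(src, dst)                        # same file system: only the directory entry moves
    after = os.stat(dst).st_ino
    return before, after, os.stat(subdir).st_ino

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        before, after, dir_ino = inode_after_move(d)
        print(f"file inode before={before} after={after}, directory inode={dir_ino}")
```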
What is the Super Block?
You may have seen superblocks in the mkfs.ext4 output. The superblock stores general information about the file system: block size, blocks and inodes per group, total and free block and inode counts, and so on (see the structure).
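dumpe2fs -h prints these fields, but a few of them can also be decoded by hand. The sketch below assumes the field offsets documented in the kernel’s ext4 on-disk layout documentation. The primary superblock starts 1024 bytes into the device, and the magic number 0xEF53 sits at offset 0x38. /dev/sdb is a hypothetical device name, and reading a raw device usually requires root.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # the primary superblock starts 1024 bytes into the device
EXT4_MAGIC = 0xEF53

def parse_superblock(sb: bytes) -> dict:
    """Decode a few little-endian fields of an ext4 superblock."""
    (inodes_count, blocks_count_lo, _reserved_blocks, free_blocks_lo, free_inodes,
     first_data_block, log_block_size) = struct.unpack_from("<7I", sb, 0x00)
    blocks_per_group, = struct.unpack_from("<I", sb, 0x20)
    inodes_per_group, = struct.unpack_from("<I", sb, 0x28)
    magic, = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT4_MAGIC:
        raise ValueError(f"bad magic 0x{magic:04x}, not an ext2/3/4 superblock")
    return {
        "inodes_count": inodes_count,
        "blocks_count": blocks_count_lo,     # low 32 bits only; the 64bit feature adds a high part
        "free_blocks": free_blocks_lo,
        "free_inodes": free_inodes,
        "first_data_block": first_data_block,
        "block_size": 1024 << log_block_size,
        "blocks_per_group": blocks_per_group,
        "inodes_per_group": inodes_per_group,
    }

def read_superblock(device: str) -> dict:
    """Read and decode the primary superblock, e.g. read_superblock("/dev/sdb")."""
    with open(device, "rb") as dev:         # usually needs root to open a raw device
        dev.seek(SUPERBLOCK_OFFSET)
        return parse_superblock(dev.read(1024))
```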
Since the file system’s core metadata lives in the superblock, a corrupted superblock can make the whole file system unusable. Therefore, ext4 keeps backup copies of the superblock in some block groups. In early versions of ext2, a backup was stored in every block group. However, ext4 (and later revisions of ext2) support a feature flag called sparse_super.
With sparse_super enabled (it is part of the defaults in /etc/mke2fs.conf), the superblock is replicated in only a few block groups. Group 0 holds the primary superblock. Group 1, plus every group whose number is a power of 3, 5, or 7, holds a backup (e.g. 81, 125, 243, 343) (see Part 2).
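The rule can be turned into a short function. This disk has 512 block groups (16777216 blocks / 32768 blocks per group). The sketch assumes first_data_block is 0, which is true for 4 KB blocks, so group g starts at block g * 32768. Under that assumption, the result should line up with the "Superblock backups stored on blocks" list that mkfs.ext4 prints. `sparse_super_groups` is an illustrative name.

```python
def sparse_super_groups(group_count: int) -> list[int]:
    """Groups that hold a superblock copy with sparse_super: 0 (primary), 1,
    and every group number that is a power of 3, 5 or 7."""
    groups = {g for g in (0, 1) if g < group_count}
    for base in (3, 5, 7):
        power = base
        while power < group_count:
            groups.add(power)
            power *= base
    return sorted(groups)

blocks_per_group = 32768
# Assumes first_data_block == 0 (4 KB blocks), so group g starts at block g * 32768.
backup_blocks = [g * blocks_per_group for g in sparse_super_groups(512) if g != 0]
print(backup_blocks)
```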
What are the Inode Table and Inode Bitmap?
“mkfs.ext4 created inodes. Where are they stored?” you may ask.
Each block group has an inode table and an inode bitmap. The inode table stores the on-disk inode structures for the inodes that belong to that group. The inode bitmap holds one bit per inode in the group: the bit is 1 if the inode is used and 0 if it is free.
The block bitmap works the same way: a bit is 1 if the block is used and 0 otherwise.
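Because inode numbers start at 1, inode n lives in group (n - 1) // inodes_per_group at index (n - 1) % inodes_per_group. ext4 bitmaps store bit i in byte i // 8, at bit position i % 8. The sketch below uses a toy bitmap in which inodes 1 to 11 are in use, not one read from disk. The helper names are hypothetical, and the same lookup applies to the block bitmap.

```python
def inode_location(ino: int, inodes_per_group: int) -> tuple[int, int]:
    """Map an inode number to (block group, index inside that group)."""
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    return (ino - 1) // inodes_per_group, (ino - 1) % inodes_per_group

def bit_is_set(bitmap: bytes, index: int) -> bool:
    """Little-endian bit order: bit `index` is bit index % 8 of byte index // 8."""
    return bool((bitmap[index // 8] >> (index % 8)) & 1)

# Toy bitmap for one group of 8192 inodes: inodes 1..11 in use, the rest free.
bitmap = bytes([0xFF, 0x07]) + bytes(1022)
for ino in (11, 12):
    group, index = inode_location(ino, inodes_per_group=8192)
    print(ino, "group", group, "used" if bit_is_set(bitmap, index) else "free")
```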
What is the Group Descriptor Table?
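ext4 also keeps a special table called the Group Descriptor Table, with one descriptor per block group. It is stored right after the superblock (and after each backup superblock).
For each block group, the descriptor stores the location of the inode table, the location of the inode bitmap, the location of the block bitmap, the number of free blocks, the number of free inodes, and so on (see).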
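A descriptor is a small fixed-size record, so the table can be split and decoded with struct. This sketch decodes only the leading fields of struct ext4_group_desc and only their low ("_lo") halves. It assumes a descriptor size of 64 bytes, which applies when the 64bit feature is enabled; the actual size is recorded in the superblock, and it is 32 bytes without that feature. The helper names and the test values are illustrative.

```python
import struct

# Leading fields of struct ext4_group_desc: three 32-bit block numbers,
# then four 16-bit counters/flags, little-endian (20 bytes in total).
GROUP_DESC_FORMAT = "<3I4H"

def parse_group_desc(raw: bytes, offset: int = 0) -> dict:
    (block_bitmap, inode_bitmap, inode_table,
     free_blocks, free_inodes, used_dirs, flags) = struct.unpack_from(GROUP_DESC_FORMAT, raw, offset)
    return {
        "block_bitmap": block_bitmap,   # block number of this group's block bitmap
        "inode_bitmap": inode_bitmap,   # block number of this group's inode bitmap
        "inode_table": inode_table,     # first block of this group's inode table
        "free_blocks": free_blocks,
        "free_inodes": free_inodes,
        "used_dirs": used_dirs,
        "flags": flags,
    }

def parse_gdt(table: bytes, group_count: int, desc_size: int = 64) -> list[dict]:
    """Split a raw Group Descriptor Table into one descriptor per block group."""
    return [parse_group_desc(table, g * desc_size) for g in range(group_count)]
```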
What are Data Blocks?
There are three types of data blocks: plain data blocks, directory data blocks, and symbolic-link data blocks.
Plain data blocks are the blocks where the actual contents of a file are stored. When you write (let’s say) a1b2c2 to a.txt, a1b2c2 is stored in one of these data blocks.
The directory data blocks are the blocks where the directory entries are stored.
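Each directory entry is a variable-length record: inode (32 bits), rec_len (16 bits), name_len (8 bits), file_type (8 bits), then the name. rec_len gives the offset of the next record. The sketch below follows the ext4_dir_entry_2 layout and assumes a linear (non-htree) directory block. It builds a synthetic 4096-byte block instead of reading one from disk, and the helper names are hypothetical.

```python
import struct

FILE_TYPES = {1: "regular file", 2: "directory", 7: "symlink"}
HEADER = struct.Struct("<IHBB")  # inode, rec_len, name_len, file_type (8 bytes)

def parse_dir_block(block: bytes):
    """Walk the ext4_dir_entry_2 records of a linear directory block."""
    entries = []
    offset = 0
    while offset + HEADER.size <= len(block):
        inode, rec_len, name_len, file_type = HEADER.unpack_from(block, offset)
        if rec_len < HEADER.size:                 # corrupt or zeroed record: stop
            break
        if inode != 0:                            # inode 0 marks an unused slot
            start = offset + HEADER.size
            name = block[start:start + name_len].decode("utf-8", "replace")
            entries.append((inode, name, FILE_TYPES.get(file_type, f"type {file_type}")))
        offset += rec_len                         # rec_len jumps to the next record
    return entries

def make_entry(inode: int, name: str, file_type: int, rec_len: int) -> bytes:
    """Build one record padded to rec_len (used here to create a test block)."""
    raw = HEADER.pack(inode, rec_len, len(name), file_type) + name.encode()
    return raw + bytes(rec_len - len(raw))

block = (make_entry(2, ".", 2, 12) + make_entry(2, "..", 2, 12)
         + make_entry(11, "lost+found", 2, 20) + make_entry(12, "a.txt", 1, 4052))
for entry in parse_dir_block(block):
    print(entry)
```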
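Symbolic-link data blocks are the blocks where symbolic-link target paths are stored. A symbolic link points to the target’s path, not to the target’s inode, which is why a symbolic link has an inode of its own, unlike a hard link. Targets shorter than 60 bytes are stored directly inside the inode ("fast symlinks"), so no data block is used for them.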
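The difference from a hard link can be checked from Python. This is a minimal sketch, and `symlink_demo` is a hypothetical helper. os.lstat reports the link’s own inode and os.readlink returns the stored path. After the target is deleted, the link still exists but dangles, because it only stored a path.

```python
import os
import tempfile

def symlink_demo(directory: str):
    """Compare a symlink's inode with its target's, then delete the target."""
    target = os.path.join(directory, "a.txt")
    link = os.path.join(directory, "a.link")
    with open(target, "w") as f:
        f.write("a1b2c3")
    os.symlink(target, link)
    own_inode = os.lstat(link).st_ino != os.stat(target).st_ino  # the link has its own inode
    stores_path = os.readlink(link) == target                    # it stores a path, not an inode
    os.remove(target)
    dangling = os.path.lexists(link) and not os.path.exists(link)  # link remains, target gone
    return own_inode, stores_path, dangling

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(symlink_demo(d))
```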