Example disk partitioning schemes for Owl

Here's a partitioning scheme that we commonly use at Openwall:

  • /dev/sda1 or /dev/md0 - swap - size varies a lot depending on expected needs and available disk space to waste
  • /dev/sda2 or /dev/md1 - boot and root filesystem - 1 GB or 2 GB (currently, a full install needs about 500 MB)
  • /dev/sda3 or /dev/md2 - /space - size varies a lot depending on expected needs and on whether more partitions are created or not
  • /dev/sda4, /dev/sda5 (if extended partition table is created), or /dev/md3 - /vz - normally uses the rest of the disk

With more disks (or arrays), the additional filesystems are typically mounted under subdirectories of /vz/private, so that some of the OpenVZ containers are created on them.
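
For the software RAID variants above, the md arrays might be put together roughly as follows. This is only a minimal sketch (not the Owl installer's procedure), assuming two identical disks, /dev/sda and /dev/sdb, already carrying matching partitions sda1-sda4 and sdb1-sdb4 of type "fd" (Linux raid autodetect):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1 # swap
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2 # boot and root
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3 # /space
mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sda4 /dev/sdb4 # /vz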

We use the /space filesystem to reduce the size of, and the number of writes onto, the boot and root filesystem (thereby improving its reliability). Here's an example of what may be moved onto /space (right after the initial installation):

service syslog stop; service postfix stop # if these were started already
cd /var
mv account log run spool ../space/
ln -s ../space/* .
cd /usr
mv src ../space/ && ln -s ../space/src # in case you intend to do any builds from source under /usr/src or /usr/src/world
cd /
mv home space/ && ln -s space/home # unless a separate filesystem was created for /home
cd /
mv vz space/ && ln -s space/vz # unless a separate filesystem was created for /vz
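
Once the moves and symlinks are in place, the services stopped at the beginning can presumably be started again (or the machine simply rebooted); adjust to whichever services you actually stopped:

service syslog start
service postfix start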

Any additional software installs are typically done into OpenVZ containers, not onto the host system, which is why we're able to keep the root filesystem small (only leaving some room for upgrades to future versions of Owl itself). In fact, the OpenVZ security model assumes that the host system is only used to control the containers, not to run any services not essential to that purpose.
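
As an illustration of that workflow, a container might be created and started roughly like this; the container ID, OS template name, IP address, and hostname below are made-up examples, not anything prescribed by Owl:

vzctl create 101 --ostemplate owl-x86_64 --private /vz/private/101
vzctl set 101 --ipadd 192.168.0.101 --hostname ct101.example.com --save
vzctl start 101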

Example - large RAM, small disk, software RAID

Here's a specific example (system just installed, no OpenVZ containers created yet):

solar@host:~ $ free -m
             total       used       free     shared    buffers     cached
Mem:         48191        166      48025          0         15         14
-/+ buffers/cache:        136      48055
Swap:         3819          0       3819
solar@host:~ $ df -m
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/md1                   950       435       506  47% /
/dev/md2                   950        18       923   2% /space
/dev/md3                132156       188    131968   1% /vz
tmpfs                    24096         0     24096   0% /tmp

This is a machine with 48 GB RAM, but only two 146 GB SAS disks. We don't expect much need for swap (given the intended use for this specific machine), but having a little bit of it is useful anyway to maximize the amount of RAM available for disk caching (e.g., some mostly unused instances of mysqld running in an OpenVZ container may be swapped out). Contrary to popular belief, it is not always a good idea to base the size of swap on RAM size (although this makes sense in the absence of other input data).

The percentage of blocks reserved for root was reduced to 1% for / and /space, and to 0% (none) for /vz, using the -m option of tune2fs:

tune2fs -m1 -c0 -i0 /dev/md1
tune2fs -m1 -c0 -i0 /dev/md2
tune2fs -m0 -c0 -i0 /dev/md3

The -c0 -i0 options disable forced fsck runs, which would otherwise be made on some reboots. (In our experience, it is better to have them disabled for unattended servers.) It is OK to run these tune2fs commands (with this specific subset of options) on mounted filesystems.
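
To double-check the result, the current values can be read back with tune2fs -l (an extra verification step, not part of the original procedure):

tune2fs -l /dev/md1 | grep -iE 'reserved block count|mount count|check interval'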

The related configuration files are:

solar@host:~ $ cat /etc/fstab
/dev/md0        swap                    swap    defaults                0 0
/dev/md1        /                       ext4    noatime                 0 1
/dev/md2        /space                  ext4    nosuid,nodev            0 2
/dev/md3        /vz                     ext4    nosuid,nodev            0 2
tmpfs           /tmp                    tmpfs   nosuid,nodev            0 0
proc            /proc                   proc    gid=110                 0 0
devpts          /dev/pts                devpts  gid=5,mode=620          0 0
sysfs           /sys                    sysfs   noauto                  0 0
/dev/cdrom      /mnt/cdrom              iso9660 noauto,nosuid,owner,ro  0 0
/dev/fd0        /mnt/floppy             ext2    noauto,nosuid,owner     0 0
root@host:~ # cat /etc/lilo.conf
boot=/dev/md1
root=/dev/md1
raid-extra-boot=mbr
read-only
lba32
prompt
timeout=50
menu-title="Openwall GNU/*/Linux boot menu"
menu-scheme=kw:Wb:kw:kw

append="md=0,/dev/sda1,/dev/sdb1 md=1,/dev/sda2,/dev/sdb2 md=2,/dev/sda3,/dev/sdb3 md=3,/dev/sda4,/dev/sdb4"

image=/boot/vmlinuz-2.6.18-238.9.1.el5.028stab089.1.owl1
        label=089.1.owl1
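
After any edit to /etc/lilo.conf (or a kernel update), the boot records have to be rewritten by re-running the boot loader installer, presumably just:

lilo -v

With raid-extra-boot=mbr as above, this should also update the MBRs of both underlying disks, so the system remains bootable if either disk fails.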

Example - large RAM, large disk, hardware RAID

Here's another example (system already in use, 46 OpenVZ containers running, but many more are to be created):

solar@host:~ $ free -m
             total       used       free     shared    buffers     cached
Mem:         64302      63655        647          0       2117      53949
-/+ buffers/cache:       7588      56714
Swap:        30521         14      30507
root@host:~ # df -m
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/sda2                 3761       850      2720  24% /
tmpfs                    32152         0     32152   0% /tmp
/dev/sda3                 9397       295      8625   4% /space
/dev/sda4              1834067    184343   1556559  11% /vz/private/raid1

This one has 64 GB RAM. We have a lot more disk space here (several 2 TB disks, two of which have been put to use so far in hardware RAID-1), so we did not try to save on swap as much (even though it's almost unneeded for this machine's usage anyway), nor on the size of / and /space (4 GB and 10 GB are generous considering the expected minimal use of the "host system"). Then we have a large filesystem for the containers (and more such filesystems can be created out of the additional 2 TB disks, not in use yet). The reserved blocks haven't been changed from the default of 5% yet (no need in the foreseeable future; we're likely to be limited by disk seek speed well before we fully use the 2 TB of space on this RAID array).
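
When one of the spare 2 TB disks (or another hardware RAID-1 pair) is eventually put to use, it would presumably become one more container filesystem in the same fashion. A rough sketch, where the device name /dev/sdb1 and the mount point /vz/private/raid2 are merely illustrative assumptions:

mke2fs -t ext4 /dev/sdb1
mkdir -p /vz/private/raid2
mount /dev/sdb1 /vz/private/raid2
# plus a matching /etc/fstab entry, along the lines of:
# /dev/sdb1     /vz/private/raid2       ext4    nosuid,nodev            0 2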

Example - small RAM, small disk, no RAID

Finally, here's an old and small machine - 640 MB RAM, one 40 GB IDE disk:

host!solar:~$ free -m
             total       used       free     shared    buffers     cached
Mem:           626        611         14          0        122        400
-/+ buffers/cache:         88        538
Swap:          954          0        954
host!solar:~$ df -m
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/hda2                  940       415       477  47% /
/dev/hda3                 1878        43      1817   3% /space
/dev/hda4                33812     22392     11420  67% /vz
tmpfs                      314         0       314   0% /tmp

This one is running just one OpenVZ container (for convenience of administration and a little bit of extra security), although it could just as well run several tiny ones.

Overall, the same approach to disk partitioning works well for systems of very different sizes.
