Here's a partitioning scheme that we commonly use at Openwall:
With more devices, additional ones are typically mounted under subdirectories of /vz/private, so that some of the OpenVZ containers can be created on them.
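For example, putting an additional disk to use for containers could look roughly like this (a sketch only; the device name, mount point, and container ID below are placeholders, not part of our standard setup):

mkfs -t ext4 /dev/sdc1                              # create a filesystem on the new device
mkdir -p /vz/private/disk2                          # mount point under /vz/private
mount /dev/sdc1 /vz/private/disk2                   # and add a matching nosuid,nodev entry to /etc/fstab
vzctl create 201 --private /vz/private/disk2/201    # container private area on the new filesystem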
We use the /space filesystem to reduce the size of, and the number of writes onto, the boot and root filesystem (thereby improving its reliability). Here's an example of what may be moved onto /space (right after initial installation):
service syslog stop; service postfix stop # if these were started already
cd /var
mv account log run spool ../space/
ln -s ../space/* .
cd /usr
mv src ../space/ && ln -s ../space/src # in case you intend to do any builds from source under /usr/src or /usr/src/world
cd /
mv home space/ && ln -s space/home # unless a separate filesystem was created for /home
cd /
mv vz space/ && ln -s space/vz # unless a separate filesystem was created for /vz
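Once the moves are complete and the symlinks are in place, the services stopped above can be started again:

service syslog start; service postfix start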
Any additional software installs are typically done into OpenVZ containers, not onto the host system, which is why we're able to keep the root filesystem small (only leaving some room for upgrades to future versions of Owl itself). In fact, the OpenVZ security model assumes that the host system is only used to control the containers, not to run any services not essential to that purpose.
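For reference, creating and starting a container on such a host amounts to a few vzctl commands (a rough sketch; the container ID, OS template name, IP address, and hostname below are placeholders rather than values from this specific setup):

vzctl create 101 --ostemplate owl-default --private /vz/private/101
vzctl set 101 --ipadd 10.0.0.101 --hostname ct101.example.com --save
vzctl start 101
vzctl enter 101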
Here's a specific example (system just installed, no OpenVZ containers created yet):
solar@host:~ $ free -m
             total       used       free     shared    buffers     cached
Mem:         48191        166      48025          0         15         14
-/+ buffers/cache:        136      48055
Swap:         3819          0       3819
solar@host:~ $ df -m
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/md1                   950       435       506  47% /
/dev/md2                   950        18       923   2% /space
/dev/md3                132156       188    131968   1% /vz
tmpfs                    24096         0     24096   0% /tmp
This is a machine with 48 GB RAM, but only two 146 GB SAS disks. We don't expect much need for swap (given the intended use for this specific machine), but having a little bit of it is useful anyway to maximize the amount of RAM available for disk caching (e.g., some mostly unused instances of mysqld running in an OpenVZ container may be swapped out). Contrary to popular belief, it is not always a good idea to base the size of swap on RAM size (although this makes sense in the absence of other input data).
The percentage of blocks reserved for root was reduced to 1% for / and /space, and to 0% (none) for /vz, using the -m option to tune2fs:
tune2fs -m1 -c0 -i0 /dev/md1
tune2fs -m1 -c0 -i0 /dev/md2
tune2fs -m0 -c0 -i0 /dev/md3
The -c0 -i0 options disable forced fsck runs, which would otherwise be made on some reboots. (In our experience, it is better to have them disabled for unattended servers.) It is OK to run these tune2fs commands (with this specific subset of options) on mounted filesystems.
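To double-check the resulting settings, the filesystem parameters can be inspected with tune2fs -l, e.g.:

tune2fs -l /dev/md1 | grep -E 'Reserved block count|Maximum mount count|Check interval'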
The related configuration files are:
solar@host:~ $ cat /etc/fstab
/dev/md0     swap         swap     defaults                0 0
/dev/md1     /            ext4     noatime                 0 1
/dev/md2     /space       ext4     nosuid,nodev            0 2
/dev/md3     /vz          ext4     nosuid,nodev            0 2
tmpfs        /tmp         tmpfs    nosuid,nodev            0 0
proc         /proc        proc     gid=110                 0 0
devpts       /dev/pts     devpts   gid=5,mode=620          0 0
sysfs        /sys         sysfs    noauto                  0 0
/dev/cdrom   /mnt/cdrom   iso9660  noauto,nosuid,owner,ro  0 0
/dev/fd0     /mnt/floppy  ext2     noauto,nosuid,owner     0 0
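With no size= option, tmpfs defaults to half of RAM (which is where the 24096 MB /tmp in the df output above comes from); if that is more than desired, an explicit limit may be set, e.g.:

tmpfs /tmp tmpfs nosuid,nodev,size=2g 0 0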
root@host:~ # cat /etc/lilo.conf
boot=/dev/md1
root=/dev/md1
raid-extra-boot=mbr
read-only
lba32
prompt
timeout=50
menu-title="Openwall GNU/*/Linux boot menu"
menu-scheme=kw:Wb:kw:kw
append="md=0,/dev/sda1,/dev/sdb1 md=1,/dev/sda2,/dev/sdb2 md=2,/dev/sda3,/dev/sdb3 md=3,/dev/sda4,/dev/sdb4"
image=/boot/vmlinuz-2.6.18-238.9.1.el5.028stab089.1.owl1
        label=089.1.owl1
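As usual with LILO, after any change to /etc/lilo.conf (or a kernel update) the boot loader map needs to be rewritten for the change to take effect:

lilo -v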
Here's another example (system already in use, 46 OpenVZ containers running, but many more are to be created):
solar@host:~ $ free -m
             total       used       free     shared    buffers     cached
Mem:         64302      63655        647          0       2117      53949
-/+ buffers/cache:       7588      56714
Swap:        30521         14      30507
root@host:~ # df -m
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/sda2                 3761       850      2720  24% /
tmpfs                    32152         0     32152   0% /tmp
/dev/sda3                 9397       295      8625   4% /space
/dev/sda4              1834067    184343   1556559  11% /vz/private/raid1
This one has 64 GB RAM. We have a lot more disk space here (several 2 TB disks, two of which have been put to use so far in hardware RAID-1), so we did not try to save on swap as much (even though it's almost unneeded for this machine's usage anyway), nor on the size of / and /space (4 GB and 10 GB are generous considering the expected minimal use of the “host system”). Then we have a large filesystem for the containers (and more such filesystems can be created out of additional 2 TB disks, not in use yet). The reserved blocks haven't been changed from the default of 5% yet (no need in the foreseeable future; we're likely to bump into disk seek speed limits before we fully use the 2 TB of space on this RAID array).
Finally, here's an old and small machine - 640 MB RAM, one 40 GB IDE disk:
host!solar:~$ free -m
             total       used       free     shared    buffers     cached
Mem:           626        611         14          0        122        400
-/+ buffers/cache:         88        538
Swap:          954          0        954
host!solar:~$ df -m
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/hda2                  940       415       477  47% /
/dev/hda3                 1878        43      1817   3% /space
/dev/hda4                33812     22392     11420  67% /vz
tmpfs                      314         0       314   0% /tmp
This one is running just one OpenVZ container (for convenience of administration and a little bit of extra security), although it could just as well run several tiny ones.
Overall, the same approach to disk partitioning works well for systems of very different sizes.